Class SolrEmitter

java.lang.Object
org.apache.tika.pipes.emitter.AbstractEmitter
org.apache.tika.pipes.emitter.solr.SolrEmitter
All Implemented Interfaces:
Initializable, Emitter

public class SolrEmitter extends AbstractEmitter implements Initializable
  • Field Details

    • DEFAULT_EMBEDDED_FILE_FIELD_NAME

      public static String DEFAULT_EMBEDDED_FILE_FIELD_NAME
  • Constructor Details

  • Method Details

    • emit

      public void emit(String emitKey, List<Metadata> metadataList, ParseContext parseContext) throws IOException, TikaEmitterException
      Specified by:
      emit in interface Emitter
      Throws:
      IOException
      TikaEmitterException
    • emit

      public void emit(List<? extends EmitData> batch) throws IOException, TikaEmitterException
      Description copied from class: AbstractEmitter
      The default behavior is to call Emitter.emit(String, List, ParseContext) on each item. Some implementations, e.g. Solr, ES, or Vespa, can benefit from overriding this method and emitting multiple documents in a single request.
      Specified by:
      emit in interface Emitter
      Overrides:
      emit in class AbstractEmitter
      Throws:
      IOException
      TikaEmitterException
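
      The relationship between the per-item and batch emit methods can be sketched as below. This is a minimal, self-contained illustration and not Tika's code: Doc, PerItemEmitter, and BatchingEmitter are hypothetical stand-ins for EmitData, AbstractEmitter, and a batching subclass, and the requests counter is an assumption used only to show why a single bulk call may be cheaper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one item in a batch (not Tika's EmitData).
final class Doc {
    final String emitKey;
    final String content;

    Doc(String emitKey, String content) {
        this.emitKey = emitKey;
        this.content = content;
    }
}

// Mirrors the documented default: the batch method calls the
// single-item method once per item.
class PerItemEmitter {
    int requests = 0;                         // simulated round trips to the index
    final List<String> sent = new ArrayList<>();

    void emit(Doc doc) {
        requests++;
        sent.add(doc.emitKey);
    }

    void emit(List<? extends Doc> batch) {
        for (Doc doc : batch) {
            emit(doc);
        }
    }
}

// Overrides the batch method so the whole batch goes out in one simulated request.
class BatchingEmitter extends PerItemEmitter {
    @Override
    void emit(List<? extends Doc> batch) {
        if (batch.isEmpty()) {
            return;                           // nothing to send, no request made
        }
        requests++;
        for (Doc doc : batch) {
            sent.add(doc.emitKey);
        }
    }
}
```

      With a batch of three documents, the per-item version in this sketch records three simulated requests while the batching version records one; both send the same emit keys in the same order.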
    • setAttachmentStrategy

      @Field public void setAttachmentStrategy(String attachmentStrategy)
      Options: SKIP, CONCATENATE_CONTENT, PARENT_CHILD. The default is "PARENT_CHILD". "SKIP" indexes only the main file and ignores all information in the attachments. "CONCATENATE_CONTENT" concatenates the content extracted from the attachments into the main document, then indexes the main document with the concatenated content and the main document's metadata; metadata from the attachments is discarded. "PARENT_CHILD" indexes the attachments as children of the parent document via Solr's parent-child relationship.
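
      The three options can be illustrated with plain maps. This is a hedged sketch of the described behavior, not the SolrEmitter implementation: the class AttachmentStrategySketch, the field names "content" and "children", and the newline separator are all assumptions made for the example. The first entry of the list stands for the main file and the remaining entries for its attachments.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AttachmentStrategySketch {
    enum Strategy { SKIP, CONCATENATE_CONTENT, PARENT_CHILD }

    static final String CONTENT = "content";     // hypothetical content field
    static final String CHILDREN = "children";   // hypothetical child-document field

    // Builds the document that would be indexed for the main file.
    static Map<String, Object> build(List<Map<String, String>> metadataList, Strategy strategy) {
        Map<String, String> main = metadataList.get(0);
        List<Map<String, String>> attachments = metadataList.subList(1, metadataList.size());

        // Every strategy starts from the main file's own metadata.
        Map<String, Object> doc = new LinkedHashMap<>(main);

        switch (strategy) {
            case SKIP:
                // Attachments are ignored entirely.
                break;
            case CONCATENATE_CONTENT: {
                // Only attachment content is kept; attachment metadata is dropped.
                StringBuilder sb = new StringBuilder(main.getOrDefault(CONTENT, ""));
                for (Map<String, String> attachment : attachments) {
                    String content = attachment.get(CONTENT);
                    if (content != null) {
                        sb.append('\n').append(content);   // separator is an assumption
                    }
                }
                doc.put(CONTENT, sb.toString());
                break;
            }
            case PARENT_CHILD:
                // Attachments become child documents of the main document.
                doc.put(CHILDREN, new ArrayList<>(attachments));
                break;
        }
        return doc;
    }
}
```

      In this sketch only PARENT_CHILD preserves attachment metadata, which matches the trade-off described above: SKIP and CONCATENATE_CONTENT yield a single flat document, while PARENT_CHILD yields one parent with nested children.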
    • setUpdateStrategy

      @Field public void setUpdateStrategy(String updateStrategy)
    • setConnectionTimeout

      @Field public void setConnectionTimeout(int connectionTimeout)
    • setSocketTimeout

      @Field public void setSocketTimeout(int socketTimeout)
    • getCommitWithin

      public int getCommitWithin()
    • setCommitWithin

      @Field public void setCommitWithin(int commitWithin)
    • setIdField

      @Field public void setIdField(String idField)
      Specify the field in the first Metadata that should be used as the id field for the document.
      Parameters:
      idField - name of the field to use as the document id
    • setSolrCollection

      @Field public void setSolrCollection(String solrCollection)
    • setSolrUrls

      @Field public void setSolrUrls(List<String> solrUrls)
    • setSolrZkHosts

      @Field public void setSolrZkHosts(List<String> solrZkHosts)
    • setSolrZkChroot

      @Field public void setSolrZkChroot(String solrZkChroot)
    • setUserName

      @Field public void setUserName(String userName)
    • setPassword

      @Field public void setPassword(String password)
    • setAuthScheme

      @Field public void setAuthScheme(String authScheme)
    • setProxyHost

      @Field public void setProxyHost(String proxyHost)
    • setProxyPort

      @Field public void setProxyPort(int proxyPort)
    • setEmbeddedFileFieldName

      @Field public void setEmbeddedFileFieldName(String embeddedFileFieldName)
      When the attachment strategy is SolrEmitter.AttachmentStrategy.PARENT_CHILD, this is the name of the field used to store the child documents. Note that all embedded documents are artificially flattened into direct children of the root document, no matter how deeply they are nested in the container document.
      Parameters:
      embeddedFileFieldName - name of the field used to store the child documents
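
      The flattening described for PARENT_CHILD can be pictured with a small tree walk. This is an illustrative sketch under stated assumptions, not the emitter's code: FileNode and FlattenSketch are hypothetical types, and the depth-first ordering is chosen for the example rather than taken from the source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node: a file plus the files embedded directly in it.
final class FileNode {
    final String name;
    final List<FileNode> embedded = new ArrayList<>();

    FileNode(String name) {
        this.name = name;
    }

    // Adds an embedded file and returns this node to allow chaining.
    FileNode add(FileNode child) {
        embedded.add(child);
        return this;
    }
}

class FlattenSketch {
    // Returns the names of all descendants of root as one flat list,
    // so each embedded file becomes a direct child regardless of depth.
    static List<String> flattenChildren(FileNode root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    // Depth-first walk: record each child, then descend into it.
    private static void collect(FileNode node, List<String> out) {
        for (FileNode child : node.embedded) {
            out.add(child.name);
            collect(child, out);
        }
    }
}
```

      For an email containing an archive that itself contains a document with an embedded image, this sketch returns the archive, the document, and the image as siblings under the email, which is the shape the note above describes.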
    • initialize

      public void initialize(Map<String,Param> params) throws TikaConfigException
      Specified by:
      initialize in interface Initializable
      Parameters:
      params - params to use for initialization
      Throws:
      TikaConfigException
    • checkInitialization

      public void checkInitialization(InitializableProblemHandler problemHandler) throws TikaConfigException
      Specified by:
      checkInitialization in interface Initializable
      Parameters:
      problemHandler - if there is a problem and no custom initializableProblemHandler has been configured via Initializable parameters, this is called to respond.
      Throws:
      TikaConfigException