Class SolrEmitter
- java.lang.Object
  - org.apache.tika.pipes.emitter.AbstractEmitter
    - org.apache.tika.pipes.emitter.solr.SolrEmitter
- All Implemented Interfaces:
Initializable, Emitter
public class SolrEmitter extends AbstractEmitter implements Initializable
-
-
Nested Class Summary
Modifier and Type, Class
static class SolrEmitter.AttachmentStrategy
static class SolrEmitter.UpdateStrategy
-
Field Summary
Modifier and Type, Field
static String DEFAULT_EMBEDDED_FILE_FIELD_NAME
-
Constructor Summary
Constructor
SolrEmitter()
-
Method Summary
Modifier and Type, Method, Description
void checkInitialization(InitializableProblemHandler problemHandler)
void emit(String emitKey, List<Metadata> metadataList)
void emit(List<? extends EmitData> batch)
    The default behavior is to call Emitter.emit(String, List) on each item.
int getCommitWithin()
void initialize(Map<String,Param> params)
void setAttachmentStrategy(String attachmentStrategy)
    Options: SKIP, CONCATENATE_CONTENT, PARENT_CHILD.
void setAuthScheme(String authScheme)
void setCommitWithin(int commitWithin)
void setConnectionTimeout(int connectionTimeout)
void setEmbeddedFileFieldName(String embeddedFileFieldName)
    If using SolrEmitter.AttachmentStrategy.PARENT_CHILD, this is the field name used to store the child documents.
void setIdField(String idField)
    Specify the field in the first Metadata that should be used as the id field for the document.
void setPassword(String password)
void setProxyHost(String proxyHost)
void setProxyPort(int proxyPort)
void setSocketTimeout(int socketTimeout)
void setSolrCollection(String solrCollection)
void setSolrUrls(List<String> solrUrls)
void setSolrZkChroot(String solrZkChroot)
void setSolrZkHosts(List<String> solrZkHosts)
void setUpdateStrategy(String updateStrategy)
void setUserName(String userName)
-
Methods inherited from class org.apache.tika.pipes.emitter.AbstractEmitter
getName, setName
-
-
-
-
Field Detail
-
DEFAULT_EMBEDDED_FILE_FIELD_NAME
public static String DEFAULT_EMBEDDED_FILE_FIELD_NAME
-
-
Constructor Detail
-
SolrEmitter
public SolrEmitter() throws TikaConfigException
- Throws: TikaConfigException
-
-
Method Detail
-
emit
public void emit(String emitKey, List<Metadata> metadataList) throws IOException, TikaEmitterException
- Specified by: emit in interface Emitter
- Throws: IOException, TikaEmitterException
-
emit
public void emit(List<? extends EmitData> batch) throws IOException, TikaEmitterException
Description copied from class: AbstractEmitter
The default behavior is to call Emitter.emit(String, List) on each item. Some implementations, e.g. Solr/ES/Vespa, can benefit from overriding this method and emitting a batch of documents at once.
- Specified by: emit in interface Emitter
- Overrides: emit in class AbstractEmitter
- Throws: IOException, TikaEmitterException
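The difference between the per-item default and a batching override can be sketched without any Tika or Solr dependency. Everything below is a hypothetical stand-in: `Item`, `PerItemEmitter`, `RecordingEmitter`, and `BatchingEmitter` are names invented for this illustration, and "sending a request" is simulated by recording how many documents each request would carry.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class BatchEmitSketch {

    /** Hypothetical stand-in for one batch item: a key plus extracted text. */
    static final class Item {
        final String key;
        final String body;

        Item(String key, String body) {
            this.key = key;
            this.body = body;
        }
    }

    /** Mirrors the described default: the batch method calls the single-item method once per item. */
    abstract static class PerItemEmitter {
        abstract void emit(String key, String body);

        void emit(List<Item> batch) {
            for (Item item : batch) {
                emit(item.key, item.body);
            }
        }
    }

    /** Keeps the default batch behavior; every item becomes its own simulated request. */
    static final class RecordingEmitter extends PerItemEmitter {
        final List<Integer> requestSizes = new ArrayList<>();

        @Override
        void emit(String key, String body) {
            requestSizes.add(1);
        }
    }

    /** Overrides the batch method so the whole batch goes out as one simulated request. */
    static final class BatchingEmitter extends PerItemEmitter {
        final List<Integer> requestSizes = new ArrayList<>();

        @Override
        void emit(String key, String body) {
            emit(Collections.singletonList(new Item(key, body)));
        }

        @Override
        void emit(List<Item> batch) {
            requestSizes.add(batch.size());
        }
    }
}
```

With a batch of three items, the recording emitter logs three requests of one document each, while the batching emitter logs a single request of three documents, which is the saving the description above alludes to.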
-
setAttachmentStrategy
@Field public void setAttachmentStrategy(String attachmentStrategy)
Options: SKIP, CONCATENATE_CONTENT, PARENT_CHILD. Default is "PARENT_CHILD".
- "SKIP": index only the main file and ignore all information in the attachments.
- "CONCATENATE_CONTENT": concatenate the content extracted from the attachments into the main document, then index the main document with the concatenated content and the main document's own metadata (metadata from attachments is discarded).
- "PARENT_CHILD": index the attachments as children of the parent document via Solr's parent-child relationship.
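The three options can be illustrated with a small, self-contained sketch. It is not the emitter's implementation: plain maps stand in for Tika `Metadata` objects, the field names `content` and `embedded` are assumptions made for the example, and joining content with a newline is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: plain maps stand in for Tika Metadata objects. */
final class AttachmentStrategySketch {

    enum Strategy { SKIP, CONCATENATE_CONTENT, PARENT_CHILD }

    // Hypothetical field names chosen for this sketch.
    static final String CONTENT = "content";
    static final String CHILDREN = "embedded";

    /**
     * Builds one document from a non-empty list whose first element is the
     * main file and whose remaining elements are its attachments.
     */
    static Map<String, Object> buildDoc(List<Map<String, String>> metadataList, Strategy strategy) {
        // Start from the main file's own fields in every case.
        Map<String, Object> doc = new LinkedHashMap<>(metadataList.get(0));
        List<Map<String, String>> attachments = metadataList.subList(1, metadataList.size());
        switch (strategy) {
            case SKIP:
                // Attachments are ignored entirely.
                break;
            case CONCATENATE_CONTENT:
                // Only attachment content is kept; attachment metadata is dropped.
                StringBuilder sb = new StringBuilder(String.valueOf(doc.getOrDefault(CONTENT, "")));
                for (Map<String, String> attachment : attachments) {
                    String content = attachment.get(CONTENT);
                    if (content != null) {
                        sb.append('\n').append(content);
                    }
                }
                doc.put(CONTENT, sb.toString());
                break;
            case PARENT_CHILD:
                // Attachments are kept whole, nested under the parent.
                if (!attachments.isEmpty()) {
                    doc.put(CHILDREN, new ArrayList<>(attachments));
                }
                break;
        }
        return doc;
    }
}
```

In this model, an attachment's `author` field survives only under `PARENT_CHILD`, where the whole attachment map is nested under the parent; under `CONCATENATE_CONTENT` only its content is merged, and under `SKIP` nothing from it is kept.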
-
setConnectionTimeout
@Field public void setConnectionTimeout(int connectionTimeout)
-
setSocketTimeout
@Field public void setSocketTimeout(int socketTimeout)
-
getCommitWithin
public int getCommitWithin()
-
setCommitWithin
@Field public void setCommitWithin(int commitWithin)
-
setIdField
@Field public void setIdField(String idField)
Specify the field in the first Metadata that should be used as the id field for the document.
- Parameters: idField
-
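One literal reading of this description is sketched below: the id is looked up under the configured field name in the first entry of the metadata list, and later entries are not consulted. Plain maps stand in for `Metadata`, `resolveId` is a hypothetical helper, and failing on a missing value is an assumption of the sketch; this page does not say how the emitter handles that case.

```java
import java.util.List;
import java.util.Map;

final class IdFieldSketch {

    /**
     * Returns the value stored under idField in the first metadata entry.
     * Rejecting a missing or empty value is an assumption of this sketch.
     */
    static String resolveId(List<Map<String, String>> metadataList, String idField) {
        if (metadataList.isEmpty()) {
            throw new IllegalArgumentException("empty metadata list");
        }
        String id = metadataList.get(0).get(idField);
        if (id == null || id.isEmpty()) {
            throw new IllegalArgumentException(
                    "no value for id field '" + idField + "' in the first metadata entry");
        }
        return id;
    }
}
```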
-
setProxyPort
@Field public void setProxyPort(int proxyPort)
-
setEmbeddedFileFieldName
@Field public void setEmbeddedFileFieldName(String embeddedFileFieldName)
If using SolrEmitter.AttachmentStrategy.PARENT_CHILD, this is the field name used to store the child documents. Note that all embedded documents, no matter how deeply nested in the container document, are artificially flattened into direct children of the root document.
- Parameters: embeddedFileFieldName
-
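The flattening described above can be pictured with a toy tree. The `Node` type and `flattenChildren` helper are hypothetical and exist only for this sketch; the depth-first ordering is also an assumption, since the description states only that every embedded document ends up as a direct child of the root.

```java
import java.util.ArrayList;
import java.util.List;

final class FlattenEmbeddedSketch {

    /** Hypothetical model of a file that may contain embedded files. */
    static final class Node {
        final String name;
        final List<Node> embedded = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        Node add(Node child) {
            embedded.add(child);
            return this;
        }
    }

    /** Returns the names of all descendants of root, depth-first, as one flat list. */
    static List<String> flattenChildren(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        for (Node child : node.embedded) {
            out.add(child.name);
            collect(child, out);
        }
    }
}
```

For a container `a.zip` holding `b.docx` (which itself embeds `img.png`) and `c.txt`, the flat child list is `b.docx`, `img.png`, `c.txt`: the image becomes a direct child of the root rather than a grandchild.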
-
initialize
public void initialize(Map<String,Param> params) throws TikaConfigException
- Specified by: initialize in interface Initializable
- Parameters: params - params to use for initialization
- Throws: TikaConfigException
-
checkInitialization
public void checkInitialization(InitializableProblemHandler problemHandler) throws TikaConfigException
- Specified by: checkInitialization in interface Initializable
- Parameters: problemHandler - if there is a problem and no custom InitializableProblemHandler has been configured via Initializable parameters, this is called to respond.
- Throws: TikaConfigException
-
-