Package org.apache.tika.embedder
Class ExternalEmbedder
java.lang.Object
org.apache.tika.embedder.ExternalEmbedder
- All Implemented Interfaces:
- Serializable, Embedder
Embedder that uses an external program (like sed or exiftool) to embed text
 content and metadata into a given document.
- Since:
- Apache Tika 1.3
Field Summary
- METADATA_COMMAND_ARGUMENTS_TOKEN
- METADATA_COMMAND_ARGUMENTS_SERIALIZED_TOKEN
Constructor Summary
- ExternalEmbedder()
Method Summary
Each entry lists the modifier and type, the method, and its description.
- static boolean check (two overloads): Checks to see if the command can be run.
- void embed(Metadata metadata, InputStream inputStream, OutputStream outputStream, ParseContext context): Executes the configured external command on the given document stream and writes the document with the embedded metadata to the given output stream.
- String[] getCommand(): Gets the command to be run.
- String getCommandAppendOperator(): Gets the operator to append rather than replace a value for the command line tool, e.g. "+=".
- String getCommandAssignmentDelimeter(): Gets the delimiter for multiple assignments for the command line tool, e.g. ", ".
- String getCommandAssignmentOperator(): Gets the assignment operator for the command line tool, e.g. "=".
- getCommandMetadataSegments(Metadata metadata): Constructs a collection of command line arguments responsible for setting individual metadata fields based on the given metadata.
- Map<Property, String[]> getMetadataCommandArguments(): Gets the map of Metadata keys to command line parameters.
- Set<MediaType> getSupportedEmbedTypes(ParseContext context): Returns the set of media types supported by this embedder when used with the given parse context.
- boolean isQuoteAssignmentValues(): Gets whether or not to quote assignment values, e.g. tag='value'.
- protected static String serializeMetadata(List<String> metadataCommandArguments): Serializes a collection of metadata command line arguments into a single string.
- void setCommand(String... command): Sets the command to be run.
- void setCommandAppendOperator(String commandAppendOperator): Sets the operator to append rather than replace a value for the command line tool, e.g. "+=".
- void setCommandAssignmentDelimeter(String commandAssignmentDelimeter): Sets the delimiter for multiple assignments for the command line tool, e.g. ", ".
- void setCommandAssignmentOperator(String commandAssignmentOperator): Sets the assignment operator for the command line tool, e.g. "=".
- void setMetadataCommandArguments(Map<Property, String[]> arguments): Sets the map of Metadata keys to command line parameters.
- void setQuoteAssignmentValues(boolean quoteAssignmentValues): Sets whether or not to quote assignment values, e.g. tag='value'.
- void setSupportedEmbedTypes(Set<MediaType> supportedEmbedTypes)
- 
Field Details
- 
METADATA_COMMAND_ARGUMENTS_TOKEN
Token to be replaced with a String array of metadata assignment command arguments.
 
- 
METADATA_COMMAND_ARGUMENTS_SERIALIZED_TOKEN
Token to be replaced with the metadata assignment command arguments serialized into a single string (see serializeMetadata).
 
 
- 
- 
Constructor Details
- 
ExternalEmbedder
public ExternalEmbedder()
 
- 
- 
Method Details
- 
serializeMetadata
Serializes a collection of metadata command line arguments into a single string.
- Parameters:
- metadataCommandArguments - the metadata command line arguments to serialize
- Returns:
- the serialized metadata arguments string
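The serialization step can be pictured with a minimal, standalone sketch. It does not call Tika; the class name SerializeMetadataSketch, the method name serialize, and the single-space separator are all assumptions made for illustration, and the exact string format produced by serializeMetadata may differ.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the idea behind serializeMetadata; not Tika code.
final class SerializeMetadataSketch {

    // Joins the metadata command line arguments into one string.
    // The single-space separator is an assumption for this sketch.
    static String serialize(List<String> metadataCommandArguments) {
        if (metadataCommandArguments == null || metadataCommandArguments.isEmpty()) {
            return "";
        }
        return String.join(" ", metadataCommandArguments);
    }

    public static void main(String[] args) {
        // Illustrative argument values only.
        List<String> arguments = Arrays.asList("-Title=Report", "-Author=Jane");
        System.out.println(serialize(arguments));
    }
}
```

A single string like this is useful when the external tool expects all assignments in one argument rather than as separate arguments.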
 
- 
check
Checks to see if the command can be run. Typically used with something like "myapp --version" to check whether "myapp" is installed and on the path. This method has two overloads that share the same documentation.
- Parameters:
- checkCmd - the check command to run
- errorValue - the result value that should be treated as an error
- Returns:
- whether or not the check completed without error
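A check of this kind can be sketched with the standard ProcessBuilder API. This is an illustration only, not the Tika implementation: the class name CommandCheckSketch, the varargs signature, and the fallback error value of 127 (the exit code many shells use for "command not found") are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of an "is this command runnable?" check; not Tika code.
final class CommandCheckSketch {

    // Returns true if the command starts and its exit value is not one of errorValues.
    static boolean check(String[] checkCmd, int... errorValues) {
        // Assumption: fall back to 127 when the caller gives no error values.
        int[] errors = errorValues.length == 0 ? new int[] {127} : errorValues;
        try {
            Process process = new ProcessBuilder(checkCmd)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();
            process.getOutputStream().close(); // the command gets no input
            // Drain the output so the child cannot block on a full pipe.
            byte[] buffer = new byte[4096];
            try (InputStream output = process.getInputStream()) {
                while (output.read(buffer) != -1) {
                    // discard
                }
            }
            int exitValue = process.waitFor();
            for (int error : errors) {
                if (exitValue == error) {
                    return false;
                }
            }
            return true;
        } catch (IOException e) {
            // The command could not be started, e.g. it is not on the path.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(check(new String[] {"java", "-version"}));
    }
}
```

Treating a failure to start the process as "not runnable" keeps the caller's logic simple: a false result means the embedder should not be used.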
 
- 
getSupportedEmbedTypes
Description copied from interface: Embedder
Returns the set of media types supported by this embedder when used with the given parse context.
The name differs from the precedent of Parser.getSupportedTypes(ParseContext) so that parser implementations may also choose to implement this interface.
- Specified by:
- getSupportedEmbedTypes in interface Embedder
- Parameters:
- context - parse context
- Returns:
- immutable set of media types
 
- 
getSupportedEmbedTypes
- 
setSupportedEmbedTypes
- 
getCommand
Gets the command to be run. This can include either ExternalParser.INPUT_FILE_TOKEN or ExternalParser.OUTPUT_FILE_TOKEN if the command needs filenames.
- Returns:
- the command to be run
 
- 
setCommand
Sets the command to be run. This can include either ExternalParser.INPUT_FILE_TOKEN or ExternalParser.OUTPUT_FILE_TOKEN if the command needs filenames.
 
- 
getCommandAssignmentOperator
Gets the assignment operator for the command line tool, e.g. "=".
- Returns:
- the assignment operator
 
- 
setCommandAssignmentOperator
Sets the assignment operator for the command line tool, e.g. "=".
- Parameters:
- commandAssignmentOperator - the assignment operator
 
- 
getCommandAssignmentDelimeter
Gets the delimiter for multiple assignments for the command line tool, e.g. ", ".
- Returns:
- the assignment delimiter
 
- 
setCommandAssignmentDelimeter
Sets the delimiter for multiple assignments for the command line tool, e.g. ", ".
- Parameters:
- commandAssignmentDelimeter - the assignment delimiter
 
- 
getCommandAppendOperator
Gets the operator to append rather than replace a value for the command line tool, e.g. "+=".
- Returns:
- the append operator
 
- 
setCommandAppendOperator
Sets the operator to append rather than replace a value for the command line tool, e.g. "+=".
- Parameters:
- commandAppendOperator - the append operator
 
- 
isQuoteAssignmentValues
public boolean isQuoteAssignmentValues()
Gets whether or not to quote assignment values, e.g. tag='value'. The default is false.
- Returns:
- whether or not to quote assignment values
 
- 
setQuoteAssignmentValues
public void setQuoteAssignmentValues(boolean quoteAssignmentValues)
Sets whether or not to quote assignment values, e.g. tag='value'.
- Parameters:
- quoteAssignmentValues - whether or not to quote assignment values
 
- 
getMetadataCommandArguments
Gets the map of Metadata keys to command line parameters.
- Returns:
- the metadata to CLI param map
 
- 
setMetadataCommandArguments
Sets the map of Metadata keys to command line parameters. Set this to null to disable Metadata embedding.
- Parameters:
- arguments - the map of Metadata keys to command line parameters
 
- 
getCommandMetadataSegments
Constructs a collection of command line arguments responsible for setting individual metadata fields based on the given metadata.
- Parameters:
- metadata - the metadata to embed
- Returns:
- the metadata-related command line arguments
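How the assignment operator, append operator, and quoting option might combine into command line segments is sketched below. This is a standalone illustration, not the Tika implementation: the class and method names are hypothetical, plain String keys stand in for Tika's Property and Metadata types, and the rule "single values use the assignment operator, multiple values use the append operator" is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of building metadata command line segments; not Tika code.
final class MetadataSegmentsSketch {

    // values: metadata name -> values to embed
    // cliParams: metadata name -> command line parameters that set that field
    static List<String> buildSegments(Map<String, String[]> values,
                                      Map<String, String[]> cliParams,
                                      String assignmentOperator,
                                      String appendOperator,
                                      boolean quoteAssignmentValues) {
        List<String> segments = new ArrayList<>();
        for (Map.Entry<String, String[]> entry : values.entrySet()) {
            String[] params = cliParams.get(entry.getKey());
            if (params == null) {
                continue; // no mapping, so this field is not embedded
            }
            String[] fieldValues = entry.getValue();
            // Assumption: multi-valued fields are appended, single values assigned.
            String operator = fieldValues.length > 1 ? appendOperator : assignmentOperator;
            for (String param : params) {
                for (String value : fieldValues) {
                    String assigned = quoteAssignmentValues ? "'" + value + "'" : value;
                    segments.add(param + operator + assigned);
                }
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        Map<String, String[]> values = new LinkedHashMap<>();
        values.put("title", new String[] {"Report"});
        values.put("keywords", new String[] {"tika", "metadata"});

        Map<String, String[]> cliParams = new LinkedHashMap<>();
        cliParams.put("title", new String[] {"-Title"});
        cliParams.put("keywords", new String[] {"-Keywords"});

        System.out.println(buildSegments(values, cliParams, "=", "+=", false));
    }
}
```

Fields without a command line mapping are skipped in this sketch, which mirrors the documented idea that the map controls which metadata is embedded.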
 
- 
embed
public void embed(Metadata metadata, InputStream inputStream, OutputStream outputStream, ParseContext context) throws IOException, TikaException
Executes the configured external command on the given document stream and writes the document with the embedded metadata to the given output stream. Metadata is only embedded if setMetadataCommandArguments(Map) has been called to set arguments.
- Specified by:
- embed in interface Embedder
- Parameters:
- metadata - document metadata (input and output)
- inputStream - the document stream (input)
- outputStream - the output stream to write the metadata embedded data to
- context - parse context
- Throws:
- IOException - if the document stream could not be read
- TikaException - if the document could not be parsed
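Before the external command runs, the tokens in the configured command have to be replaced with real values. The sketch below shows one way such an expansion could work. It is not the Tika implementation: the class name, the expand method, and the literal token strings "${INPUT}", "${OUTPUT}", and "${METADATA}" are assumptions for illustration, as is the exiftool-style command in main.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of command token expansion; not Tika code.
final class CommandTokenSketch {

    // Assumed token spellings, used only by this sketch.
    static final String INPUT_TOKEN = "${INPUT}";
    static final String OUTPUT_TOKEN = "${OUTPUT}";
    static final String METADATA_TOKEN = "${METADATA}";

    static List<String> expand(String[] command,
                               String inputPath,
                               String outputPath,
                               List<String> metadataSegments) {
        List<String> expanded = new ArrayList<>();
        for (String part : command) {
            if (part.equals(METADATA_TOKEN)) {
                // The metadata token expands to one argument per segment.
                expanded.addAll(metadataSegments);
            } else {
                // File tokens are replaced inside the argument text.
                expanded.add(part.replace(INPUT_TOKEN, inputPath)
                                 .replace(OUTPUT_TOKEN, outputPath));
            }
        }
        return expanded;
    }

    public static void main(String[] args) {
        String[] command = {"exiftool", METADATA_TOKEN, "-o", OUTPUT_TOKEN, INPUT_TOKEN};
        List<String> segments = Arrays.asList("-Title=Report");
        System.out.println(expand(command, "/tmp/in.jpg", "/tmp/out.jpg", segments));
    }
}
```

Expanding the metadata token into separate arguments, rather than one joined string, avoids shell-style quoting problems when the command is passed to the operating system as an argument array.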
 
 
-