Package org.apache.tika.embedder
Class ExternalEmbedder
- java.lang.Object
  - org.apache.tika.embedder.ExternalEmbedder
All Implemented Interfaces:
Serializable, Embedder
public class ExternalEmbedder extends Object implements Embedder
Embedder that uses an external program (like sed or exiftool) to embed text content and metadata into a given document.
Since:
Apache Tika 1.3
See Also:
Serialized Form

Field Summary
- static String METADATA_COMMAND_ARGUMENTS_SERIALIZED_TOKEN
  Token to be replaced with a String array of metadata assignment command arguments.
- static String METADATA_COMMAND_ARGUMENTS_TOKEN
  Token to be replaced with a String array of metadata assignment command arguments.

Constructor Summary
- ExternalEmbedder()

Method Summary
- static boolean check(String[] checkCmd, int... errorValue)
  Checks to see if the command can be run.
- static boolean check(String checkCmd, int... errorValue)
  Checks to see if the command can be run.
- void embed(Metadata metadata, InputStream inputStream, OutputStream outputStream, ParseContext context)
  Executes the configured external command on the given document stream and writes the resulting document, with metadata embedded, to the given output stream.
- String[] getCommand()
  Gets the command to be run.
- String getCommandAppendOperator()
  Gets the operator the command line tool uses to append a value rather than replace it, e.g. "+=".
- String getCommandAssignmentDelimeter()
  Gets the delimiter for multiple assignments for the command line tool, e.g. ", ".
- String getCommandAssignmentOperator()
  Gets the assignment operator for the command line tool, e.g. "=".
- protected List<String> getCommandMetadataSegments(Metadata metadata)
  Constructs a collection of command line arguments responsible for setting individual metadata fields based on the given metadata.
- Map<Property,String[]> getMetadataCommandArguments()
  Gets the map of Metadata keys to command line parameters.
- Set<MediaType> getSupportedEmbedTypes()
- Set<MediaType> getSupportedEmbedTypes(ParseContext context)
  Returns the set of media types supported by this embedder when used with the given parse context.
- boolean isQuoteAssignmentValues()
  Gets whether or not to quote assignment values, e.g. tag='value'.
- protected static String serializeMetadata(List<String> metadataCommandArguments)
  Serializes a collection of metadata command line arguments into a single string.
- void setCommand(String... command)
  Sets the command to be run.
- void setCommandAppendOperator(String commandAppendOperator)
  Sets the operator the command line tool uses to append a value rather than replace it, e.g. "+=".
- void setCommandAssignmentDelimeter(String commandAssignmentDelimeter)
  Sets the delimiter for multiple assignments for the command line tool, e.g. ", ".
- void setCommandAssignmentOperator(String commandAssignmentOperator)
  Sets the assignment operator for the command line tool, e.g. "=".
- void setMetadataCommandArguments(Map<Property,String[]> arguments)
  Sets the map of Metadata keys to command line parameters.
- void setQuoteAssignmentValues(boolean quoteAssignmentValues)
  Sets whether or not to quote assignment values, e.g. tag='value'.
- void setSupportedEmbedTypes(Set<MediaType> supportedEmbedTypes)

Field Detail

METADATA_COMMAND_ARGUMENTS_TOKEN
public static final String METADATA_COMMAND_ARGUMENTS_TOKEN
Token to be replaced with a String array of metadata assignment command arguments.
See Also:
Constant Field Values

METADATA_COMMAND_ARGUMENTS_SERIALIZED_TOKEN
public static final String METADATA_COMMAND_ARGUMENTS_SERIALIZED_TOKEN
Token to be replaced with a String array of metadata assignment command arguments.
See Also:
Constant Field Values
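The field descriptions do not show the substitution itself. The sketch below is one plausible reading of "replaced with a String array of metadata assignment command arguments", using only the JDK: every command segment that contains the token is swapped for the full list of assignments, one command line argument per entry. The token literal ${METADATA}, the class MetadataTokenSketch, and the tool name mytool are assumptions for illustration; real code should refer to METADATA_COMMAND_ARGUMENTS_TOKEN rather than a literal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: expands a metadata token into separate argv entries.
class MetadataTokenSketch {
    // Assumed literal; real code should use
    // ExternalEmbedder.METADATA_COMMAND_ARGUMENTS_TOKEN instead.
    static final String METADATA_TOKEN = "${METADATA}";

    // Any segment that contains the token is replaced by every metadata
    // assignment argument; all other segments are copied unchanged.
    static List<String> expand(String[] template, List<String> metadataArgs) {
        List<String> cmd = new ArrayList<>();
        for (String segment : template) {
            if (segment.contains(METADATA_TOKEN)) {
                cmd.addAll(metadataArgs);
            } else {
                cmd.add(segment);
            }
        }
        return cmd;
    }

    public static void main(String[] args) {
        String[] template = {"mytool", METADATA_TOKEN, "photo.jpg"};
        List<String> cmd = expand(template, List.of("-Title=Report", "-Author=Ann"));
        System.out.println(cmd); // [mytool, -Title=Report, -Author=Ann, photo.jpg]
    }
}
```

Because each assignment becomes its own argument, no shell parsing is needed and values containing spaces stay intact.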

Method Detail

serializeMetadata
protected static String serializeMetadata(List<String> metadataCommandArguments)
Serializes a collection of metadata command line arguments into a single string.
Parameters:
metadataCommandArguments - the metadata command line arguments to serialize
Returns:
the serialized metadata arguments string
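The _SERIALIZED variant of the token suggests that the arguments travel as one string. Here is a hedged sketch of serializeMetadata together with in-place substitution of that token. The output format (the bracketed, comma-separated form produced by List.toString()), the token literal ${METADATA_SERIALIZED}, the --meta= prefix, and the class name SerializeMetadataSketch are assumptions of this sketch, not documented behavior.

```java
import java.util.List;

// Hypothetical sketch of serializing metadata arguments into one string.
class SerializeMetadataSketch {
    // Assumed literal standing in for METADATA_COMMAND_ARGUMENTS_SERIALIZED_TOKEN.
    static final String SERIALIZED_TOKEN = "${METADATA_SERIALIZED}";

    // Collapses the arguments into one string; null yields an empty string.
    static String serializeMetadata(List<String> metadataCommandArguments) {
        return metadataCommandArguments == null ? "" : metadataCommandArguments.toString();
    }

    // Replaces the token inside each segment, so all metadata arguments
    // travel together as part of a single command line argument.
    static String[] substitute(String[] template, List<String> metadataArgs) {
        String serialized = serializeMetadata(metadataArgs);
        String[] cmd = new String[template.length];
        for (int i = 0; i < template.length; i++) {
            cmd[i] = template[i].replace(SERIALIZED_TOKEN, serialized);
        }
        return cmd;
    }

    public static void main(String[] args) {
        String[] cmd = substitute(new String[] {"mytool", "--meta=" + SERIALIZED_TOKEN},
                List.of("Title=Report", "Author=Ann"));
        System.out.println(String.join(" ", cmd)); // mytool --meta=[Title=Report, Author=Ann]
    }
}
```

Substituting in place keeps any surrounding text in the segment, such as the --meta= prefix, which distinguishes this path from splicing the arguments in as separate entries.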

check
public static boolean check(String checkCmd, int... errorValue)
Checks to see if the command can be run. Typically used with something like "myapp --version" to check whether "myapp" is installed and on the path.
Parameters:
checkCmd - the check command to run
errorValue - the exit value(s) that are considered errors
Returns:
whether or not the check completed without error

check
public static boolean check(String[] checkCmd, int... errorValue)
Checks to see if the command can be run. Typically used with something like "myapp --version" to check whether "myapp" is installed and on the path.
Parameters:
checkCmd - the check command to run
errorValue - the exit value(s) that are considered errors
Returns:
whether or not the check completed without error
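A minimal sketch of such a probe using java.lang.ProcessBuilder. CheckSketch is a hypothetical name. Several choices here are assumptions of this sketch rather than documented behavior of check: a failure to launch the program is treated as "not available", the string form is split on whitespace, and exit code 127 (the conventional shell status for "command not found") is the default when no errorValue is given.

```java
import java.io.IOException;

// Hypothetical probe in the spirit of ExternalEmbedder.check.
class CheckSketch {
    // String form: split on whitespace (an assumption; no shell quoting).
    static boolean check(String checkCmd, int... errorValue) {
        return check(checkCmd.trim().split("\\s+"), errorValue);
    }

    static boolean check(String[] checkCmd, int... errorValue) {
        if (errorValue.length == 0) {
            errorValue = new int[] {127}; // assumed default error value
        }
        try {
            ProcessBuilder pb = new ProcessBuilder(checkCmd);
            pb.redirectErrorStream(true);                       // merge stderr into stdout
            pb.redirectOutput(ProcessBuilder.Redirect.DISCARD); // ignore version text
            int exit = pb.start().waitFor();
            for (int err : errorValue) {
                if (exit == err) {
                    return false; // this exit status counts as an error
                }
            }
            return true;
        } catch (IOException e) {
            return false; // program missing or not executable
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("myapp available: " + check("myapp --version"));
    }
}
```

In this sketch, check("myapp --version") returns false both when myapp cannot be launched and when it exits with 127; passing errorValue explicitly, as in check("myapp --version", 1, 2), lets a caller reject other exit codes instead.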

getSupportedEmbedTypes
public Set<MediaType> getSupportedEmbedTypes(ParseContext context)
Description copied from interface: Embedder
Returns the set of media types supported by this embedder when used with the given parse context. The name differs from the precedent set by Parser.getSupportedTypes(ParseContext) so that parser implementations may also choose to implement this interface.
Specified by:
getSupportedEmbedTypes in interface Embedder
Parameters:
context - parse context
Returns:
immutable set of media types

getCommand
public String[] getCommand()
Gets the command to be run. This can include either or both of ExternalParser.INPUT_FILE_TOKEN and ExternalParser.OUTPUT_FILE_TOKEN if the command needs filenames.
Returns:
the command to be run

setCommand
public void setCommand(String... command)
Sets the command to be run. This can include either or both of ExternalParser.INPUT_FILE_TOKEN and ExternalParser.OUTPUT_FILE_TOKEN if the command needs filenames.
See Also:
Runtime.exec(String[])
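The sketch below shows one way the file tokens in a command could be resolved before the command is run, and how their presence could decide whether the document is passed on stdin and read back from stdout. The literals ${INPUT} and ${OUTPUT} stand in for ExternalParser.INPUT_FILE_TOKEN and ExternalParser.OUTPUT_FILE_TOKEN and are assumptions, as are FileTokenSketch, mytool, and its -o flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of resolving file tokens in a command template.
class FileTokenSketch {
    static final String INPUT_TOKEN = "${INPUT}";   // assumed value
    static final String OUTPUT_TOKEN = "${OUTPUT}"; // assumed value

    // Result of resolving a command template.
    static final class Resolved {
        final List<String> cmd;
        final boolean inputViaStdin;   // true if no input token was present
        final boolean outputViaStdout; // true if no output token was present

        Resolved(List<String> cmd, boolean inputViaStdin, boolean outputViaStdout) {
            this.cmd = cmd;
            this.inputViaStdin = inputViaStdin;
            this.outputViaStdout = outputViaStdout;
        }
    }

    static Resolved resolve(String[] command, String inputPath, String outputPath) {
        List<String> cmd = new ArrayList<>();
        boolean inputViaStdin = true;
        boolean outputViaStdout = true;
        for (String segment : command) {
            if (segment.contains(INPUT_TOKEN)) {
                segment = segment.replace(INPUT_TOKEN, inputPath);
                inputViaStdin = false; // the tool reads the file itself
            }
            if (segment.contains(OUTPUT_TOKEN)) {
                segment = segment.replace(OUTPUT_TOKEN, outputPath);
                outputViaStdout = false; // the tool writes the file itself
            }
            cmd.add(segment);
        }
        return new Resolved(cmd, inputViaStdin, outputViaStdout);
    }

    public static void main(String[] args) {
        Resolved r = resolve(new String[] {"mytool", "-o", OUTPUT_TOKEN, INPUT_TOKEN},
                "/tmp/in.jpg", "/tmp/out.jpg");
        System.out.println(r.cmd); // [mytool, -o, /tmp/out.jpg, /tmp/in.jpg]
    }
}
```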

getCommandAssignmentOperator
public String getCommandAssignmentOperator()
Gets the assignment operator for the command line tool, e.g. "=".
Returns:
the assignment operator

setCommandAssignmentOperator
public void setCommandAssignmentOperator(String commandAssignmentOperator)
Sets the assignment operator for the command line tool, e.g. "=".
Parameters:
commandAssignmentOperator - the assignment operator

getCommandAssignmentDelimeter
public String getCommandAssignmentDelimeter()
Gets the delimiter for multiple assignments for the command line tool, e.g. ", ".
Returns:
the assignment delimiter

setCommandAssignmentDelimeter
public void setCommandAssignmentDelimeter(String commandAssignmentDelimeter)
Sets the delimiter for multiple assignments for the command line tool, e.g. ", ".
Parameters:
commandAssignmentDelimeter - the assignment delimiter
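To make the operator and delimiter settings concrete, this JDK-only sketch joins several tag/value pairs using an assignment operator and a delimiter. This page does not say exactly where ExternalEmbedder applies the delimiter, so AssignmentSketch and its assignments method are hypothetical and only illustrate what the two settings mean.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration of the assignment operator and delimiter settings.
class AssignmentSketch {
    // Renders each pair as key + operator + value and joins them with the delimiter.
    static String assignments(Map<String, String> tags, String operator, String delimiter) {
        StringJoiner joiner = new StringJoiner(delimiter);
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            joiner.add(tag.getKey() + operator + tag.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> tags = new LinkedHashMap<>(); // preserves insertion order
        tags.put("Title", "Report");
        tags.put("Author", "Ann");
        System.out.println(assignments(tags, "=", ", ")); // Title=Report, Author=Ann
    }
}
```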

getCommandAppendOperator
public String getCommandAppendOperator()
Gets the operator the command line tool uses to append a value rather than replace it, e.g. "+=".
Returns:
the append operator

setCommandAppendOperator
public void setCommandAppendOperator(String commandAppendOperator)
Sets the operator the command line tool uses to append a value rather than replace it, e.g. "+=".
Parameters:
commandAppendOperator - the append operator

isQuoteAssignmentValues
public boolean isQuoteAssignmentValues()
Gets whether or not to quote assignment values, e.g. tag='value'. The default is false.
Returns:
whether or not to quote assignment values

setQuoteAssignmentValues
public void setQuoteAssignmentValues(boolean quoteAssignmentValues)
Sets whether or not to quote assignment values, e.g. tag='value'.
Parameters:
quoteAssignmentValues - whether or not to quote assignment values
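The append operator and the quoting flag both affect how each value is written. In the sketch below, which is an assumption rather than ExternalEmbedder's documented behavior, a field with a single value uses the assignment operator, each value of a multi-valued field uses the append operator, and quoting wraps values in single quotes. AppendQuoteSketch and the -Title/-Keywords arguments are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-value formatting; the single- vs multi-valued rule is an assumption.
class AppendQuoteSketch {
    static List<String> format(String argument, List<String> values,
                               String assignOp, String appendOp, boolean quote) {
        // Multi-valued fields append each value; a single value is assigned.
        String op = values.size() > 1 ? appendOp : assignOp;
        List<String> out = new ArrayList<>();
        for (String value : values) {
            String rendered = quote ? "'" + value + "'" : value;
            out.add(argument + op + rendered);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(format("-Title", List.of("Report"), "=", "+=", false));
        // [-Title=Report]
        System.out.println(format("-Keywords", List.of("alpha", "beta"), "=", "+=", true));
        // [-Keywords+='alpha', -Keywords+='beta']
    }
}
```

Quoting only helps when something parses the value as a command line string, such as a shell or the tool's own argument parser. When arguments are passed as a String[] (as with Runtime.exec(String[])), the quotes reach the program as literal characters.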

getMetadataCommandArguments
public Map<Property,String[]> getMetadataCommandArguments()
Gets the map of Metadata keys to command line parameters.
Returns:
the metadata to CLI param map

setMetadataCommandArguments
public void setMetadataCommandArguments(Map<Property,String[]> arguments)
Sets the map of Metadata keys to command line parameters. Set this to null to disable Metadata embedding.
Parameters:
arguments - the map of metadata keys to command line parameters, or null to disable metadata embedding

getCommandMetadataSegments
protected List<String> getCommandMetadataSegments(Metadata metadata)
Constructs a collection of command line arguments responsible for setting individual metadata fields based on the given metadata.
Parameters:
metadata - the metadata to embed
Returns:
the metadata-related command line arguments
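A hedged, JDK-only sketch of the mapping step: for each metadata field that has an entry in the command-argument map, every mapped parameter is combined with every value of the field. Plain Map<String, List<String>> and Map<String, String[]> stand in for Tika's Metadata and Map<Property,String[]>; MetadataSegmentsSketch and the -title/-alt-title parameter names are hypothetical. A null map yields no segments, mirroring the note that setting the map to null disables metadata embedding.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for getCommandMetadataSegments using plain JDK types.
class MetadataSegmentsSketch {
    static List<String> segments(Map<String, List<String>> metadata,
                                 Map<String, String[]> cliArgs, String assignOp) {
        List<String> out = new ArrayList<>();
        if (metadata == null || cliArgs == null) {
            return out; // nothing to embed
        }
        for (Map.Entry<String, List<String>> field : metadata.entrySet()) {
            String[] params = cliArgs.get(field.getKey());
            if (params == null) {
                continue; // no command line parameter mapped for this field
            }
            for (String param : params) {
                for (String value : field.getValue()) {
                    out.add(param + assignOp + value);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("dc:title", List.of("Report"));
        metadata.put("custom:unmapped", List.of("ignored"));

        Map<String, String[]> cliArgs = new HashMap<>();
        cliArgs.put("dc:title", new String[] {"-title", "-alt-title"});

        System.out.println(segments(metadata, cliArgs, "="));
        // [-title=Report, -alt-title=Report]
    }
}
```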

embed
public void embed(Metadata metadata, InputStream inputStream, OutputStream outputStream, ParseContext context) throws IOException, TikaException
Executes the configured external command on the given document stream and writes the resulting document, with metadata embedded, to the given output stream. Metadata is only embedded if setMetadataCommandArguments(Map) has been called to set arguments.
Specified by:
embed in interface Embedder
Parameters:
metadata - document metadata (input and output)
inputStream - the document stream (input)
outputStream - the output stream to write the metadata-embedded data to
context - parse context
Throws:
IOException - if the document stream could not be read
TikaException - if the document could not be parsed
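This page does not show how embed moves data to and from the external program. As a hedged sketch of one common arrangement, assuming the command contains no file tokens so the tool reads the document on stdin and writes the result to stdout, the JDK code below feeds the input on a background thread and copies stdout to the output stream. EmbedPipeSketch and runFilter are hypothetical names. Unlike the real embed, this sketch reports failures as unchecked exceptions rather than IOException or TikaException, and it skips metadata arguments and temporary-file handling.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical stdin/stdout pipe around an external command.
class EmbedPipeSketch {
    static void runFilter(List<String> cmd, InputStream in, OutputStream out) {
        try {
            ProcessBuilder pb = new ProcessBuilder(cmd);
            pb.redirectError(ProcessBuilder.Redirect.DISCARD); // ignore diagnostics
            Process process = pb.start();

            // Feed stdin on a separate thread so a full stdout pipe cannot deadlock.
            Thread feeder = new Thread(() -> {
                try (OutputStream stdin = process.getOutputStream()) {
                    in.transferTo(stdin);
                } catch (IOException ignored) {
                    // The tool may exit before reading all of its input.
                }
            });
            feeder.start();

            try (InputStream stdout = process.getInputStream()) {
                stdout.transferTo(out);
            }
            feeder.join();

            int exit = process.waitFor();
            if (exit != 0) {
                throw new IOException("external command exited with " + exit);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running " + cmd, e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] doc = "hello".getBytes(StandardCharsets.UTF_8);
        // "cat" simply echoes its input, standing in for a real embedding tool.
        runFilter(List.of("cat"), new ByteArrayInputStream(doc), out);
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8)); // hello
    }
}
```

Writing stdin and reading stdout concurrently matters because an external tool that fills its output pipe before consuming all of its input would otherwise block both processes.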