java.lang.Object
  org.apache.tika.parser.AbstractParser
    org.apache.tika.parser.external.ExternalParser
public class ExternalParser extends AbstractParser
Parser that uses an external program (like catdoc or pdf2txt) to extract text content and metadata from a given document.
Field Summary | |
---|---|
static java.lang.String | INPUT_FILE_TOKEN: The token which, if present in the command string, is replaced with the input filename. |
static java.lang.String | OUTPUT_FILE_TOKEN: The token which, if present in the command string, is replaced with the output filename. |
Constructor Summary | |
---|---|
ExternalParser() | |
Method Summary | |
---|---|
static boolean | check(java.lang.String[] checkCmd, int... errorValue) |
static boolean | check(java.lang.String checkCmd, int... errorValue): Checks to see if the command can be run. |
java.lang.String[] | getCommand() |
java.util.Map<java.util.regex.Pattern,java.lang.String> | getMetadataExtractionPatterns() |
java.util.Set<MediaType> | getSupportedTypes() |
java.util.Set<MediaType> | getSupportedTypes(ParseContext context): Returns the set of media types supported by this parser when used with the given parse context. |
void | parse(java.io.InputStream stream, org.xml.sax.ContentHandler handler, Metadata metadata, ParseContext context): Executes the configured external command and passes the given document stream as a simple XHTML document to the given SAX content handler. |
void | setCommand(java.lang.String... command): Sets the command to be run. |
void | setMetadataExtractionPatterns(java.util.Map<java.util.regex.Pattern,java.lang.String> patterns): Sets the map of regular expression patterns and Metadata keys. |
void | setSupportedTypes(java.util.Set<MediaType> supportedTypes) |
Methods inherited from class org.apache.tika.parser.AbstractParser |
---|
parse |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final java.lang.String INPUT_FILE_TOKEN
public static final java.lang.String OUTPUT_FILE_TOKEN
Constructor Detail |
---|
public ExternalParser()
Method Detail |
---|
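public java.util.Set<MediaType> getSupportedTypes(ParseContext context)
Returns the set of media types supported by this parser when used with the given parse context, as defined by Parser.
Parameters:
context - parse context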
public java.util.Set<MediaType> getSupportedTypes()
public void setSupportedTypes(java.util.Set<MediaType> supportedTypes)
public java.lang.String[] getCommand()
public void setCommand(java.lang.String... command)
Sets the command to be run. The command can include INPUT_FILE_TOKEN or OUTPUT_FILE_TOKEN if it needs filenames.
See also: Runtime.exec(String[])
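The token replacement described for setCommand can be illustrated with a small standalone sketch. This is not Tika's implementation: the placeholder strings, the class name TokenSubstitutionSketch, and the tool name "mytool" with its flags are all hypothetical. Real code should reference ExternalParser.INPUT_FILE_TOKEN and ExternalParser.OUTPUT_FILE_TOKEN instead of hard-coded strings.

```java
import java.util.Arrays;

final class TokenSubstitutionSketch {
    // Hypothetical placeholder values, used only for this sketch. Real code
    // should use ExternalParser.INPUT_FILE_TOKEN / OUTPUT_FILE_TOKEN.
    static final String INPUT_TOKEN = "<input>";
    static final String OUTPUT_TOKEN = "<output>";

    /** Returns a copy of the command with both tokens replaced in every argument. */
    static String[] substitute(String[] command, String inputPath, String outputPath) {
        String[] resolved = new String[command.length];
        for (int i = 0; i < command.length; i++) {
            resolved[i] = command[i]
                    .replace(INPUT_TOKEN, inputPath)
                    .replace(OUTPUT_TOKEN, outputPath);
        }
        return resolved;
    }

    public static void main(String[] args) {
        // "mytool" and its flags are made up for illustration.
        String[] template = {"mytool", "--in", INPUT_TOKEN, "--out", OUTPUT_TOKEN};
        String[] resolved = substitute(template, "/tmp/in.doc", "/tmp/out.txt");
        System.out.println(Arrays.toString(resolved));
    }
}
```

Keeping the command as an array of arguments, rather than one string, matches the Runtime.exec(String[]) form referenced above and avoids quoting problems when a filename contains spaces.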
public java.util.Map<java.util.regex.Pattern,java.lang.String> getMetadataExtractionPatterns()
public void setMetadataExtractionPatterns(java.util.Map<java.util.regex.Pattern,java.lang.String> patterns)
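As a rough illustration of what a map from regular expression patterns to metadata keys could be used for, the sketch below scans command output line by line and stores a matched value under the associated key. It is a standalone approximation under stated assumptions, not Tika's code: taking the value from the first capture group, the class name MetadataPatternSketch, and the sample keys and output lines are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MetadataPatternSketch {
    /**
     * Applies every pattern to every output line. When a pattern is found,
     * its first capture group is stored under the associated key
     * (assumption: the value of interest is capture group 1).
     */
    static Map<String, String> extract(Map<Pattern, String> patterns, String commandOutput) {
        Map<String, String> metadata = new LinkedHashMap<>();
        for (String line : commandOutput.split("\\R")) {
            for (Map.Entry<Pattern, String> entry : patterns.entrySet()) {
                Matcher matcher = entry.getKey().matcher(line);
                if (matcher.find() && matcher.groupCount() >= 1) {
                    metadata.put(entry.getValue(), matcher.group(1));
                }
            }
        }
        return metadata;
    }

    public static void main(String[] args) {
        // Hypothetical patterns and keys for a tool that prints "Name: value" lines.
        Map<Pattern, String> patterns = new LinkedHashMap<>();
        patterns.put(Pattern.compile("^Title:\\s*(.+)$"), "title");
        patterns.put(Pattern.compile("^Pages:\\s*(\\d+)$"), "pageCount");
        System.out.println(extract(patterns, "Title: Annual Report\nPages: 12\n"));
    }
}
```

With the real class, a map of the same shape would be passed to setMetadataExtractionPatterns and the results would land in the Metadata object given to parse.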
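public void parse(java.io.InputStream stream, org.xml.sax.ContentHandler handler, Metadata metadata, ParseContext context) throws java.io.IOException, org.xml.sax.SAXException, TikaException
Executes the configured external command and passes the given document stream as a simple XHTML document to the given SAX content handler. Metadata is only extracted if setMetadataExtractionPatterns(Map) has been called to set patterns.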
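parse is documented as delivering the document to the SAX content handler as a simple XHTML document. The sketch below shows what such a hand-off might look like using only the JDK's SAX classes. The exact element structure Tika emits is not stated here, so the html, body and p layout, along with the names XhtmlEventsSketch and TextCollector, are assumptions for illustration.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

final class XhtmlEventsSketch {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    /** Emits html > body > p(text) as SAX events on the given handler. */
    static void emit(String text, ContentHandler handler) throws SAXException {
        AttributesImpl none = new AttributesImpl();
        handler.startDocument();
        handler.startElement(XHTML, "html", "html", none);
        handler.startElement(XHTML, "body", "body", none);
        handler.startElement(XHTML, "p", "p", none);
        char[] chars = text.toCharArray();
        handler.characters(chars, 0, chars.length);
        handler.endElement(XHTML, "p", "p");
        handler.endElement(XHTML, "body", "body");
        handler.endElement(XHTML, "html", "html");
        handler.endDocument();
    }

    /** Minimal handler that collects character data, for demonstration only. */
    static final class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    public static void main(String[] args) throws SAXException {
        TextCollector collector = new TextCollector();
        emit("text produced by the external command", collector);
        System.out.println(collector.text);
    }
}
```

A caller of the real parse method supplies its own ContentHandler in the same way, so any handler that tolerates these events should be able to consume the extracted text.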
Parameters:
stream - the document stream (input)
handler - handler for the XHTML SAX events (output)
metadata - document metadata (input and output)
context - parse context
Throws:
java.io.IOException - if the document stream could not be read
org.xml.sax.SAXException - if the SAX events could not be processed
TikaException - if the document could not be parsed
public static boolean check(java.lang.String checkCmd, int... errorValue)
Checks to see if the command can be run.
Parameters:
checkCmd - the check command to run
errorValue - the values that are considered an error
public static boolean check(java.lang.String[] checkCmd, int... errorValue)
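The idea behind check, running a command and treating certain values as errors, can be approximated with ProcessBuilder. The sketch below is an assumption-laden stand-in rather than the library's implementation: it interprets errorValue as a list of process exit codes, treats a command that cannot be started as not runnable, and needs Java 9 or later for Redirect.DISCARD. The names CommandCheckSketch and canRun are hypothetical.

```java
import java.io.IOException;

final class CommandCheckSketch {
    /**
     * Returns true if the command starts and its exit code is not one of
     * errorValues; returns false if it cannot be started at all.
     */
    static boolean canRun(String[] checkCmd, int... errorValues) {
        try {
            Process process = new ProcessBuilder(checkCmd)
                    .redirectErrorStream(true)
                    // Discard output so the child cannot block on a full pipe (Java 9+).
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            int exitCode = process.waitFor();
            for (int errorValue : errorValues) {
                if (exitCode == errorValue) {
                    return false;
                }
            }
            return true;
        } catch (IOException e) {
            // Typically means the executable was not found or is not runnable.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // "java -version" is used only because a JVM is likely to be on the PATH here.
        System.out.println(canRun(new String[] {"java", "-version"}, 127));
    }
}
```

The value 127 in the usage line is the exit code many shells report for "command not found"; which values count as errors for a given tool is up to the caller.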