public class ExternalParser extends AbstractParser
Modifier and Type | Class and Description |
---|---|
static interface | ExternalParser.LineConsumer: Consumer contract |
Modifier and Type | Field and Description |
---|---|
static String | INPUT_FILE_TOKEN: The token that, if present in the command string, is replaced with the input filename. |
static String | OUTPUT_FILE_TOKEN: The token that, if present in the command string, is replaced with the output filename. |
Constructor and Description |
---|
ExternalParser() |
Modifier and Type | Method and Description |
---|---|
static boolean | check(String[] checkCmd, int... errorValue) |
static boolean | check(String checkCmd, int... errorValue): Checks to see if the command can be run. |
String[] | getCommand() |
ExternalParser.LineConsumer | getIgnoredLineConsumer(): Gets the consumer for the lines ignored by the parse functions. |
Map<Pattern,String> | getMetadataExtractionPatterns() |
Set<MediaType> | getSupportedTypes() |
Set<MediaType> | getSupportedTypes(ParseContext context): Returns the set of media types supported by this parser when used with the given parse context. |
void | parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context): Executes the configured external command and passes the given document stream as a simple XHTML document to the given SAX content handler. |
void | setCommand(String... command): Sets the command to be run. |
void | setIgnoredLineConsumer(ExternalParser.LineConsumer ignoredLineConsumer): Sets a consumer for the lines ignored by the parse functions. |
void | setMetadataExtractionPatterns(Map<Pattern,String> patterns): Sets the map of regular expression patterns and Metadata keys. |
void | setSupportedTypes(Set<MediaType> supportedTypes) |
public static final String INPUT_FILE_TOKEN
public static final String OUTPUT_FILE_TOKEN
public static boolean check(String checkCmd, int... errorValue)
Parameters:
checkCmd - The check command to run
errorValue - What is considered an error value?
public static boolean check(String[] checkCmd, int... errorValue)
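The following is a minimal sketch of how a check of this kind could work, using only the JDK. It is not Tika's implementation. The class name `CommandCheckSketch`, the helper `isAcceptable`, and the fallback error value of 127 (the exit code most shells use for "command not found") are assumptions made for illustration.

```java
import java.io.IOException;

/** Hypothetical stand-in for a "can this command be run?" check; not Tika's code. */
class CommandCheckSketch {

    /** Assumed fallback when the caller passes no error values. */
    static final int DEFAULT_ERROR_VALUE = 127;

    /** Returns true if exitCode matches none of the given error values. */
    static boolean isAcceptable(int exitCode, int... errorValues) {
        if (errorValues.length == 0) {
            errorValues = new int[] {DEFAULT_ERROR_VALUE};
        }
        for (int err : errorValues) {
            if (exitCode == err) {
                return false;
            }
        }
        return true;
    }

    /** Runs checkCmd and reports whether it started and exited without an error value. */
    static boolean check(String[] checkCmd, int... errorValues) {
        try {
            Process process = new ProcessBuilder(checkCmd)
                    .redirectErrorStream(true)                       // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD) // output is not needed here
                    .start();
            int exitCode = process.waitFor();
            return isAcceptable(exitCode, errorValues);
        } catch (IOException e) {
            // The executable could not be started, for example because it is not on the PATH.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A typical use would be a cheap probe such as `CommandCheckSketch.check(new String[] {"mytool", "--version"})` before configuring a parser around `mytool`, where `mytool` is a placeholder name.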
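public Set<MediaType> getSupportedTypes(ParseContext context)
Returns the set of media types supported by this parser when used with the given parse context.
Specified by:
getSupportedTypes in interface Parser
Parameters:
context - parse context
public String[] getCommand()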
public void setCommand(String... command)
Sets the command to be run. The command can include INPUT_FILE_TOKEN or OUTPUT_FILE_TOKEN if the command needs filenames.
See Also:
Runtime.exec(String[])
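To illustrate the token replacement described for the command, here is a small JDK-only sketch. It is not Tika's code. The class `TokenSubstitutionSketch`, the method `substitute`, and the literal token values `${INPUT}` and `${OUTPUT}` are assumptions; this page does not state the actual constant values.

```java
import java.util.Arrays;

/** Hypothetical illustration of filename-token substitution in a command array. */
class TokenSubstitutionSketch {

    // Assumed placeholder values, chosen for this example only.
    static final String INPUT_FILE_TOKEN = "${INPUT}";
    static final String OUTPUT_FILE_TOKEN = "${OUTPUT}";

    /** Returns a copy of command with both tokens replaced by the given paths. */
    static String[] substitute(String[] command, String inputPath, String outputPath) {
        String[] resolved = new String[command.length];
        for (int i = 0; i < command.length; i++) {
            // String.replace performs literal (non-regex) replacement.
            resolved[i] = command[i]
                    .replace(INPUT_FILE_TOKEN, inputPath)
                    .replace(OUTPUT_FILE_TOKEN, outputPath);
        }
        return resolved;
    }

    public static void main(String[] args) {
        String[] cmd = {"mytool", "--in", "${INPUT}", "--out=${OUTPUT}"};
        // prints [mytool, --in, /tmp/in.bin, --out=/tmp/out.txt]
        System.out.println(Arrays.toString(substitute(cmd, "/tmp/in.bin", "/tmp/out.txt")));
    }
}
```

Keeping the command as an array, as `Runtime.exec(String[])` does, avoids shell quoting problems when a substituted path contains spaces.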
public ExternalParser.LineConsumer getIgnoredLineConsumer()
public void setIgnoredLineConsumer(ExternalParser.LineConsumer ignoredLineConsumer)
Parameters:
ignoredLineConsumer - consumer instance
public void setMetadataExtractionPatterns(Map<Pattern,String> patterns)
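The sketch below shows one plausible way a map of patterns to metadata keys and an ignored-line consumer could work together on the lines a command prints. It uses only the JDK and is not Tika's implementation. The class `MetadataPatternSketch`, the plain `Map<String, List<String>>` standing in for `Metadata`, the `Consumer<String>` standing in for `ExternalParser.LineConsumer`, and the rule of preferring the first capture group are all assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of pattern-driven metadata extraction from output lines. */
class MetadataPatternSketch {

    /**
     * Matches every line against every pattern. Matching lines contribute a value
     * under the pattern's key; lines that match nothing go to ignoredLines.
     */
    static Map<String, List<String>> extract(List<String> lines,
                                             Map<Pattern, String> patterns,
                                             Consumer<String> ignoredLines) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        for (String line : lines) {
            boolean consumed = false;
            for (Map.Entry<Pattern, String> entry : patterns.entrySet()) {
                Matcher m = entry.getKey().matcher(line);
                if (m.find()) {
                    consumed = true;
                    // Assumed rule: use the first capture group if present, else the whole match.
                    String value = (m.groupCount() >= 1 && m.group(1) != null)
                            ? m.group(1)
                            : m.group();
                    metadata.computeIfAbsent(entry.getValue(), k -> new ArrayList<>()).add(value);
                }
            }
            if (!consumed) {
                ignoredLines.accept(line);
            }
        }
        return metadata;
    }
}
```

For example, a pattern such as `^Title: (.*)$` mapped to the key `title` would turn the output line `Title: Report` into the value `Report`, while an unrelated line would be handed to the ignored-line consumer.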
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException
Executes the configured external command and passes the given document stream as a simple XHTML document to the given SAX content handler. Metadata is only extracted if setMetadataExtractionPatterns(Map) has been called to set patterns.
Parameters:
stream - the document stream (input)
handler - handler for the XHTML SAX events (output)
metadata - document metadata (input and output)
context - parse context
Throws:
IOException - if the document stream could not be read
SAXException - if the SAX events could not be processed
TikaException - if the document could not be parsed
Copyright © 2007–2023 The Apache Software Foundation. All rights reserved.