java.lang.Object
  org.apache.tika.parser.AbstractParser
    org.apache.tika.parser.CompositeParser
public class CompositeParser extends AbstractParser
Composite parser that delegates parsing tasks to a component parser based on the declared content type of the incoming document. A fallback parser is defined for cases where a parser for the given content type is not available.
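The delegation described above can be illustrated with a minimal, library-free sketch. It is not Tika's implementation: `SketchParser`, `CompositeDispatchSketch`, and the use of plain strings for media types and document content are assumptions made only to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a component parser: turns raw input into a result string. */
interface SketchParser {
    String parse(String input);
}

/** Simplified model of composite dispatch; an illustration, not Tika's actual code. */
final class CompositeDispatchSketch {
    // Component parsers keyed by the media type they handle (plain strings here).
    private final Map<String, SketchParser> parsers = new HashMap<>();

    // Parser used when no component parser is registered for the declared type.
    private SketchParser fallback = input -> "fallback:" + input;

    void register(String mediaType, SketchParser parser) {
        parsers.put(mediaType, parser);
    }

    void setFallback(SketchParser fallback) {
        this.fallback = fallback;
    }

    /** Picks the parser registered for the declared type, or the fallback. */
    SketchParser select(String declaredType) {
        SketchParser parser = declaredType == null ? null : parsers.get(declaredType);
        return parser != null ? parser : fallback;
    }

    /** Delegates the call to whichever parser was selected. */
    String parse(String declaredType, String input) {
        return select(declaredType).parse(input);
    }
}
```

With a parser registered for `text/plain`, a document declared as `text/plain` reaches that parser, while a document with any other declared type (or none) reaches the fallback.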
Constructor Summary |
---|
CompositeParser() |
CompositeParser(MediaTypeRegistry registry, List<Parser> parsers) |
CompositeParser(MediaTypeRegistry registry, Parser... parsers) |
Method Summary | | |
---|---|---|
Map<MediaType,List<Parser>> | findDuplicateParsers(ParseContext context) | Utility method that goes through all the component parsers and finds all media types for which more than one parser declares support. |
Parser | getFallback() | Returns the fallback parser. |
MediaTypeRegistry | getMediaTypeRegistry() | Returns the media type registry used to infer type relationships. |
protected Parser | getParser(Metadata metadata) | Returns the parser that best matches the given metadata. |
protected Parser | getParser(Metadata metadata, ParseContext context) | |
Map<MediaType,Parser> | getParsers() | Returns the component parsers. |
Map<MediaType,Parser> | getParsers(ParseContext context) | |
Set<MediaType> | getSupportedTypes(ParseContext context) | Returns the set of media types supported by this parser when used with the given parse context. |
void | parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) | Delegates the call to the matching component parser. |
void | setFallback(Parser fallback) | Sets the fallback parser. |
void | setMediaTypeRegistry(MediaTypeRegistry registry) | Sets the media type registry used to infer type relationships. |
void | setParsers(Map<MediaType,Parser> parsers) | Sets the component parsers. |
Methods inherited from class org.apache.tika.parser.AbstractParser |
---|
parse |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public CompositeParser(MediaTypeRegistry registry, List<Parser> parsers)
public CompositeParser(MediaTypeRegistry registry, Parser... parsers)
public CompositeParser()
Method Detail |
---|
public Map<MediaType,Parser> getParsers(ParseContext context)
public Map<MediaType,List<Parser>> findDuplicateParsers(ParseContext context)
Parameters:
context - parsing context
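As the method summary notes, findDuplicateParsers reports the media types for which more than one component parser declares support. The idea can be sketched without the library as an inversion of a "parser to supported types" mapping. The class name `DuplicateParserSketch`, the use of strings for parser names and media types, and the input shape are all assumptions for illustration; the real method works on Parser and MediaType objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Illustrative only: finds media types claimed by two or more parsers. */
final class DuplicateParserSketch {
    /**
     * @param supportedTypes hypothetical input: parser name mapped to the media types it declares
     * @return media types declared by at least two parsers, each with the names of those parsers
     */
    static Map<String, List<String>> findDuplicates(Map<String, Set<String>> supportedTypes) {
        // Invert the mapping: media type -> names of all parsers that declare it.
        Map<String, List<String>> byType = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : supportedTypes.entrySet()) {
            for (String type : entry.getValue()) {
                byType.computeIfAbsent(type, key -> new ArrayList<>()).add(entry.getKey());
            }
        }
        // Keep only the types that more than one parser declares.
        byType.values().removeIf(names -> names.size() < 2);
        return byType;
    }
}
```

A non-empty result points at conflicting parser definitions, since only one parser can be registered per media type in a `Map<MediaType,Parser>`.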
public MediaTypeRegistry getMediaTypeRegistry()
public void setMediaTypeRegistry(MediaTypeRegistry registry)
Parameters:
registry - media type registry
public Map<MediaType,Parser> getParsers()
public void setParsers(Map<MediaType,Parser> parsers)
Parameters:
parsers - component parsers, keyed by media type
public Parser getFallback()
public void setFallback(Parser fallback)
Parameters:
fallback - fallback parser
protected Parser getParser(Metadata metadata)
Subclasses can override this method to provide more accurate parser resolution.
Parameters:
metadata - document metadata
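The override point mentioned for getParser(Metadata) can be pictured with a small, library-free sketch. Everything here is hypothetical: `ResolverSketch`, `NameAwareResolverSketch`, the string-keyed metadata map, and the metadata keys are stand-ins, and the sketch resolves a media type string rather than a Parser to stay self-contained.

```java
import java.util.Map;

/** Hypothetical base class: resolves using only the declared content type. */
class ResolverSketch {
    protected String resolve(Map<String, String> metadata) {
        String type = metadata.get("Content-Type");
        // Assumed default when nothing is declared.
        return type != null ? type : "application/octet-stream";
    }
}

/** Hypothetical subclass: also consults a file name hint when no type is declared. */
class NameAwareResolverSketch extends ResolverSketch {
    @Override
    protected String resolve(Map<String, String> metadata) {
        String name = metadata.get("resourceName");
        if (metadata.get("Content-Type") == null && name != null && name.endsWith(".txt")) {
            return "text/plain";
        }
        // Otherwise keep the base behaviour.
        return super.resolve(metadata);
    }
}
```

The subclass only refines the decision when the base rule has nothing to go on, so documents with a declared type are resolved exactly as before.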
protected Parser getParser(Metadata metadata, ParseContext context)
public Set<MediaType> getSupportedTypes(ParseContext context)
Specified by:
getSupportedTypes in interface Parser
Parameters:
context - parse context
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException
Potential RuntimeExceptions, IOExceptions and SAXExceptions unrelated to the given input stream and content handler are automatically wrapped into TikaExceptions to better honor the Parser contract.
Parameters:
stream - the document stream (input)
handler - handler for the XHTML SAX events (output)
metadata - document metadata (input and output)
context - parse context
Throws:
IOException - if the document stream could not be read
SAXException - if the SAX events could not be processed
TikaException - if the document could not be parsed
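The wrapping rule stated for parse can be sketched in plain Java. This page does not say how the class tells stream-related failures from unrelated ones, so the sketch assumes a marker exception type for failures of the caller's own stream. `SketchParseException` (standing in for TikaException), `StreamIOException`, `ThrowingParser`, and `WrappingSketch` are all hypothetical names.

```java
import java.io.IOException;

/** Hypothetical checked exception standing in for TikaException. */
class SketchParseException extends Exception {
    SketchParseException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Hypothetical marker for IOExceptions raised by the caller's own stream. */
class StreamIOException extends IOException {
    StreamIOException(String message) {
        super(message);
    }
}

/** Hypothetical component parser that may fail in several ways. */
interface ThrowingParser {
    void parse(String input) throws IOException, SketchParseException;
}

final class WrappingSketch {
    /** Delegates to the component parser and wraps failures unrelated to the caller's stream. */
    static void parse(ThrowingParser delegate, String input)
            throws IOException, SketchParseException {
        try {
            delegate.parse(input);
        } catch (StreamIOException e) {
            // Caused by the caller's stream: pass through unchanged.
            throw e;
        } catch (IOException | RuntimeException e) {
            // Anything else is treated as a parser problem and wrapped.
            throw new SketchParseException("Unexpected failure in component parser", e);
        }
    }
}
```

Under this scheme a caller that catches the IOException knows its own stream failed, while bugs or internal I/O problems inside a component parser surface as the parse exception with the original cause attached.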