Package org.apache.tika.parser
Class CompositeParser
java.lang.Object
org.apache.tika.parser.CompositeParser
- All Implemented Interfaces:
- Serializable, Parser
- Direct Known Subclasses:
- AutoDetectParser, CompositeExternalParser, DefaultParser
Composite parser that delegates parsing tasks to a component parser
 based on the declared content type of the incoming document. A fallback
 parser is defined for cases where a parser for the given content type is
 not available.
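
The delegation described above can be illustrated with a minimal, JDK-only sketch. It is a simplified model and not the Tika implementation: `CompositeDelegationSketch`, `MiniParser`, and `register` are hypothetical names invented for this example, and documents are plain strings rather than streams.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CompositeDelegationSketch {
    /** Hypothetical stand-in for a component parser (not the Tika Parser interface). */
    interface MiniParser {
        String parse(String document);
    }

    // Component parsers keyed by the content type they handle.
    private final Map<String, MiniParser> byType = new LinkedHashMap<>();

    // Fallback used when no component parser is registered for a type.
    private MiniParser fallback = document -> "";

    void register(String contentType, MiniParser parser) {
        byType.put(contentType, parser);
    }

    void setFallback(MiniParser fallback) {
        this.fallback = fallback;
    }

    /** Delegates to the parser registered for the declared type, else to the fallback. */
    String parse(String declaredType, String document) {
        MiniParser parser = byType.getOrDefault(declaredType, fallback);
        return parser.parse(document);
    }

    public static void main(String[] args) {
        CompositeDelegationSketch composite = new CompositeDelegationSketch();
        composite.register("text/plain", document -> "plain:" + document);
        composite.setFallback(document -> "fallback");

        System.out.println(composite.parse("text/plain", "hello"));      // prints "plain:hello"
        System.out.println(composite.parse("application/pdf", "hello")); // prints "fallback"
    }
}
```

The point of the sketch is that the caller only declares a content type; which component does the work is decided by the composite.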
Constructor Summary
Constructors
- CompositeParser(MediaTypeRegistry registry, List<Parser> parsers)
- CompositeParser(MediaTypeRegistry registry, List<Parser> parsers, Collection<Class<? extends Parser>> excludeParsers)
- CompositeParser(MediaTypeRegistry registry, Parser... parsers)
- 
Method Summary
- findDuplicateParsers(ParseContext context): Utility method that goes through all the component parsers and finds all media types for which more than one parser declares support.
- getAllComponentParsers(): Returns all parsers registered with the Composite Parser, including ones which may not currently be active.
- getFallback(): Returns the fallback parser.
- getMediaTypeRegistry(): Returns the media type registry used to infer type relationships.
- protected Parser getParser(Metadata metadata): Returns the parser that best matches the given metadata.
- protected Parser getParser(Metadata metadata, ParseContext context)
- getParsers(): Returns the component parsers.
- getParsers(ParseContext context)
- getSupportedTypes(ParseContext context): Returns the set of media types supported by this parser when used with the given parse context.
- void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context): Delegates the call to the matching component parser.
- void setFallback(Parser fallback): Sets the fallback parser.
- void setMediaTypeRegistry(MediaTypeRegistry registry): Sets the media type registry used to infer type relationships.
- void setParsers(Map<MediaType, Parser> parsers): Sets the component parsers.
- 
Constructor Details
- 
CompositeParser
public CompositeParser(MediaTypeRegistry registry, List<Parser> parsers, Collection<Class<? extends Parser>> excludeParsers)
- 
CompositeParser
- 
CompositeParser
- 
CompositeParser
public CompositeParser()
 
- 
- 
Method Details
- 
getParsers
- 
findDuplicateParsers
Utility method that goes through all the component parsers and finds all media types for which more than one parser declares support. This is useful in tracking down conflicting parser definitions.
- Parameters:
- context - parsing context
- Returns:
- media types that are supported by at least two component parsers
- Since:
- Apache Tika 0.10
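
As an illustration of the duplicate check, the following JDK-only sketch groups parsers by the media types they claim and keeps only the types claimed more than once. It is a model under stated assumptions rather than the Tika code: parsers and media types are represented as plain strings, and `DuplicateParserSketch` and `findDuplicates` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class DuplicateParserSketch {
    /**
     * Takes a map of parser name to the media types it declares, and returns
     * every media type declared by two or more parsers with those parsers' names.
     */
    static Map<String, List<String>> findDuplicates(Map<String, Set<String>> typesByParser) {
        Map<String, List<String>> claimants = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : typesByParser.entrySet()) {
            for (String type : entry.getValue()) {
                // Record this parser as a claimant of the media type.
                claimants.computeIfAbsent(type, key -> new ArrayList<>()).add(entry.getKey());
            }
        }
        // Keep only media types with at least two claimants.
        claimants.values().removeIf(parsers -> parsers.size() < 2);
        return claimants;
    }

    public static void main(String[] args) {
        // Hypothetical parser names, used only for this example.
        Map<String, Set<String>> typesByParser = new LinkedHashMap<>();
        typesByParser.put("TextParserA", Set.of("text/plain"));
        typesByParser.put("TextParserB", Set.of("text/plain", "text/csv"));

        System.out.println(findDuplicates(typesByParser));
        // prints {text/plain=[TextParserA, TextParserB]}
    }
}
```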
 
- 
getMediaTypeRegistry
Returns the media type registry used to infer type relationships.
- Returns:
- media type registry
- Since:
- Apache Tika 0.8
 
- 
setMediaTypeRegistry
Sets the media type registry used to infer type relationships.
- Parameters:
- registry - media type registry
- Since:
- Apache Tika 0.8
 
- 
getAllComponentParsers
Returns all parsers registered with the Composite Parser, including ones which may not currently be active. This does not include the fallback parser, if one is defined.
- 
getParsers
Returns the component parsers.
- Returns:
- component parsers, keyed by media type
 
- 
setParsers
Sets the component parsers.
- Parameters:
- parsers - component parsers, keyed by media type
 
- 
getFallback
Returns the fallback parser.
- Returns:
- fallback parser
 
- 
setFallback
Sets the fallback parser.
- Parameters:
- fallback - fallback parser
 
- 
getParser
Returns the parser that best matches the given metadata. By default, it looks for a parser that matches the content type metadata property, and uses the fallback parser if a better match is not found. The type hierarchy information included in the configured media type registry is used when looking for a matching parser instance.
Subclasses can override this method to provide more accurate parser resolution.
- Parameters:
- metadata - document metadata
- Returns:
- matching parser
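
One way to picture the lookup is to try the declared type first, then each of its supertypes in turn, and only then the fallback. The sketch below models that with plain strings. It is an assumption-laden illustration, not the Tika source: `ParserLookupSketch`, `resolve`, and the parser names are hypothetical, the supertype relation shown is only an example, and the supertype map is assumed to contain no cycles.

```java
import java.util.Map;

final class ParserLookupSketch {
    /**
     * Walks from the declared content type up through its supertypes and
     * returns the first registered parser, or the fallback if none matches.
     * Assumes supertypeOf is acyclic.
     */
    static String resolve(String contentType,
                          Map<String, String> parsersByType,
                          Map<String, String> supertypeOf,
                          String fallback) {
        String type = contentType;
        while (type != null) {
            String parser = parsersByType.get(type);
            if (parser != null) {
                return parser;
            }
            // No direct match: try the parent type, if any.
            type = supertypeOf.get(type);
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Illustrative data only.
        Map<String, String> parsers = Map.of("application/xml", "XmlParser");
        Map<String, String> supertypes = Map.of("image/svg+xml", "application/xml");

        System.out.println(resolve("image/svg+xml", parsers, supertypes, "FallbackParser"));
        // prints "XmlParser"
        System.out.println(resolve("video/mp4", parsers, supertypes, "FallbackParser"));
        // prints "FallbackParser"
    }
}
```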
 
- 
getParser
- 
getSupportedTypes
Description copied from interface: Parser
Returns the set of media types supported by this parser when used with the given parse context.
- Specified by:
- getSupportedTypes in interface Parser
- Parameters:
- context - parse context
- Returns:
- immutable set of media types
 
- 
parse
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException
Delegates the call to the matching component parser.
Potential RuntimeExceptions, IOExceptions and SAXExceptions unrelated to the given input stream and content handler are automatically wrapped into TikaExceptions to better honor the Parser contract.
- Specified by:
- parse in interface Parser
- Parameters:
- stream - the document stream (input)
- handler - handler for the XHTML SAX events (output)
- metadata - document metadata (input and output)
- context - parse context
- Throws:
- IOException - if the document stream could not be read
- SAXException - if the SAX events could not be processed
- TikaException - if the document could not be parsed
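
The wrapping rule can be sketched for the I/O case as follows. This is a simplified JDK-only model, not taken from the Tika source: `ParseFailure` stands in for a checked parse exception, and `StreamParser` and `TrackingStream` are hypothetical helpers. The idea shown is that an `IOException` thrown by the caller's stream is passed through unchanged, while runtime exceptions and any other `IOException` raised inside the delegate are wrapped.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ExceptionWrappingSketch {
    /** Hypothetical checked exception standing in for a parse failure. */
    static final class ParseFailure extends Exception {
        ParseFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical component parser that only reads a stream. */
    interface StreamParser {
        void parse(InputStream stream) throws IOException;
    }

    /** Remembers the most recent IOException thrown by the wrapped stream. */
    static final class TrackingStream extends FilterInputStream {
        private IOException lastFailure;

        TrackingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            try {
                return super.read();
            } catch (IOException e) {
                lastFailure = e;
                throw e;
            }
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            try {
                return super.read(b, off, len);
            } catch (IOException e) {
                lastFailure = e;
                throw e;
            }
        }

        boolean threw(IOException e) {
            return e == lastFailure;
        }
    }

    static void parse(StreamParser delegate, InputStream stream) throws IOException, ParseFailure {
        TrackingStream tracked = new TrackingStream(stream);
        try {
            delegate.parse(tracked);
        } catch (RuntimeException e) {
            throw new ParseFailure("Unexpected RuntimeException from delegate", e);
        } catch (IOException e) {
            if (tracked.threw(e)) {
                throw e; // the document stream itself failed: report as-is
            }
            throw new ParseFailure("I/O failure unrelated to the document stream", e);
        }
    }

    public static void main(String[] args) throws IOException {
        try {
            parse(s -> { throw new IllegalStateException("bug in parser"); },
                  new ByteArrayInputStream(new byte[0]));
        } catch (ParseFailure e) {
            System.out.println("wrapped: " + e.getCause().getMessage()); // prints "wrapped: bug in parser"
        }
    }
}
```

With this shape, a caller catching `IOException` can treat it as a problem with the input it supplied, and anything else as a failure of the parser.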
 
 
-