Package org.apache.tika.parser.multiple
Class AbstractMultipleParser
- java.lang.Object
  - org.apache.tika.parser.multiple.AbstractMultipleParser
- All Implemented Interfaces:
- Serializable, Parser
- Direct Known Subclasses:
- FallbackParser, PickBestTextEncodingParser, SupplementingParser
 
public abstract class AbstractMultipleParser extends Object implements Parser

Abstract base class for parser wrappers which may or will process a given stream multiple times, merging the results of the various parsers used. End users should normally use FallbackParser or SupplementingParser along with a Strategy.

Note that unless you give a ContentHandlerFactory, you'll get the content from every parser tried, all merged together in one output!
- Since:
- Apache Tika 1.18
- See Also:
- Serialized Form
 
Nested Class Summary
- static class AbstractMultipleParser.MetadataPolicy: The various strategies for handling metadata emitted by multiple parsers.
Field Summary
- protected static String METADATA_POLICY_CONFIG_KEY
Constructor Summary
- AbstractMultipleParser(MediaTypeRegistry registry, Collection<? extends Parser> parsers, Map<String,Param> params)
- AbstractMultipleParser(MediaTypeRegistry registry, AbstractMultipleParser.MetadataPolicy policy, Collection<? extends Parser> parsers)
- AbstractMultipleParser(MediaTypeRegistry registry, AbstractMultipleParser.MetadataPolicy policy, Parser... parsers)
Method Summary
- List<Parser> getAllParsers()
- MediaTypeRegistry getMediaTypeRegistry(): Returns the media type registry used to infer type relationships.
- AbstractMultipleParser.MetadataPolicy getMetadataPolicy()
- protected static AbstractMultipleParser.MetadataPolicy getMetadataPolicy(Map<String,Param> params)
- Set<MediaType> getSupportedTypes(ParseContext context): Returns the set of media types supported by this parser when used with the given parse context.
- protected static Metadata mergeMetadata(Metadata newMetadata, Metadata lastMetadata, AbstractMultipleParser.MetadataPolicy policy)
- void parse(InputStream stream, ContentHandlerFactory handlers, Metadata metadata, ParseContext context): Deprecated. The ContentHandlerFactory override is still experimental and the method signature is subject to change before Tika 2.0.
- void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context): Processes the given Stream through one or more parsers, resetting things between parsers as requested by policy.
- protected abstract boolean parserCompleted(Parser parser, Metadata metadata, ContentHandler handler, ParseContext context, Exception exception): Used to notify implementations that a Parser has Finished or Failed, and to allow them to decide to continue or abort further parsing.
- protected void parserPrepare(Parser parser, Metadata metadata, ParseContext context): Used to allow implementations to prepare or change things before parsing occurs.
- void setMediaTypeRegistry(MediaTypeRegistry registry): Sets the media type registry used to infer type relationships.
 
Field Detail

METADATA_POLICY_CONFIG_KEY
protected static final String METADATA_POLICY_CONFIG_KEY
- See Also:
- Constant Field Values
 
 
Constructor Detail

AbstractMultipleParser
public AbstractMultipleParser(MediaTypeRegistry registry, Collection<? extends Parser> parsers, Map<String,Param> params)

AbstractMultipleParser
public AbstractMultipleParser(MediaTypeRegistry registry, AbstractMultipleParser.MetadataPolicy policy, Parser... parsers)

AbstractMultipleParser
public AbstractMultipleParser(MediaTypeRegistry registry, AbstractMultipleParser.MetadataPolicy policy, Collection<? extends Parser> parsers)
 
Method Detail

getMetadataPolicy
protected static AbstractMultipleParser.MetadataPolicy getMetadataPolicy(Map<String,Param> params)
 - 
mergeMetadata
protected static Metadata mergeMetadata(Metadata newMetadata, Metadata lastMetadata, AbstractMultipleParser.MetadataPolicy policy)
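
This page does not spell out how the merge behaves. As a rough illustration only, the sketch below models a policy-driven merge of two parsers' metadata using plain maps in place of Tika's Metadata class. MergePolicySketch, its four constants, and MetadataMergeSketch are hypothetical names chosen for this example; the constants reflect how the MetadataPolicy options are commonly described, but neither the names nor the exact merge rules are confirmed by this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a metadata policy; the constant names are assumptions. */
enum MergePolicySketch { DISCARD_ALL, FIRST_WINS, LAST_WINS, KEEP_ALL }

/** Simplified model of merging metadata from two parser runs; not Tika code. */
final class MetadataMergeSketch {
    private MetadataMergeSketch() {}

    /**
     * Returns a new map combining the newest parser's metadata with the
     * previous parser's metadata, resolving conflicting keys by policy.
     */
    static Map<String, List<String>> merge(Map<String, List<String>> newest,
                                           Map<String, List<String>> previous,
                                           MergePolicySketch policy) {
        // Start from a copy of the newest metadata so the inputs stay untouched.
        Map<String, List<String>> merged = new LinkedHashMap<>();
        newest.forEach((k, v) -> merged.put(k, new ArrayList<>(v)));

        if (policy == MergePolicySketch.DISCARD_ALL) {
            return merged; // only the newest parser's metadata survives
        }
        for (Map.Entry<String, List<String>> e : previous.entrySet()) {
            String key = e.getKey();
            List<String> oldVals = e.getValue();
            List<String> newVals = merged.get(key);
            if (newVals == null || newVals.isEmpty()) {
                // Key only seen by the earlier parser: carry it forward.
                merged.put(key, new ArrayList<>(oldVals));
            } else if (!newVals.equals(oldVals)) {
                switch (policy) {
                    case FIRST_WINS: {
                        merged.put(key, new ArrayList<>(oldVals));
                        break;
                    }
                    case KEEP_ALL: {
                        // Keep earlier values first, then any new distinct values.
                        List<String> combined = new ArrayList<>(oldVals);
                        for (String v : newVals) {
                            if (!combined.contains(v)) {
                                combined.add(v);
                            }
                        }
                        merged.put(key, combined);
                        break;
                    }
                    default: // LAST_WINS: the newest values are already in place
                        break;
                }
            }
        }
        return merged;
    }
}
```

In this model the choice of policy only matters for keys that both parsers set to different values; keys seen by just one parser are kept under every policy except the discard-all one.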
 - 
getMediaTypeRegistry
public MediaTypeRegistry getMediaTypeRegistry()
Returns the media type registry used to infer type relationships.
- Returns:
- media type registry
 
 - 
setMediaTypeRegistry
public void setMediaTypeRegistry(MediaTypeRegistry registry)
Sets the media type registry used to infer type relationships.
- Parameters:
- registry - media type registry
 
 - 
getSupportedTypes
public Set<MediaType> getSupportedTypes(ParseContext context)
Description copied from interface: Parser
Returns the set of media types supported by this parser when used with the given parse context.
- Specified by:
- getSupportedTypes in interface Parser
- Parameters:
- context - parse context
- Returns:
- immutable set of media types
 
 - 
getMetadataPolicy
public AbstractMultipleParser.MetadataPolicy getMetadataPolicy()
 - 
parserPrepare
protected void parserPrepare(Parser parser, Metadata metadata, ParseContext context)
Used to allow implementations to prepare or change things before parsing occurs.
 - 
parserCompleted
protected abstract boolean parserCompleted(Parser parser, Metadata metadata, ContentHandler handler, ParseContext context, Exception exception)
Used to notify implementations that a Parser has Finished or Failed, and to allow them to decide to continue or abort further parsing.
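
To make the continue-or-abort decision concrete, here is a minimal, self-contained sketch of a loop that tries several parsers in turn and consults a completion callback after each one. It is a simplified model, not Tika's implementation: MiniParser and MultiParseLoopSketch are hypothetical names, the callback is reduced to a BiPredicate, and the convention that returning true means "try the next parser" is an assumption that this page does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical miniature parser: returns text or throws. Not Tika's Parser interface. */
interface MiniParser {
    String parse(String input) throws Exception;
}

/** Simplified model of a "try several parsers" loop driven by a completion callback. */
final class MultiParseLoopSketch {
    private MultiParseLoopSketch() {}

    /**
     * Runs the parsers in order. After each parser finishes or fails, the
     * callback receives the parser and the exception (null on success) and
     * returns true to continue with the next parser (assumed convention).
     */
    static List<String> run(List<MiniParser> parsers, String input,
                            BiPredicate<MiniParser, Exception> parserCompleted) {
        List<String> outputs = new ArrayList<>();
        for (MiniParser parser : parsers) {
            Exception failure = null;
            try {
                outputs.add(parser.parse(input));
            } catch (Exception e) {
                failure = e; // record the failure and let the callback decide
            }
            if (!parserCompleted.test(parser, failure)) {
                break; // callback asked to stop further parsing
            }
        }
        return outputs;
    }
}
```

Under these assumptions, a fallback-style wrapper could pass a callback such as (parser, ex) -> ex != null so that later parsers run only after a failure, while a supplementing-style wrapper could pass (parser, ex) -> true so that every parser gets a turn.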
 - 
parse
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException

Processes the given Stream through one or more parsers, resetting things between parsers as requested by policy. The actual processing is delegated to one or more Parsers.

Note that you'll get text from every parser this way. To control which content comes from which parser, call the method with a ContentHandlerFactory instead.
- Specified by:
- parse in interface Parser
- Parameters:
- stream - the document stream (input)
- handler - handler for the XHTML SAX events (output)
- metadata - document metadata (input and output)
- context - parse context
- Throws:
- IOException - if the document stream could not be read
- SAXException - if the SAX events could not be processed
- TikaException - if the document could not be parsed
 
 - 
parse
@Deprecated public void parse(InputStream stream, ContentHandlerFactory handlers, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException

Deprecated. The ContentHandlerFactory override is still experimental and the method signature is subject to change before Tika 2.0.

Processes the given Stream through one or more parsers, resetting things between parsers as requested by policy. The actual processing is delegated to one or more Parsers. You will get one ContentHandler fetched for each Parser used.

TODO Do we need to return all the ContentHandler instances we created?
- Throws:
- IOException
- SAXException
- TikaException
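
The difference between passing a single handler and passing a handler factory can be pictured with a small, self-contained model. In this sketch a StringBuilder stands in for a content handler and a Supplier<StringBuilder> stands in for the factory; HandlerPerParserSketch and both of its methods are hypothetical names, and the "parsers" are plain functions rather than Tika Parser instances.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

/** Simplified model contrasting one shared handler with one handler per parser. */
final class HandlerPerParserSketch {
    private HandlerPerParserSketch() {}

    /** Shared handler: every parser appends to the same buffer, so outputs run together. */
    static String sharedHandler(List<Function<String, String>> parsers, String input) {
        StringBuilder handler = new StringBuilder();
        for (Function<String, String> parser : parsers) {
            handler.append(parser.apply(input));
        }
        return handler.toString();
    }

    /** Factory: a fresh handler is fetched for each parser, so outputs stay separate. */
    static List<String> handlerPerParser(List<Function<String, String>> parsers,
                                         String input,
                                         Supplier<StringBuilder> factory) {
        List<String> outputs = new ArrayList<>();
        for (Function<String, String> parser : parsers) {
            StringBuilder handler = factory.get();
            handler.append(parser.apply(input));
            outputs.add(handler.toString());
        }
        return outputs;
    }
}
```

With a shared buffer the caller cannot tell where one parser's text ends and the next begins; with a factory each parser's output can be inspected on its own.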
 
 