org.apache.tika.parser
Class CompositeParser

java.lang.Object
  extended by org.apache.tika.parser.CompositeParser
All Implemented Interfaces:
java.io.Serializable, Parser
Direct Known Subclasses:
AutoDetectParser, DefaultParser

public class CompositeParser
extends java.lang.Object
implements Parser

Composite parser that delegates parsing tasks to a component parser based on the declared content type of the incoming document. A fallback parser handles documents whose content type has no matching component parser.

See Also:
Serialized Form

Constructor Summary
CompositeParser()
CompositeParser(MediaTypeRegistry registry, java.util.List<Parser> parsers)
CompositeParser(MediaTypeRegistry registry, Parser... parsers)

Method Summary
 Parser getFallback()
          Returns the fallback parser.
 MediaTypeRegistry getMediaTypeRegistry()
          Returns the media type registry used to infer type relationships.
protected  Parser getParser(Metadata metadata)
          Returns the parser that best matches the given metadata.
protected  Parser getParser(Metadata metadata, ParseContext context)
 java.util.Map<MediaType,Parser> getParsers()
          Returns the component parsers.
 java.util.Map<MediaType,Parser> getParsers(ParseContext context)
 java.util.Set<MediaType> getSupportedTypes(ParseContext context)
          Returns the set of media types supported by this parser when used with the given parse context.
 void parse(java.io.InputStream stream, org.xml.sax.ContentHandler handler, Metadata metadata)
          Deprecated. This method will be removed in Apache Tika 1.0.
 void parse(java.io.InputStream stream, org.xml.sax.ContentHandler handler, Metadata metadata, ParseContext context)
          Delegates the call to the matching component parser.
 void setFallback(Parser fallback)
          Sets the fallback parser.
 void setMediaTypeRegistry(MediaTypeRegistry registry)
          Sets the media type registry used to infer type relationships.
 void setParsers(java.util.Map<MediaType,Parser> parsers)
          Sets the component parsers.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CompositeParser

public CompositeParser(MediaTypeRegistry registry,
                       java.util.List<Parser> parsers)

CompositeParser

public CompositeParser(MediaTypeRegistry registry,
                       Parser... parsers)

CompositeParser

public CompositeParser()

Method Detail

getParsers

public java.util.Map<MediaType,Parser> getParsers(ParseContext context)

getMediaTypeRegistry

public MediaTypeRegistry getMediaTypeRegistry()
Returns the media type registry used to infer type relationships.

Returns:
media type registry
Since:
Apache Tika 0.8

setMediaTypeRegistry

public void setMediaTypeRegistry(MediaTypeRegistry registry)
Sets the media type registry used to infer type relationships.

Parameters:
registry - media type registry
Since:
Apache Tika 0.8

getParsers

public java.util.Map<MediaType,Parser> getParsers()
Returns the component parsers.

Returns:
component parsers, keyed by media type

setParsers

public void setParsers(java.util.Map<MediaType,Parser> parsers)
Sets the component parsers.

Parameters:
parsers - component parsers, keyed by media type

getFallback

public Parser getFallback()
Returns the fallback parser.

Returns:
fallback parser

setFallback

public void setFallback(Parser fallback)
Sets the fallback parser.

Parameters:
fallback - fallback parser

getParser

protected Parser getParser(Metadata metadata)
Returns the parser that best matches the given metadata. By default, this method looks for a parser that matches the content type metadata property and uses the fallback parser if no better match is found. The type hierarchy information in the configured media type registry is used when looking for a matching parser instance.

Subclasses can override this method to provide more accurate parser resolution.

Parameters:
metadata - document metadata
Returns:
matching parser
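
The lookup described above can be illustrated with a small, self-contained sketch. It is not Tika code: the class ParserResolutionSketch, its methods, and the use of plain strings in place of MediaType and Parser are all assumptions made for illustration. It only shows the general idea of walking from the declared type up through its supertypes and using the fallback when nothing matches.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the resolution logic; none of these names are Tika classes.
class ParserResolutionSketch {

    // Hypothetical supertype table: media type -> parent media type.
    // Assumed to be acyclic, otherwise resolve() would not terminate.
    private final Map<String, String> supertypes = new HashMap<>();

    // Component "parsers" keyed by media type; a parser is represented by its name only.
    private final Map<String, String> parsers = new HashMap<>();

    private final String fallback;

    ParserResolutionSketch(String fallback) {
        this.fallback = fallback;
    }

    void addSupertype(String type, String parent) {
        supertypes.put(type, parent);
    }

    void addParser(String type, String parserName) {
        parsers.put(type, parserName);
    }

    // Walks from the declared type up through its supertypes and returns the
    // first registered parser, or the fallback if no type in the chain matches.
    String resolve(String declaredType) {
        String type = declaredType;
        while (type != null) {
            String parser = parsers.get(type);
            if (parser != null) {
                return parser;
            }
            type = supertypes.get(type);
        }
        return fallback;
    }

    public static void main(String[] args) {
        ParserResolutionSketch resolver = new ParserResolutionSketch("fallback parser");
        resolver.addSupertype("image/svg+xml", "application/xml");
        resolver.addParser("application/xml", "XML parser");

        // No parser is registered for SVG itself, so its supertype's parser is used.
        System.out.println(resolver.resolve("image/svg+xml"));     // XML parser
        // Unknown type with no registered supertype: the fallback is used.
        System.out.println(resolver.resolve("application/x-foo")); // fallback parser
    }
}
```

A subclass that overrides getParser could apply a different strategy at the same point, for example by inspecting other metadata properties before consulting the type hierarchy.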

getParser

protected Parser getParser(Metadata metadata,
                           ParseContext context)

getSupportedTypes

public java.util.Set<MediaType> getSupportedTypes(ParseContext context)
Description copied from interface: Parser
Returns the set of media types supported by this parser when used with the given parse context.

Specified by:
getSupportedTypes in interface Parser
Parameters:
context - parse context
Returns:
immutable set of media types

parse

public void parse(java.io.InputStream stream,
                  org.xml.sax.ContentHandler handler,
                  Metadata metadata,
                  ParseContext context)
           throws java.io.IOException,
                  org.xml.sax.SAXException,
                  TikaException
Delegates the call to the matching component parser.

Any RuntimeException, IOException or SAXException that is unrelated to the given input stream and content handler is automatically wrapped in a TikaException to better honor the Parser contract.

Specified by:
parse in interface Parser
Parameters:
stream - the document stream (input)
handler - handler for the XHTML SAX events (output)
metadata - document metadata (input and output)
context - parse context
Throws:
java.io.IOException - if the document stream could not be read
org.xml.sax.SAXException - if the SAX events could not be processed
TikaException - if the document could not be parsed
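
The wrapping behavior described above depends on telling apart exceptions that come from the caller's stream and exceptions that arise elsewhere in a component parser. The following self-contained sketch shows one way such a distinction could be made. It is an assumption for illustration, not the Tika implementation: ParseFailure stands in for TikaException, and StreamParser and TrackingStream are hypothetical helpers. SAXException handling is omitted for brevity, but would follow the same pattern with a tracking content handler.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class ExceptionWrappingSketch {

    // Hypothetical stand-in for TikaException.
    static class ParseFailure extends Exception {
        ParseFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical, reduced form of a component parser.
    interface StreamParser {
        void parse(InputStream stream) throws IOException;
    }

    // Remembers the last IOException thrown by the wrapped stream itself.
    static class TrackingStream extends FilterInputStream {
        IOException lastFailure;

        TrackingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            try {
                return super.read();
            } catch (IOException e) {
                lastFailure = e;
                throw e;
            }
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            try {
                return super.read(b, off, len);
            } catch (IOException e) {
                lastFailure = e;
                throw e;
            }
        }
    }

    // Delegates to the component parser. Exceptions raised by the caller's
    // stream pass through unchanged; everything else is wrapped.
    static void parse(InputStream stream, StreamParser delegate)
            throws IOException, ParseFailure {
        TrackingStream tracked = new TrackingStream(stream);
        try {
            delegate.parse(tracked);
        } catch (RuntimeException e) {
            throw new ParseFailure("Unexpected RuntimeException from parser", e);
        } catch (IOException e) {
            if (e == tracked.lastFailure) {
                throw e; // genuine read failure on the document stream
            }
            throw new ParseFailure("IOException unrelated to the document stream", e);
        }
    }
}
```

With this arrangement, a caller that catches IOException can assume the document stream itself failed, while bugs or internal I/O problems in a component parser surface as the parse-failure exception.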

parse

public void parse(java.io.InputStream stream,
                  org.xml.sax.ContentHandler handler,
                  Metadata metadata)
           throws java.io.IOException,
                  org.xml.sax.SAXException,
                  TikaException
Deprecated. This method will be removed in Apache Tika 1.0.

Description copied from interface: Parser
This is the parse() method from Tika 0.4 and earlier. New code should use the Parser.parse(InputStream, ContentHandler, Metadata, ParseContext) method instead. Calls to this backwards-compatibility method are forwarded to the new parse() method with an empty parse context.

Specified by:
parse in interface Parser
Throws:
java.io.IOException
org.xml.sax.SAXException
TikaException


Copyright © 2007-2010 The Apache Software Foundation. All Rights Reserved.