Package org.apache.tika.parser
Class AutoDetectParser
- java.lang.Object
  - org.apache.tika.parser.AbstractParser
    - org.apache.tika.parser.CompositeParser
      - org.apache.tika.parser.AutoDetectParser
- All Implemented Interfaces:
Serializable, Parser
public class AutoDetectParser extends CompositeParser
- See Also:
- Serialized Form
Constructor Summary
Constructors
- AutoDetectParser()
  Creates an auto-detecting parser instance using the default Tika configuration.
- AutoDetectParser(TikaConfig config)
- AutoDetectParser(Detector detector)
- AutoDetectParser(Detector detector, Parser... parsers)
- AutoDetectParser(Parser... parsers)
  Creates an auto-detecting parser instance using the specified set of parsers.
-
Method Summary
Modifier and Type, Method, Description
- Detector getDetector()
  Returns the type detector used by this parser to auto-detect the type of a document.
- void parse(InputStream stream, ContentHandler handler, Metadata metadata)
  Calls the Parser.parse(InputStream, ContentHandler, Metadata, ParseContext) method with an empty ParseContext.
- void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)
  Delegates the call to the matching component parser.
- void setAutoDetectParserConfig(AutoDetectParserConfig autoDetectParserConfig)
  Sets the configuration used to create the SecureContentHandlers used for parsing.
- void setDetector(Detector detector)
  Sets the type detector used by this parser to auto-detect the type of a document.
-
Methods inherited from class org.apache.tika.parser.CompositeParser
findDuplicateParsers, getAllComponentParsers, getFallback, getMediaTypeRegistry, getParser, getParser, getParsers, getParsers, getSupportedTypes, setFallback, setMediaTypeRegistry, setParsers
Constructor Detail
-
AutoDetectParser
public AutoDetectParser()
Creates an auto-detecting parser instance using the default Tika configuration.
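A minimal usage sketch for the default constructor, assuming the Tika core jar is on the classpath. The class name DefaultAutoDetectExample and the helper extractText are hypothetical names chosen for illustration; BodyContentHandler is Tika's text-collecting handler from org.apache.tika.sax. How much text comes back depends on which parser modules are available at runtime.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical example class; not part of Tika.
class DefaultAutoDetectExample {

    // Parses the given bytes with a default-configured AutoDetectParser and
    // returns whatever body text the selected component parser produced.
    static String extractText(byte[] document) throws Exception {
        AutoDetectParser parser = new AutoDetectParser();   // default Tika configuration
        BodyContentHandler handler = new BodyContentHandler();
        Metadata metadata = new Metadata();
        try (InputStream stream = new ByteArrayInputStream(document)) {
            parser.parse(stream, handler, metadata, new ParseContext());
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] document = "Hello, Tika".getBytes(StandardCharsets.UTF_8);
        // The result may be empty if no parser for the detected type is on the classpath.
        System.out.println(extractText(document));
    }
}
```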
-
AutoDetectParser
public AutoDetectParser(Detector detector)
-
AutoDetectParser
public AutoDetectParser(Parser... parsers)
Creates an auto-detecting parser instance using the specified set of parsers. This makes it possible to create a Tika configuration in which only a subset of the available parsers have their third-party jars included; with the default TikaConfig, the missing jars would otherwise cause various "ClassNotFound" exceptions.
- Parameters:
parsers
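As a sketch of restricting the parser set, the example below passes a single parser to this constructor. UpperCaseTextParser is a hypothetical parser written only for this illustration (it is not part of Tika); it claims text/plain and emits the content in upper case, so the delegation is visible in the output. The sketch assumes the input is detected as plain text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Locale;
import java.util.Set;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.sax.XHTMLContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

// Hypothetical example class; not part of Tika.
class SubsetParserExample {

    // Hypothetical component parser: handles text/plain only.
    static class UpperCaseTextParser implements Parser {
        private static final long serialVersionUID = 1L;

        @Override
        public Set<MediaType> getSupportedTypes(ParseContext context) {
            return Collections.singleton(MediaType.TEXT_PLAIN);
        }

        @Override
        public void parse(InputStream stream, ContentHandler handler,
                          Metadata metadata, ParseContext context)
                throws IOException, SAXException, TikaException {
            // Read the whole stream into memory (fine for a small illustration).
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = stream.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            String text = new String(buffer.toByteArray(), StandardCharsets.UTF_8);

            // Emit the text as a single XHTML paragraph.
            XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
            xhtml.startDocument();
            xhtml.element("p", text.toUpperCase(Locale.ROOT));
            xhtml.endDocument();
        }
    }

    // Builds an AutoDetectParser that knows only the parser above.
    static String parseWithSubset(byte[] document) throws Exception {
        AutoDetectParser parser = new AutoDetectParser(new UpperCaseTextParser());
        BodyContentHandler handler = new BodyContentHandler();
        try (InputStream stream = new ByteArrayInputStream(document)) {
            parser.parse(stream, handler, new Metadata(), new ParseContext());
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseWithSubset("hello, tika".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Document types that none of the supplied parsers supports are passed to the fallback parser inherited from CompositeParser.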
-
-
AutoDetectParser
public AutoDetectParser(TikaConfig config)
-
-
Method Detail
-
getDetector
public Detector getDetector()
Returns the type detector used by this parser to auto-detect the type of a document.
- Returns:
- type detector
- Since:
- Apache Tika 0.4
-
setDetector
public void setDetector(Detector detector)
Sets the type detector used by this parser to auto-detect the type of a document.
- Parameters:
detector - type detector
- Since:
- Apache Tika 0.4
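A short sketch of replacing the detector. The class CustomDetectorExample and the fixed-type detector are hypothetical and purely illustrative: the detector reports every document as text/plain, which is rarely what real code wants. It relies on Detector declaring the single method detect(InputStream, Metadata), so a lambda can implement it.

```java
import org.apache.tika.detect.Detector;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AutoDetectParser;

// Hypothetical example class; not part of Tika.
class CustomDetectorExample {

    // Returns a parser whose detector reports text/plain for every input.
    static AutoDetectParser parserWithFixedType() {
        AutoDetectParser parser = new AutoDetectParser();
        Detector alwaysPlainText = (input, metadata) -> MediaType.TEXT_PLAIN;
        parser.setDetector(alwaysPlainText);
        return parser;
    }

    public static void main(String[] args) throws Exception {
        AutoDetectParser parser = parserWithFixedType();
        // getDetector() hands back the detector installed above.
        MediaType type = parser.getDetector().detect(null, new Metadata());
        System.out.println(type);
    }
}
```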
-
setAutoDetectParserConfig
public void setAutoDetectParserConfig(AutoDetectParserConfig autoDetectParserConfig)
Sets the configuration used to create the SecureContentHandlers used for parsing.
- Parameters:
autoDetectParserConfig - the AutoDetectParserConfig to apply
- Since:
- Apache Tika 2.1.1
-
parse
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException
Description copied from class: CompositeParser
Delegates the call to the matching component parser.
Potential RuntimeExceptions, IOExceptions and SAXExceptions unrelated to the given input stream and content handler are automatically wrapped into TikaExceptions to better honor the Parser contract.
- Specified by:
parse in interface Parser
- Overrides:
parse in class CompositeParser
- Parameters:
stream - the document stream (input)
handler - handler for the XHTML SAX events (output)
metadata - document metadata (input and output)
context - parse context
- Throws:
IOException - if the document stream could not be read
SAXException - if the SAX events could not be processed
TikaException - if the document could not be parsed
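The sketch below illustrates two points from the description above: metadata acts as an output (the detected type can be read from it after parsing), and each of the three declared exceptions can be handled separately. ParseOutcomeExample and detectedContentType are hypothetical names; DefaultHandler from org.xml.sax.helpers simply discards the SAX events. The exact Content-Type string is left open here because a component parser may refine it, for example by appending a charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; not part of Tika.
class ParseOutcomeExample {

    // Parses the bytes, ignores the content events, and returns the
    // Content-Type recorded in the metadata during parsing.
    static String detectedContentType(byte[] document) {
        AutoDetectParser parser = new AutoDetectParser();
        Metadata metadata = new Metadata();
        try (InputStream stream = new ByteArrayInputStream(document)) {
            parser.parse(stream, new DefaultHandler(), metadata, new ParseContext());
        } catch (IOException e) {
            // The document stream could not be read.
            throw new UncheckedIOException(e);
        } catch (SAXException e) {
            // The content handler could not process the SAX events.
            throw new IllegalStateException("SAX events could not be processed", e);
        } catch (TikaException e) {
            // The document itself could not be parsed.
            throw new IllegalStateException("Document could not be parsed", e);
        }
        return metadata.get(Metadata.CONTENT_TYPE);
    }

    public static void main(String[] args) {
        byte[] document = "Hello, Tika".getBytes(StandardCharsets.UTF_8);
        System.out.println(detectedContentType(document));
    }
}
```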
-
parse
public void parse(InputStream stream, ContentHandler handler, Metadata metadata) throws IOException, SAXException, TikaException
Description copied from class: AbstractParser
Calls the Parser.parse(InputStream, ContentHandler, Metadata, ParseContext) method with an empty ParseContext. This method is a leftover from Tika 0.x, when the three-argument parse() method still existed in the Parser interface. New code should not call this method; it is retained only for backwards compatibility.
- Overrides:
parse in class AbstractParser
- Throws:
IOException
SAXException
TikaException
-
-