Package org.apache.tika.parser
Class AutoDetectParser
java.lang.Object
org.apache.tika.parser.CompositeParser
org.apache.tika.parser.AutoDetectParser
- All Implemented Interfaces:
Serializable, Parser
Constructor Summary
Constructor | Description
AutoDetectParser() | Creates an auto-detecting parser instance using the default Tika configuration.
AutoDetectParser(TikaConfig config)
AutoDetectParser(Detector detector)
AutoDetectParser(Detector detector, Parser... parsers)
AutoDetectParser(Parser... parsers) | Creates an auto-detecting parser instance using the specified set of parsers.
Method Summary
Modifier and Type | Method | Description
AutoDetectParserConfig | getAutoDetectParserConfig()
Detector | getDetector() | Returns the type detector used by this parser to auto-detect the type of a document.
void | parse(InputStream stream, ContentHandler handler, Metadata metadata)
void | parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) | Delegates the call to the matching component parser.
void | setAutoDetectParserConfig(AutoDetectParserConfig autoDetectParserConfig) | Sets the configuration used to create the SecureContentHandlers applied during parsing.
void | setDetector(Detector detector) | Sets the type detector used by this parser to auto-detect the type of a document.

Methods inherited from class org.apache.tika.parser.CompositeParser
findDuplicateParsers, getAllComponentParsers, getFallback, getMediaTypeRegistry, getParser, getParser, getParsers, getParsers, getSupportedTypes, setFallback, setMediaTypeRegistry, setParsers
Constructor Details

AutoDetectParser
public AutoDetectParser()
Creates an auto-detecting parser instance using the default Tika configuration.
AutoDetectParser

AutoDetectParser
public AutoDetectParser(Parser... parsers)
Creates an auto-detecting parser instance using the specified set of parsers. This allows one to create a Tika configuration in which only a subset of the available parsers have their third-party JARs included; otherwise, using the default TikaConfig will throw various "ClassNotFound" exceptions.
- Parameters:
parsers - the component parsers that documents are delegated to
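The dispatch idea behind this constructor (register only the component parsers whose dependencies you ship, then route each document to the parser registered for its media type) can be modeled in plain Java. The sketch below is hypothetical: `MiniParser` and `MiniCompositeParser` are invented names, it uses no Tika classes, and it skips detection by taking the media type as an argument. With Tika itself you would instead pass real parser instances to `new AutoDetectParser(...)`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical component-parser contract for this sketch (not Tika's Parser):
// it declares the media types it handles and turns raw bytes into text.
interface MiniParser {
    Set<String> supportedTypes();

    String parse(byte[] content);
}

// Hypothetical composite that indexes its component parsers by media type and
// routes each document to the matching one, or to a fallback.
final class MiniCompositeParser {
    private final Map<String, MiniParser> byType = new LinkedHashMap<>();
    private final MiniParser fallback;

    MiniCompositeParser(MiniParser fallback, List<MiniParser> components) {
        this.fallback = fallback;
        for (MiniParser parser : components) {
            for (String type : parser.supportedTypes()) {
                // In this sketch, a later component that claims an
                // already-registered type replaces the earlier one.
                byType.put(type, parser);
            }
        }
    }

    // Returns the component registered for the exact type, else the fallback.
    MiniParser parserFor(String mediaType) {
        return byType.getOrDefault(mediaType, fallback);
    }

    // Delegates the whole parse call to the selected component.
    String parse(String mediaType, byte[] content) {
        return parserFor(mediaType).parse(content);
    }
}
```

The fallback plays roughly the role of the parser managed by the inherited `getFallback`/`setFallback` methods. Tika also walks up the supertypes of a media type before falling back, which this sketch omits.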
AutoDetectParser
AutoDetectParser

Method Details

getDetector
public Detector getDetector()
Returns the type detector used by this parser to auto-detect the type of a document.
- Returns:
type detector
- Since:
Apache Tika 0.4

setDetector
public void setDetector(Detector detector)
Sets the type detector used by this parser to auto-detect the type of a document.
- Parameters:
detector - type detector
- Since:
Apache Tika 0.4
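`setDetector` swaps the component that decides which media type, and therefore which component parser, a document gets. As a rough illustration of what a detector does, the plain-Java sketch below guesses a type from a document's leading magic bytes. `MagicByteDetector` is a hypothetical name; Tika's actual `Detector` interface instead declares `MediaType detect(InputStream input, Metadata metadata)`, and its default detection covers far more formats than these three signatures.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified detector (not Tika's Detector API): it matches a
// few well-known file signatures against the first bytes of a document.
final class MagicByteDetector {
    private static final String UNKNOWN = "application/octet-stream";

    // Signatures are checked in order; the first prefix match wins.
    private static final List<Map.Entry<byte[], String>> SIGNATURES = List.of(
            Map.entry("%PDF-".getBytes(StandardCharsets.US_ASCII), "application/pdf"),
            Map.entry(new byte[] {'P', 'K', 3, 4}, "application/zip"),
            Map.entry(new byte[] {(byte) 0x89, 'P', 'N', 'G'}, "image/png"));

    // Returns the media type whose signature prefixes the header, or
    // application/octet-stream when nothing matches.
    String detect(byte[] header) {
        for (Map.Entry<byte[], String> signature : SIGNATURES) {
            byte[] magic = signature.getKey();
            if (header.length >= magic.length
                    && Arrays.equals(Arrays.copyOf(header, magic.length), magic)) {
                return signature.getValue();
            }
        }
        return UNKNOWN;
    }
}
```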

setAutoDetectParserConfig
public void setAutoDetectParserConfig(AutoDetectParserConfig autoDetectParserConfig)
Sets the configuration used to create the SecureContentHandlers applied during parsing.
- Parameters:
autoDetectParserConfig - the parser configuration
- Since:
Apache Tika 2.1.1
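A SecureContentHandler guards against decompression bombs by comparing how much text a parser emits with how many input bytes it has read. The plain-Java sketch below models that check under assumed names: `OutputRatioGuard`, `recordInput` and `recordOutput` are hypothetical, and the thresholds used with it are illustrative rather than Tika's defaults. The real class is `org.apache.tika.sax.SecureContentHandler`, and this setter supplies the configuration it is built from.

```java
// Hypothetical model of a SecureContentHandler-style check (not Tika's API):
// once the extracted text passes an absolute threshold, it must not exceed
// maxCompressionRatio characters per input byte read so far.
final class OutputRatioGuard {
    private final long outputThreshold;
    private final long maxCompressionRatio;
    private long bytesRead;
    private long charsWritten;

    OutputRatioGuard(long outputThreshold, long maxCompressionRatio) {
        this.outputThreshold = outputThreshold;
        this.maxCompressionRatio = maxCompressionRatio;
    }

    // Call as the parser consumes input bytes.
    void recordInput(long bytes) {
        bytesRead += bytes;
    }

    // Call for each chunk of emitted text; throws if output looks excessive.
    void recordOutput(long chars) {
        charsWritten += chars;
        if (charsWritten > outputThreshold
                && charsWritten > bytesRead * maxCompressionRatio) {
            throw new IllegalStateException("Suspected decompression bomb: "
                    + charsWritten + " chars from " + bytesRead + " bytes");
        }
    }
}
```

The absolute threshold keeps small documents from tripping the ratio check, since a tiny compressed input can legitimately expand many times over.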

getAutoDetectParserConfig
public AutoDetectParserConfig getAutoDetectParserConfig()

parse
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException
Description copied from class: CompositeParser
Delegates the call to the matching component parser.
Potential RuntimeExceptions, IOExceptions and SAXExceptions unrelated to the given input stream and content handler are automatically wrapped into TikaExceptions to better honor the Parser contract.
- Specified by:
parse in interface Parser
- Overrides:
parse in class CompositeParser
- Parameters:
stream - the document stream (input)
handler - handler for the XHTML SAX events (output)
metadata - document metadata (input and output)
context - parse context
- Throws:
IOException - if the document stream could not be read
SAXException - if the SAX events could not be processed
TikaException - if the document could not be parsed
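The wrapping rule for parse hinges on telling apart exceptions raised by the caller's own stream from exceptions raised inside a component parser. One way to do that, sketched here in plain Java for the stream side only, is to wrap the stream and remember every IOException it throws. All names (`StreamParser`, `TaggingInputStream`, `WrappingDelegator`, `ParseFailure`) are hypothetical stand-ins, with `ParseFailure` playing the role of TikaException; Tika's own implementation differs in detail and also tracks the content handler.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical component-parser contract used only by this sketch.
interface StreamParser {
    void parse(InputStream stream) throws IOException;
}

// Hypothetical checked exception standing in for TikaException.
class ParseFailure extends Exception {
    ParseFailure(String message, Throwable cause) {
        super(message, cause);
    }
}

// Remembers (by identity) every IOException thrown by the wrapped stream.
final class TaggingInputStream extends FilterInputStream {
    private final Set<IOException> thrown = Collections.newSetFromMap(new IdentityHashMap<>());

    TaggingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        try { return super.read(); } catch (IOException e) { thrown.add(e); throw e; }
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        try { return super.read(b, off, len); } catch (IOException e) { thrown.add(e); throw e; }
    }

    boolean isCauseOf(IOException e) {
        return thrown.contains(e);
    }
}

final class WrappingDelegator {
    // Stream failures propagate unchanged; anything else from the component is wrapped.
    static void parse(StreamParser component, InputStream stream) throws IOException, ParseFailure {
        TaggingInputStream tagged = new TaggingInputStream(stream);
        try {
            component.parse(tagged);
        } catch (IOException e) {
            if (tagged.isCauseOf(e)) {
                throw e;
            }
            throw new ParseFailure("Unexpected IOException from component parser", e);
        } catch (RuntimeException e) {
            throw new ParseFailure("Unexpected RuntimeException from component parser", e);
        }
    }
}
```

With Tika on the classpath, the real entry point is simply `new AutoDetectParser().parse(stream, new BodyContentHandler(), new Metadata(), new ParseContext())`, where `BodyContentHandler` comes from `org.apache.tika.sax`.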

parse
public void parse(InputStream stream, ContentHandler handler, Metadata metadata) throws IOException, SAXException, TikaException
- Throws:
IOException
SAXException
TikaException