Package org.apache.tika.parser
Class AutoDetectParser
java.lang.Object
org.apache.tika.parser.CompositeParser
org.apache.tika.parser.AutoDetectParser
- All Implemented Interfaces:
Serializable, Parser
Constructor Summary
Constructor - Description
AutoDetectParser() - Creates an auto-detecting parser instance using the default Tika configuration.
AutoDetectParser(TikaConfig config)
AutoDetectParser(Detector detector)
AutoDetectParser(Detector detector, Parser... parsers)
AutoDetectParser(Parser... parsers) - Creates an auto-detecting parser instance using the specified set of parsers.
Method Summary
Modifier and Type, Method - Description
AutoDetectParserConfig getAutoDetectParserConfig()
Detector getDetector() - Returns the type detector used by this parser to auto-detect the type of a document.
void parse(InputStream stream, ContentHandler handler, Metadata metadata)
void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) - Delegates the call to the matching component parser.
void setAutoDetectParserConfig(AutoDetectParserConfig autoDetectParserConfig) - Sets the configuration that will be used to create SecureContentHandlers that will be used for parsing.
void setDetector(Detector detector) - Sets the type detector used by this parser to auto-detect the type of a document.
Methods inherited from class org.apache.tika.parser.CompositeParser
findDuplicateParsers, getAllComponentParsers, getFallback, getMediaTypeRegistry, getParser, getParser, getParsers, getParsers, getSupportedTypes, setFallback, setMediaTypeRegistry, setParsers
-
Constructor Details
-
AutoDetectParser
public AutoDetectParser()
Creates an auto-detecting parser instance using the default Tika configuration.
AutoDetectParser
-
AutoDetectParser
public AutoDetectParser(Parser... parsers)
Creates an auto-detecting parser instance using the specified set of parsers. This makes it possible to create a Tika configuration in which only a subset of the available parsers have their third-party JARs included; with the default TikaConfig, the missing JARs would otherwise cause various "ClassNotFound" exceptions.
- Parameters:
parsers - the set of parsers to use
-
AutoDetectParser
-
AutoDetectParser
-
-
Method Details
-
getDetector
public Detector getDetector()
Returns the type detector used by this parser to auto-detect the type of a document.
- Returns:
type detector
- Since:
Apache Tika 0.4
-
setDetector
public void setDetector(Detector detector)
Sets the type detector used by this parser to auto-detect the type of a document.
- Parameters:
detector - type detector
- Since:
Apache Tika 0.4
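As a rough illustration of what a type detector does, the sketch below guesses a media type from a stream's leading "magic" bytes and then resets the stream so that a parser can still read the document from the start. It uses only the JDK. MagicDetectorSketch and its detect method are hypothetical names invented for this example, not Tika's Detector interface, and the two signatures it checks are only samples.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a type detector; NOT the org.apache.tika.detect.Detector API.
public class MagicDetectorSketch {

    /**
     * Guesses a media type from the first four bytes of the stream.
     * The stream must support mark/reset; its position is restored before returning.
     */
    public static String detect(InputStream stream) {
        if (!stream.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] prefix = new byte[4];
        int read;
        try {
            stream.mark(prefix.length);
            read = stream.readNBytes(prefix, 0, prefix.length); // Java 9+
            stream.reset(); // leave the stream untouched for the parser
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (read == 4 && prefix[0] == '%' && prefix[1] == 'P'
                && prefix[2] == 'D' && prefix[3] == 'F') {
            return "application/pdf";
        }
        if (read == 4 && prefix[0] == 'P' && prefix[1] == 'K'
                && prefix[2] == 3 && prefix[3] == 4) {
            return "application/zip";
        }
        // Unknown content: fall back to the generic binary type.
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        ByteArrayInputStream pdf =
                new ByteArrayInputStream(new byte[] {'%', 'P', 'D', 'F', '-', '1'});
        System.out.println(detect(pdf));
        // The first byte is still available because detect() reset the stream.
        System.out.println((char) pdf.read());
    }
}
```

Restoring the stream position is the important design point here: detection only peeks at the document, so the parser that is selected afterwards still sees the complete input.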
-
setAutoDetectParserConfig
public void setAutoDetectParserConfig(AutoDetectParserConfig autoDetectParserConfig)
Sets the configuration used to create the SecureContentHandlers that are used for parsing.
- Parameters:
autoDetectParserConfig - configuration used to create SecureContentHandlers
- Since:
Apache Tika 2.1.1
-
getAutoDetectParserConfig
-
parse
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException
Description copied from class: CompositeParser
Delegates the call to the matching component parser.
Potential RuntimeExceptions, IOExceptions and SAXExceptions unrelated to the given input stream and content handler are automatically wrapped into TikaExceptions to better honor the Parser contract.
- Specified by:
parse in interface Parser
- Overrides:
parse in class CompositeParser
- Parameters:
stream - the document stream (input)
handler - handler for the XHTML SAX events (output)
metadata - document metadata (input and output)
context - parse context
- Throws:
IOException - if the document stream could not be read
SAXException - if the SAX events could not be processed
TikaException - if the document could not be parsed
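The delegate-and-wrap behavior described above can be pictured with a small JDK-only sketch. It is not Tika's implementation: DelegationSketch, ComponentParser and SketchParseException are hypothetical stand-ins for CompositeParser, Parser and TikaException, documents are reduced to strings, and only the RuntimeException case is modeled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a composite parser; NOT the Tika CompositeParser class.
public class DelegationSketch {

    /** Stand-in for TikaException (a checked exception). */
    public static class SketchParseException extends Exception {
        public SketchParseException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Stand-in for a component parser: turns raw content into extracted text. */
    public interface ComponentParser {
        String parse(String content);
    }

    private final Map<String, ComponentParser> parsersByType = new LinkedHashMap<>();
    private final ComponentParser fallback;

    public DelegationSketch(ComponentParser fallback) {
        this.fallback = fallback;
    }

    public void register(String mediaType, ComponentParser parser) {
        parsersByType.put(mediaType, parser);
    }

    /**
     * Delegates to the parser registered for the media type, or to the fallback
     * when none matches. An unexpected RuntimeException from the component parser
     * is wrapped so that callers only need to handle the declared checked exception.
     */
    public String parse(String mediaType, String content) throws SketchParseException {
        ComponentParser parser = parsersByType.getOrDefault(mediaType, fallback);
        try {
            return parser.parse(content);
        } catch (RuntimeException e) {
            throw new SketchParseException(
                    "Unexpected RuntimeException from component parser", e);
        }
    }

    public static void main(String[] args) throws SketchParseException {
        DelegationSketch composite = new DelegationSketch(content -> "");
        composite.register("text/plain", content -> content.trim());
        composite.register("application/broken", content -> {
            throw new IllegalStateException("parser bug");
        });

        System.out.println(composite.parse("text/plain", "  hello  "));
        try {
            composite.parse("application/broken", "x");
        } catch (SketchParseException e) {
            System.out.println("wrapped: " + e.getCause().getMessage());
        }
    }
}
```

Wrapping at the composite level means a bug in one component parser surfaces to the caller as the documented exception type rather than as an arbitrary unchecked exception.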
-
parse
public void parse(InputStream stream, ContentHandler handler, Metadata metadata) throws IOException, SAXException, TikaException
- Throws:
IOException
SAXException
TikaException
-