Package org.apache.tika.parser
Class AutoDetectParser
- java.lang.Object
  - org.apache.tika.parser.AbstractParser
    - org.apache.tika.parser.CompositeParser
      - org.apache.tika.parser.AutoDetectParser
- All Implemented Interfaces:
Serializable, Parser
public class AutoDetectParser extends CompositeParser
- See Also:
- Serialized Form
-
-
Constructor Summary
Constructors
Constructor    Description
AutoDetectParser()
    Creates an auto-detecting parser instance using the default Tika configuration.
AutoDetectParser(TikaConfig config)
AutoDetectParser(Detector detector)
AutoDetectParser(Detector detector, Parser... parsers)
AutoDetectParser(Parser... parsers)
    Creates an auto-detecting parser instance using the specified set of parsers.
-
Method Summary
Modifier and Type    Method    Description
AutoDetectParserConfig getAutoDetectParserConfig()
Detector getDetector()
    Returns the type detector used by this parser to auto-detect the type of a document.
void parse(InputStream stream, ContentHandler handler, Metadata metadata)
    Calls the Parser.parse(InputStream, ContentHandler, Metadata, ParseContext) method with an empty ParseContext.
void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)
    Delegates the call to the matching component parser.
void setAutoDetectParserConfig(AutoDetectParserConfig autoDetectParserConfig)
    Sets the configuration that will be used to create SecureContentHandlers that will be used for parsing.
void setDetector(Detector detector)
    Sets the type detector used by this parser to auto-detect the type of a document.
-
Methods inherited from class org.apache.tika.parser.CompositeParser
findDuplicateParsers, getAllComponentParsers, getFallback, getMediaTypeRegistry, getParser, getParser, getParsers, getParsers, getSupportedTypes, setFallback, setMediaTypeRegistry, setParsers
-
Constructor Detail
-
AutoDetectParser
public AutoDetectParser()
Creates an auto-detecting parser instance using the default Tika configuration.
-
AutoDetectParser
public AutoDetectParser(Detector detector)
-
AutoDetectParser
public AutoDetectParser(Parser... parsers)
Creates an auto-detecting parser instance using the specified set of parsers. This makes it possible to create a Tika configuration in which only a subset of the available parsers have their third-party JARs included; otherwise, using the default TikaConfig will throw various "ClassNotFound" exceptions.
- Parameters:
parsers-
-
AutoDetectParser
public AutoDetectParser(TikaConfig config)
-
-
Method Detail
-
getDetector
public Detector getDetector()
Returns the type detector used by this parser to auto-detect the type of a document.
- Returns:
- type detector
- Since:
- Apache Tika 0.4
-
setDetector
public void setDetector(Detector detector)
Sets the type detector used by this parser to auto-detect the type of a document.
- Parameters:
detector - type detector
- Since:
- Apache Tika 0.4
-
setAutoDetectParserConfig
public void setAutoDetectParserConfig(AutoDetectParserConfig autoDetectParserConfig)
Sets the configuration used to create the SecureContentHandlers that are used for parsing.
- Parameters:
autoDetectParserConfig - the configuration to use
- Since:
- Apache Tika 2.1.1
-
getAutoDetectParserConfig
public AutoDetectParserConfig getAutoDetectParserConfig()
-
parse
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException
Description copied from class: CompositeParser
Delegates the call to the matching component parser.
Potential RuntimeExceptions, IOExceptions and SAXExceptions unrelated to the given input stream and content handler are automatically wrapped into TikaExceptions to better honor the Parser contract.
- Specified by:
parse in interface Parser
- Overrides:
parse in class CompositeParser
- Parameters:
stream - the document stream (input)
handler - handler for the XHTML SAX events (output)
metadata - document metadata (input and output)
context - parse context
- Throws:
IOException - if the document stream could not be read
SAXException - if the SAX events could not be processed
TikaException - if the document could not be parsed
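The delegation and exception wrapping described for this method can be illustrated with a small JDK-only sketch. It does not use any Tika classes: MiniParser, MiniParseException and DelegatingSketch are hypothetical stand-ins, the four-byte magic check is a deliberately crude placeholder for a real Detector, and only RuntimeExceptions are wrapped here, so treat it as an illustration of the pattern rather than of Tika's actual implementation.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical checked exception, standing in for a parse failure (not Tika's TikaException).
class MiniParseException extends Exception {
    MiniParseException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical component parser (not Tika's Parser interface).
interface MiniParser {
    String parse(InputStream stream) throws IOException;
}

class DelegatingSketch {
    private final Map<String, MiniParser> parsers = new LinkedHashMap<>();
    // Used when no component parser is registered for the detected type.
    private final MiniParser fallback = stream -> "";

    void register(String type, MiniParser parser) {
        parsers.put(type, parser);
    }

    // Crude detection: peek at up to four bytes, then rewind the stream.
    static String detect(InputStream stream) throws IOException {
        stream.mark(4);
        byte[] head = new byte[4];
        int n = stream.readNBytes(head, 0, 4);
        stream.reset();
        String prefix = new String(head, 0, n, StandardCharsets.US_ASCII);
        return prefix.equals("%PDF") ? "application/pdf" : "text/plain";
    }

    String parse(InputStream in) throws IOException, MiniParseException {
        // Detection needs mark/reset, so wrap streams that do not support it.
        InputStream stream = in.markSupported() ? in : new BufferedInputStream(in);
        String type = detect(stream);
        MiniParser parser = parsers.getOrDefault(type, fallback);
        try {
            return type + ": " + parser.parse(stream);
        } catch (RuntimeException e) {
            // Unexpected unchecked failures surface as the checked parse exception.
            throw new MiniParseException("Unexpected failure in parser for " + type, e);
        }
    }

    public static void main(String[] args) throws Exception {
        DelegatingSketch sketch = new DelegatingSketch();
        sketch.register("text/plain",
                s -> new String(s.readAllBytes(), StandardCharsets.UTF_8));
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(sketch.parse(new ByteArrayInputStream(data)));
    }
}
```

The caller only ever deals with one entry point and one checked exception type, regardless of which component parser ends up handling the stream.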
-
parse
public void parse(InputStream stream, ContentHandler handler, Metadata metadata) throws IOException, SAXException, TikaException
Description copied from class: AbstractParser
Calls the Parser.parse(InputStream, ContentHandler, Metadata, ParseContext) method with an empty ParseContext. This method is a leftover from Tika 0.x, when the three-argument parse() method still existed in the Parser interface. New code should not call this method; it exists only for backwards compatibility.
- Overrides:
parse in class AbstractParser
- Throws:
IOException
SAXException
TikaException
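The relationship between the legacy overload and the full method can be sketched without Tika on the classpath. MiniContext and LegacyOverloadSketch below are hypothetical stand-ins (not ParseContext or AutoDetectParser); the sketch only shows the shape of the pattern, where the shorter overload forwards to the full one with an empty context.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-call context keyed by type (not Tika's ParseContext).
class MiniContext {
    private final Map<Class<?>, Object> values = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        values.put(key, value);
    }

    <T> T get(Class<T> key) {
        // Returns null when nothing was registered for the key.
        return key.cast(values.get(key));
    }
}

class LegacyOverloadSketch {
    // Full entry point: behaviour can be tuned through the context.
    String parse(String input, MiniContext context) {
        String suffix = context.get(String.class);
        return suffix == null ? input : input + suffix;
    }

    // Legacy overload kept for old callers: forwards with an empty context.
    @Deprecated
    String parse(String input) {
        return parse(input, new MiniContext());
    }
}
```

With this shape, new callers pass a context explicitly and can add per-call settings later without changing the method they call.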