Package org.apache.tika.example
Class PickBestTextEncodingParser
- java.lang.Object
  - org.apache.tika.parser.multiple.AbstractMultipleParser
    - org.apache.tika.example.PickBestTextEncodingParser

- All Implemented Interfaces: Serializable, Parser
 
public class PickBestTextEncodingParser extends AbstractMultipleParser

Deprecated. Currently not suitable for real use, more a demo / prototype!

Inspired by TIKA-1443 and https://wiki.apache.org/tika/CompositeParserDiscussion, this tries several different text encodings, then does the real text parsing based on which is "best".

The logic for "best" needs a lot of work! This is not recommended for actual production use. It is mostly to prove that the AbstractMultipleParser environment is sufficient to support this use case.

TODO: Implement proper "Junk" detection

- See Also: Serialized Form
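The "try several encodings, then keep the best one" idea described above can be sketched with the JDK alone. This is not the scoring logic of this class, which the description itself says still needs work. The class name `PickBestEncodingSketch`, the methods `junkScore` and `pickBest`, and the scoring rule (count U+FFFD replacement characters and unexpected control characters) are all assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch, not part of Tika.
class PickBestEncodingSketch {

    /** Counts characters that hint at a wrong decoding (assumed heuristic). */
    static int junkScore(String text) {
        int junk = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean whitespace = c == '\n' || c == '\r' || c == '\t';
            boolean control = Character.isISOControl(c) && !whitespace;
            if (c == '\uFFFD' || control) {
                junk++;
            }
        }
        return junk;
    }

    /** Decodes the bytes with every candidate and returns the charset with the lowest junk score. */
    static Charset pickBest(byte[] data, List<Charset> candidates) {
        Charset best = null;
        int bestScore = Integer.MAX_VALUE;
        for (Charset candidate : candidates) {
            // new String(byte[], Charset) replaces malformed input with U+FFFD
            String decoded = new String(data, candidate);
            int score = junkScore(decoded);
            if (score < bestScore) { // strict comparison: ties keep the earlier candidate
                best = candidate;
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        byte[] data = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        Charset best = pickBest(data,
                List.of(StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
        System.out.println("Best charset: " + best);
    }
}
```

In this sketch the order of the candidates matters: a single-byte charset such as ISO-8859-1 never produces replacement characters, so it should usually come after stricter charsets like UTF-8, which then win any tie.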
 
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
- protected class PickBestTextEncodingParser.CharsetContentHandlerFactory: Deprecated.
- protected class PickBestTextEncodingParser.CharsetTester: Deprecated.

Nested classes/interfaces inherited from class org.apache.tika.parser.multiple.AbstractMultipleParser:
- AbstractMultipleParser.MetadataPolicy
 
Field Summary
Fields inherited from class org.apache.tika.parser.multiple.AbstractMultipleParser:
- METADATA_POLICY_CONFIG_KEY
 
Constructor Summary
- PickBestTextEncodingParser(MediaTypeRegistry registry, String[] charsets): Deprecated.
Method Summary
- void parse(InputStream stream, ContentHandlerFactory handlers, Metadata metadata, ParseContext context): Deprecated. Processes the given Stream through one or more parsers, resetting things between parsers as requested by policy.
- void parse(InputStream stream, ContentHandler handler, Metadata originalMetadata, ParseContext context): Deprecated. Processes the given Stream through one or more parsers, resetting things between parsers as requested by policy.
- protected boolean parserCompleted(Parser parser, Metadata metadata, ContentHandler handler, ParseContext context, Exception exception): Deprecated. Used to notify implementations that a Parser has Finished or Failed, and to allow them to decide to continue or abort further parsing.
- protected void parserPrepare(Parser parser, Metadata metadata, ParseContext context): Deprecated. Used to allow implementations to prepare or change things before parsing occurs.

Methods inherited from class org.apache.tika.parser.multiple.AbstractMultipleParser:
- getAllParsers, getMediaTypeRegistry, getMetadataPolicy, getMetadataPolicy, getSupportedTypes, mergeMetadata, setMediaTypeRegistry
 
Constructor Detail

PickBestTextEncodingParser
public PickBestTextEncodingParser(MediaTypeRegistry registry, String[] charsets)
Deprecated.
 
Method Detail

parserPrepare
protected void parserPrepare(Parser parser, Metadata metadata, ParseContext context)
Deprecated.
Description copied from class: AbstractMultipleParser
Used to allow implementations to prepare or change things before parsing occurs.
- Overrides: parserPrepare in class AbstractMultipleParser
 
parserCompleted
protected boolean parserCompleted(Parser parser, Metadata metadata, ContentHandler handler, ParseContext context, Exception exception)
Deprecated.
Description copied from class: AbstractMultipleParser
Used to notify implementations that a Parser has Finished or Failed, and to allow them to decide to continue or abort further parsing.
- Specified by: parserCompleted in class AbstractMultipleParser
 
parse
public void parse(InputStream stream, ContentHandler handler, Metadata originalMetadata, ParseContext context) throws IOException, SAXException, TikaException
Deprecated.
Description copied from class: AbstractMultipleParser
Processes the given Stream through one or more parsers, resetting things between parsers as requested by policy. The actual processing is delegated to one or more Parsers.
Note that you'll get text from every parser this way. To control which content comes from which parser, call the method with a ContentHandlerFactory instead.
- Specified by: parse in interface Parser
- Overrides: parse in class AbstractMultipleParser
- Parameters:
  - stream - the document stream (input)
  - handler - handler for the XHTML SAX events (output)
  - originalMetadata - document metadata (input and output)
  - context - parse context
- Throws:
  - IOException - if the document stream could not be read
  - SAXException - if the SAX events could not be processed
  - TikaException - if the document could not be parsed
 
parse
public void parse(InputStream stream, ContentHandlerFactory handlers, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException
Deprecated.
Description copied from class: AbstractMultipleParser
Processes the given Stream through one or more parsers, resetting things between parsers as requested by policy. The actual processing is delegated to one or more Parsers. You will get one ContentHandler fetched for each Parser used.
TODO: Do we need to return all the ContentHandler instances we created?
- Overrides: parse in class AbstractMultipleParser
- Throws: IOException, SAXException, TikaException
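The "one ContentHandler fetched for each Parser used" behaviour can be pictured with a JDK-only analogy. The sketch below does not use the Tika API: `HandlerPerParserSketch`, `runAll`, the `BiConsumer` stand-ins for parsers, and the `StringBuilder` stand-ins for handlers are all hypothetical. Unlike the method documented above, which returns void, this sketch returns the sinks it created so the separation is visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

// Hypothetical analogy, not part of Tika.
class HandlerPerParserSketch {

    /** Runs every stand-in parser on the same input, fetching a fresh sink from the factory each time. */
    static List<StringBuilder> runAll(String input,
                                      List<BiConsumer<String, StringBuilder>> parsers,
                                      Supplier<StringBuilder> sinkFactory) {
        List<StringBuilder> sinks = new ArrayList<>();
        for (BiConsumer<String, StringBuilder> parser : parsers) {
            StringBuilder sink = sinkFactory.get(); // one new sink per parser
            parser.accept(input, sink);
            sinks.add(sink);
        }
        return sinks;
    }

    public static void main(String[] args) {
        List<BiConsumer<String, StringBuilder>> parsers = List.of(
                (in, out) -> out.append(in.toUpperCase(Locale.ROOT)),
                (in, out) -> out.append(in.length()));
        // Each parser's output lands in its own sink instead of being mixed together.
        List<StringBuilder> sinks = runAll("abc", parsers, StringBuilder::new);
        for (int i = 0; i < sinks.size(); i++) {
            System.out.println("parser " + i + " produced: " + sinks.get(i));
        }
    }
}
```

Passing a single shared sink instead of a factory would correspond to the ContentHandler overload, where text from every parser ends up in the same place.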
 
 