public class PickBestTextEncodingParser extends AbstractMultipleParser
The logic for "best" needs a lot of work!
This is not recommended for actual production use. It exists mostly to
prove that the AbstractMultipleParser environment is
sufficient to support this use case.
TODO: Implement proper "Junk" detection.
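
To make the idea of picking a "best" encoding concrete, here is a minimal standalone sketch. It is not the logic this class uses, and it does not depend on Tika: the class name `PickBestCharsetSketch`, the methods `junkScore` and `pickBest`, and the scoring rule (count replacement characters and unexpected control characters) are all assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch: decode the same bytes with several candidate charsets
// and keep the one that yields the least "junk". Not Tika's implementation.
class PickBestCharsetSketch {

    // Counts characters that hint at a wrong decoding: U+FFFD (the replacement
    // character) and control characters other than tab, newline and carriage return.
    static int junkScore(String text) {
        int junk = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean allowedControl = c == '\t' || c == '\n' || c == '\r';
            if (c == '\uFFFD' || (Character.isISOControl(c) && !allowedControl)) {
                junk++;
            }
        }
        return junk;
    }

    // Returns the candidate with the lowest junk score; on a tie the earlier
    // candidate wins. Returns null if the candidate list is empty.
    static Charset pickBest(byte[] data, List<Charset> candidates) {
        Charset best = null;
        int bestScore = Integer.MAX_VALUE;
        for (Charset cs : candidates) {
            // new String(byte[], Charset) replaces malformed input with the
            // charset's replacement string, which is U+FFFD for UTF-8.
            String decoded = new String(data, cs);
            int score = junkScore(decoded);
            if (score < bestScore) {
                bestScore = score;
                best = cs;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // "café" encoded as ISO-8859-1 is not valid UTF-8 (lone 0xE9 byte).
        byte[] data = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        Charset best = pickBest(data,
                List.of(StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
        System.out.println(best); // ISO-8859-1
    }
}
```

A score this simple cannot tell apart two single-byte encodings that both decode without errors, which is one reason proper junk detection remains a TODO.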
| Modifier and Type | Class and Description |
|---|---|
| protected class | PickBestTextEncodingParser.CharsetContentHandlerFactory Deprecated. |
| protected class | PickBestTextEncodingParser.CharsetTester Deprecated. |
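Nested classes inherited from class AbstractMultipleParser: AbstractMultipleParser.MetadataPolicy
Fields inherited from class AbstractMultipleParser: METADATA_POLICY_CONFIG_KEY

| Constructor and Description |
|---|
| PickBestTextEncodingParser(MediaTypeRegistry registry, String[] charsets) Deprecated. |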
| Modifier and Type | Method and Description |
|---|---|
| void | parse(InputStream stream, ContentHandlerFactory handlers, Metadata metadata, ParseContext context) Deprecated. Processes the given Stream through one or more parsers, resetting things between parsers as requested by policy. |
| void | parse(InputStream stream, ContentHandler handler, Metadata originalMetadata, ParseContext context) Deprecated. Processes the given Stream through one or more parsers, resetting things between parsers as requested by policy. |
| protected boolean | parserCompleted(Parser parser, Metadata metadata, ContentHandler handler, ParseContext context, Exception exception) Deprecated. Used to notify implementations that a Parser has finished or failed, and to allow them to decide whether to continue or abort further parsing. |
| protected void | parserPrepare(Parser parser, Metadata metadata, ParseContext context) Deprecated. Used to allow implementations to prepare or change things before parsing occurs. |
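
The two hooks in the table above suggest a loop that prepares each parser, runs it, and then asks the subclass whether to carry on. The following standalone sketch models that pattern in plain Java. It is not Tika code: `SimpleParser`, `MultiRunner` and `FirstSuccessRunner` are hypothetical stand-ins, the hook signatures are simplified, and the convention that returning `true` means "continue" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a "try several parsers, with hooks" loop.
class MultiParserHookSketch {

    // Stand-in for a parser; not Tika's Parser interface.
    interface SimpleParser {
        String name();
        String parse(byte[] data) throws Exception;
    }

    // Runs each parser in turn, calling the two hooks around every attempt.
    abstract static class MultiRunner {
        // Called before each parser runs; the default does nothing.
        protected void parserPrepare(SimpleParser parser) { }

        // Called after each parser finishes or fails.
        // In this sketch, true means continue and false means stop.
        protected abstract boolean parserCompleted(SimpleParser parser,
                                                   String output,
                                                   Exception failure);

        final void run(List<SimpleParser> parsers, byte[] data) {
            for (SimpleParser p : parsers) {
                parserPrepare(p);
                String output = null;
                Exception failure = null;
                try {
                    // A fresh copy per parser models resetting the input.
                    output = p.parse(data.clone());
                } catch (Exception e) {
                    failure = e;
                }
                if (!parserCompleted(p, output, failure)) {
                    break;
                }
            }
        }
    }

    // Stops at the first parser that succeeds and records every attempt.
    static class FirstSuccessRunner extends MultiRunner {
        final List<String> attempted = new ArrayList<>();
        String result;

        @Override
        protected boolean parserCompleted(SimpleParser parser, String output,
                                          Exception failure) {
            attempted.add(parser.name());
            if (failure == null) {
                result = output;
                return false; // success: no need to try further parsers
            }
            return true; // failure: move on to the next parser
        }
    }

    // Builds a toy parser that either fails or returns "<name>:<length>".
    static SimpleParser parser(String name, boolean fails) {
        return new SimpleParser() {
            public String name() { return name; }
            public String parse(byte[] data) throws Exception {
                if (fails) {
                    throw new Exception(name + " failed");
                }
                return name + ":" + data.length;
            }
        };
    }

    public static void main(String[] args) {
        FirstSuccessRunner runner = new FirstSuccessRunner();
        runner.run(List.of(parser("a", true), parser("b", false), parser("c", false)),
                new byte[3]);
        System.out.println(runner.attempted + " -> " + runner.result); // [a, b] -> b:3
    }
}
```

A "pick best" strategy would differ from `FirstSuccessRunner` mainly in its completion hook: it would keep going through every candidate and compare the outputs afterwards instead of stopping at the first success.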
Methods inherited from class AbstractMultipleParser: getAllParsers, getMediaTypeRegistry, getMetadataPolicy, getMetadataPolicy, getSupportedTypes, mergeMetadata, setMediaTypeRegistry
Other inherited methods: parse

public PickBestTextEncodingParser(MediaTypeRegistry registry, String[] charsets)

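protected void parserPrepare(Parser parser, Metadata metadata, ParseContext context)
Description copied from class: AbstractMultipleParser
Used to allow implementations to prepare or change things before parsing occurs.
Overrides: parserPrepare in class AbstractMultipleParser

protected boolean parserCompleted(Parser parser, Metadata metadata, ContentHandler handler, ParseContext context, Exception exception)
Description copied from class: AbstractMultipleParser
Used to notify implementations that a Parser has finished or failed, and to allow them to decide whether to continue or abort further parsing.
Overrides: parserCompleted in class AbstractMultipleParser

public void parse(InputStream stream, ContentHandler handler, Metadata originalMetadata, ParseContext context) throws IOException, SAXException, TikaException
Description copied from class: AbstractMultipleParser
Processes the given Stream through one or more parsers, resetting things between parsers as requested by policy. The actual processing is delegated to one or more Parsers.
Note that you'll get text from every parser this way. To have
control over which content comes from which parser, you need to
call the method with a ContentHandlerFactory instead.
Specified by: parse in interface Parser
Overrides: parse in class AbstractMultipleParser
Parameters:
stream - the document stream (input)
handler - handler for the XHTML SAX events (output)
originalMetadata - document metadata (input and output)
context - parse context
Throws:
IOException - if the document stream could not be read
SAXException - if the SAX events could not be processed
TikaException - if the document could not be parsed

public void parse(InputStream stream, ContentHandlerFactory handlers, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException
Description copied from class: AbstractMultipleParser
Processes the given Stream through one or more parsers, resetting things between parsers as requested by policy. The actual processing is delegated to one or more Parsers.
You will get one ContentHandler fetched for each Parser used.
TODO Do we need to return all the ContentHandler instances we created?
Overrides: parse in class AbstractMultipleParser
Throws: IOException, SAXException, TikaException

Copyright © 2007–2022 The Apache Software Foundation. All rights reserved.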