public class PickBestTextEncodingParser extends AbstractMultipleParser
The logic for "best" needs a lot of work!
This is not recommended for actual production use. It is mostly to prove that the AbstractMultipleParser environment is sufficient to support this use case.
TODO: Implement proper "Junk" detection.
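To make the idea of picking a "best" encoding concrete, here is a minimal sketch that uses only the JDK. It is not this class's actual logic: the class name `CharsetPickSketch`, the methods `junkScore` and `pickBest`, and the scoring rule (count U+FFFD replacement characters) are all assumptions made for illustration.

```java
import java.nio.charset.Charset;

// Hypothetical illustration only; this is NOT how PickBestTextEncodingParser is implemented.
class CharsetPickSketch {

    /** Crude "junk" measure (an assumption): number of U+FFFD replacement characters. */
    static int junkScore(String text) {
        int junk = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\uFFFD') {
                junk++;
            }
        }
        return junk;
    }

    /** Decodes the bytes with each candidate charset and returns the name with the least junk. */
    static String pickBest(byte[] data, String[] charsets) {
        String best = null;
        int bestScore = Integer.MAX_VALUE;
        for (String name : charsets) {
            // new String(byte[], Charset) replaces malformed input instead of throwing.
            String decoded = new String(data, Charset.forName(name));
            int score = junkScore(decoded);
            if (score < bestScore) { // strict comparison: the earlier candidate wins ties
                bestScore = score;
                best = name;
            }
        }
        return best;
    }
}
```

A single-byte charset such as ISO-8859-1 maps every byte to some character and so never scores any junk under this rule, which is one reason a heuristic this simple is not enough on its own.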
Modifier and Type | Class and Description |
---|---|
protected class | PickBestTextEncodingParser.CharsetContentHandlerFactory Deprecated. |
protected class | PickBestTextEncodingParser.CharsetTester Deprecated. |
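Nested classes/interfaces inherited from class AbstractMultipleParser: AbstractMultipleParser.MetadataPolicy
Fields inherited from class AbstractMultipleParser: METADATA_POLICY_CONFIG_KEY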
Constructor and Description |
---|
PickBestTextEncodingParser(MediaTypeRegistry registry, String[] charsets) Deprecated. |
Modifier and Type | Method and Description |
---|---|
void | parse(InputStream stream, ContentHandlerFactory handlers, Metadata metadata, ParseContext context) Deprecated. Processes the given Stream through one or more parsers, resetting things between parsers as requested by policy. |
void | parse(InputStream stream, ContentHandler handler, Metadata originalMetadata, ParseContext context) Deprecated. Processes the given Stream through one or more parsers, resetting things between parsers as requested by policy. |
protected boolean | parserCompleted(Parser parser, Metadata metadata, ContentHandler handler, ParseContext context, Exception exception) Deprecated. Used to notify implementations that a Parser has finished or failed, and to allow them to decide whether to continue or abort further parsing. |
protected void | parserPrepare(Parser parser, Metadata metadata, ParseContext context) Deprecated. Used to allow implementations to prepare or change things before parsing occurs. |
getAllParsers, getMediaTypeRegistry, getMetadataPolicy, getMetadataPolicy, getSupportedTypes, mergeMetadata, setMediaTypeRegistry
parse
public PickBestTextEncodingParser(MediaTypeRegistry registry, String[] charsets)
protected void parserPrepare(Parser parser, Metadata metadata, ParseContext context)
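Description copied from class: AbstractMultipleParser
Used to allow implementations to prepare or change things before parsing occurs.
Overrides: parserPrepare in class AbstractMultipleParser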
protected boolean parserCompleted(Parser parser, Metadata metadata, ContentHandler handler, ParseContext context, Exception exception)
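Description copied from class: AbstractMultipleParser
Used to notify implementations that a Parser has finished or failed, and to allow them to decide whether to continue or abort further parsing.
Overrides: parserCompleted in class AbstractMultipleParser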
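The prepare and completed hooks can be pictured as callbacks wrapped around each parser run. The sketch below is a hypothetical, JDK-only miniature of that pattern. `MiniParser`, `MiniMultiParser`, `prepare`, `completed`, and the convention that returning `true` means "continue" are assumptions of the sketch, not Tika's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of a prepare/completed hook loop; not Tika's implementation.
class HookLoopSketch {

    /** Stand-in for a parser (assumption): has a name and may fail while parsing. */
    interface MiniParser {
        String name();
        void parse(String input) throws Exception;
    }

    /** Small factory for a stand-in parser that either succeeds or throws. */
    static MiniParser parser(String name, boolean fails) {
        return new MiniParser() {
            public String name() { return name; }
            public void parse(String input) throws Exception {
                if (fails) {
                    throw new Exception(name + " failed");
                }
            }
        };
    }

    /** Driver that calls a hook before and after every parser. */
    abstract static class MiniMultiParser {
        /** Called before each parser runs; the default does nothing. */
        protected void prepare(MiniParser parser) { }

        /** Called after each parser; failure is null on success. Return true to continue. */
        protected abstract boolean completed(MiniParser parser, Exception failure);

        /** Runs parsers in order and returns the names of those that were attempted. */
        List<String> run(List<MiniParser> parsers, String input) {
            List<String> attempted = new ArrayList<>();
            for (MiniParser p : parsers) {
                prepare(p);
                Exception failure = null;
                try {
                    p.parse(input);
                } catch (Exception e) {
                    failure = e;
                }
                attempted.add(p.name());
                if (!completed(p, failure)) {
                    break; // the subclass asked to abort further parsing
                }
            }
            return attempted;
        }
    }
}
```

A subclass whose `completed` returns `failure == null` stops at the first failing parser, while one that always returns `true` tries every parser regardless of errors.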
public void parse(InputStream stream, ContentHandler handler, Metadata originalMetadata, ParseContext context) throws IOException, SAXException, TikaException
Description copied from class: AbstractMultipleParser
Processes the given Stream through one or more parsers, resetting things between parsers as requested by policy. The actual processing is delegated to one or more Parsers.
Note that you'll get text from every parser this way. To control which content comes from which parser, call the method with a ContentHandlerFactory instead.
Specified by: parse in interface Parser
Overrides: parse in class AbstractMultipleParser
Parameters:
stream - the document stream (input)
handler - handler for the XHTML SAX events (output)
originalMetadata - document metadata (input and output)
context - parse context
Throws:
IOException - if the document stream could not be read
SAXException - if the SAX events could not be processed
TikaException - if the document could not be parsed
public void parse(InputStream stream, ContentHandlerFactory handlers, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException
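Description copied from class: AbstractMultipleParser
Processes the given Stream through one or more parsers, resetting things between parsers as requested by policy. The actual processing is delegated to one or more Parsers.
You will get one ContentHandler fetched for each Parser used.
TODO Do we need to return all the ContentHandler instances we created?
Overrides: parse in class AbstractMultipleParser
Throws:
IOException
SAXException
TikaException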
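The point of fetching one handler per parser is that each parser's output stays separate and attributable. A rough JDK-only analogy is sketched below: a factory supplies a fresh sink for every charset that is tried. `PerParserSinkSketch`, `decodeEach`, and the use of `Supplier<StringBuilder>` in place of a ContentHandlerFactory are assumptions for illustration, not part of the Tika API.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical analogy for "one handler per parser"; not Tika code.
class PerParserSinkSketch {

    /**
     * Decodes the same bytes once per charset, writing each result into its own
     * sink obtained from the factory, so outputs never mix.
     */
    static Map<String, StringBuilder> decodeEach(
            byte[] data, String[] charsets, Supplier<StringBuilder> sinkFactory) {
        Map<String, StringBuilder> results = new LinkedHashMap<>();
        for (String name : charsets) {
            StringBuilder sink = sinkFactory.get(); // fresh sink for this attempt
            sink.append(new String(data, Charset.forName(name)));
            results.put(name, sink);
        }
        return results;
    }
}
```

Calling `decodeEach(bytes, new String[] {"US-ASCII", "ISO-8859-1"}, StringBuilder::new)` yields one entry per charset, which a caller could then compare or score individually.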
Copyright © 2007–2023 The Apache Software Foundation. All rights reserved.