java.lang.Object
  org.apache.tika.parser.AbstractParser
    org.apache.tika.parser.pdf.PDFParser
public class PDFParser
PDF parser.
This parser can also process encrypted PDF documents if the required password is given as part of the input metadata associated with the document. If no password is given, the parser tries to decrypt the document using the empty password that is often used with PDFs.
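The fallback described above can be sketched without Tika on the classpath. This is only an illustration of the lookup order, not the parser's actual implementation: a plain `Map` stands in for Tika's `Metadata`, and `PASSWORD_KEY` is a placeholder string, since the value of `PDFParser.PASSWORD` is not given on this page. With Tika itself, the caller would typically set the password with a call such as `metadata.set(PDFParser.PASSWORD, password)` before invoking `parse`.

```java
import java.util.HashMap;
import java.util.Map;

class PasswordFallbackSketch {

    // Placeholder key (assumption). The real key is the constant PDFParser.PASSWORD.
    static final String PASSWORD_KEY = "example.pdf.password";

    /**
     * Returns the password to try first: the value stored in the metadata
     * if one was supplied, otherwise the empty password.
     */
    public static String passwordToTry(Map<String, String> metadata) {
        String password = metadata.get(PASSWORD_KEY);
        return password != null ? password : "";
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();

        // No password supplied: fall back to the empty password.
        System.out.println("[" + passwordToTry(metadata) + "]");

        // Password supplied by the caller through the metadata.
        metadata.put(PASSWORD_KEY, "s3cret");
        System.out.println("[" + passwordToTry(metadata) + "]");
    }
}
```

Passing the password through the metadata keeps the `parse` signature unchanged for documents that are not encrypted.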
Field Summary | |
---|---|
static String | PASSWORD: Metadata key for giving the document password to the parser. |
Constructor Summary | |
---|---|
PDFParser() |
Method Summary | |
---|---|
boolean | getEnableAutoSpace() |
boolean | getExtractAnnotationText(): If true, text in annotations will be extracted. |
Set<MediaType> | getSupportedTypes(ParseContext context): Returns the set of media types supported by this parser when used with the given parse context. |
void | parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context): Parses a document stream into a sequence of XHTML SAX events. |
void | setEnableAutoSpace(boolean v): If true (the default), the parser should estimate where spaces should be inserted between words. |
void | setExtractAnnotationText(boolean v): If true (the default), text in annotations will be extracted. |
Methods inherited from class org.apache.tika.parser.AbstractParser |
---|
parse |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String PASSWORD
Constructor Detail |
---|
public PDFParser()
Method Detail |
---|
public Set<MediaType> getSupportedTypes(ParseContext context)
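Description copied from interface: Parser
Returns the set of media types supported by this parser when used with the given parse context.
Parameters:
context - parse context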
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException
Description copied from interface: Parser
Parses a document stream into a sequence of XHTML SAX events.
The given document stream is consumed but not closed by this method. The responsibility to close the stream remains on the caller.
Information about the parsing context can be passed in the context parameter. See the parser implementations for the kinds of context information they expect.
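The stream-ownership rule above (the parser consumes the stream, the caller closes it) can be illustrated with a small standard-library sketch. Nothing here is Tika code: `consume` is a hypothetical stand-in for a `parse` call, and `TrackingStream` exists only to make the moment of closing visible. The caller uses try-with-resources so the stream is closed even if parsing throws.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StreamOwnershipSketch {

    /** Wrapper that records whether close() was called (illustration only). */
    static final class TrackingStream extends FilterInputStream {
        boolean closed = false;

        TrackingStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Hypothetical stand-in for parse(): reads to end of stream, never closes it. */
    public static int consume(InputStream stream) throws IOException {
        byte[] buffer = new byte[4096];
        int total = 0;
        int n;
        while ((n = stream.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    /**
     * Caller side: opens the stream, hands it over, and closes it afterwards.
     * Returns { closed right after consume, closed after the try block }.
     */
    public static boolean[] run(byte[] data) throws IOException {
        TrackingStream stream = new TrackingStream(new ByteArrayInputStream(data));
        boolean closedAfterConsume;
        try (TrackingStream s = stream) {
            consume(s);
            closedAfterConsume = s.closed;
        }
        return new boolean[] { closedAfterConsume, stream.closed };
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = run("%PDF-1.4 placeholder".getBytes(StandardCharsets.US_ASCII));
        System.out.println("closed right after consume: " + r[0]);
        System.out.println("closed after try block: " + r[1]);
    }
}
```

With a real parser the same shape applies: open the document stream in a try-with-resources statement and call `parse` inside it.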
Parameters:
stream - the document stream (input)
handler - handler for the XHTML SAX events (output)
metadata - document metadata (input and output)
context - parse context
Throws:
IOException - if the document stream could not be read
SAXException - if the SAX events could not be processed
TikaException - if the document could not be parsed

public void setEnableAutoSpace(boolean v)
public boolean getEnableAutoSpace()
See setEnableAutoSpace(boolean).
public void setExtractAnnotationText(boolean v)
public boolean getExtractAnnotationText()