Package org.apache.tika.extractor
Class ParsingEmbeddedDocumentExtractor
- java.lang.Object
  - org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor
- All Implemented Interfaces:
EmbeddedDocumentExtractor
- Direct Known Subclasses:
RUnpackExtractor
public class ParsingEmbeddedDocumentExtractor extends Object implements EmbeddedDocumentExtractor
Helper class for parsers of package archives or other compound document formats that support embedded or attached component documents.
- Since:
Apache Tika 0.8
-
-
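The calling pattern implied by the two interface methods listed on this page (ask `shouldParseEmbedded`, then call `parseEmbedded`) can be sketched roughly as follows. This is a minimal, self-contained illustration and not Tika code: the `Extractor` interface, `TextExtractor`, `Entry`, and the `"name"` metadata key are hypothetical stand-ins, and `Metadata` is simplified to a `Map` so the example compiles with the JDK alone.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EmbeddedExtractorSketch {

    /** Stand-in mirroring the two EmbeddedDocumentExtractor methods (Metadata simplified to a Map). */
    interface Extractor {
        boolean shouldParseEmbedded(Map<String, String> metadata);

        void parseEmbedded(InputStream stream, ContentHandler handler,
                           Map<String, String> metadata, boolean outputHtml)
                throws SAXException, IOException;
    }

    /** Toy extractor: accepts only named entries and emits their bytes as text. */
    static class TextExtractor implements Extractor {
        @Override
        public boolean shouldParseEmbedded(Map<String, String> metadata) {
            return metadata.containsKey("name"); // hypothetical selection rule
        }

        @Override
        public void parseEmbedded(InputStream stream, ContentHandler handler,
                                  Map<String, String> metadata, boolean outputHtml)
                throws SAXException, IOException {
            char[] text = new String(stream.readAllBytes(), StandardCharsets.UTF_8).toCharArray();
            handler.characters(text, 0, text.length);
        }
    }

    /** Hypothetical embedded entry: its metadata plus raw bytes. */
    static final class Entry {
        final Map<String, String> metadata;
        final byte[] data;

        Entry(Map<String, String> metadata, byte[] data) {
            this.metadata = metadata;
            this.data = data;
        }
    }

    /** Collects character events so the extracted text can be inspected. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }
    }

    /** What a container parser might do for each entry: ask first, then delegate. */
    static String extractAll(List<Entry> entries, Extractor extractor)
            throws SAXException, IOException {
        TextCollector handler = new TextCollector();
        for (Entry entry : entries) {
            if (extractor.shouldParseEmbedded(entry.metadata)) {
                try (InputStream in = new ByteArrayInputStream(entry.data)) {
                    extractor.parseEmbedded(in, handler, entry.metadata, true);
                }
            }
        }
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        List<Entry> entries = List.of(
                new Entry(Map.of("name", "a.txt"), "alpha ".getBytes(StandardCharsets.UTF_8)),
                new Entry(Map.of(), "skipped".getBytes(StandardCharsets.UTF_8)),
                new Entry(Map.of("name", "b.txt"), "beta".getBytes(StandardCharsets.UTF_8)));
        System.out.println(extractAll(entries, new TextExtractor())); // prints "alpha beta"
    }
}
```

With the real library, the same loop would presumably use a `ParsingEmbeddedDocumentExtractor` constructed from a `ParseContext` in place of the stand-in `TextExtractor`.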
Field Summary
Fields
Modifier and Type          Field      Description
protected ParseContext     context
-
Constructor Summary
Constructors
Constructor                                                Description
ParsingEmbeddedDocumentExtractor(ParseContext context)
-
Method Summary
Modifier and Type    Method    Description
Parser     getDelegatingParser()
boolean    isWriteFileNameToContent()
void       parseEmbedded(InputStream stream, ContentHandler handler, Metadata metadata, boolean outputHtml)    Processes the supplied embedded resource, calling the delegating parser with the appropriate details.
void       setWriteFileNameToContent(boolean writeFileNameToContent)
boolean    shouldParseEmbedded(Metadata metadata)
-
-
-
Field Detail
-
context
protected final ParseContext context
-
-
Constructor Detail
-
ParsingEmbeddedDocumentExtractor
public ParsingEmbeddedDocumentExtractor(ParseContext context)
-
-
Method Detail
-
shouldParseEmbedded
public boolean shouldParseEmbedded(Metadata metadata)
- Specified by:
shouldParseEmbedded in interface EmbeddedDocumentExtractor
-
parseEmbedded
public void parseEmbedded(InputStream stream, ContentHandler handler, Metadata metadata, boolean outputHtml) throws SAXException, IOException
Description copied from interface: EmbeddedDocumentExtractor
Processes the supplied embedded resource, calling the delegating parser with the appropriate details.
- Specified by:
parseEmbedded in interface EmbeddedDocumentExtractor
- Parameters:
stream - The embedded resource
handler - The handler to use
metadata - The metadata for the embedded resource
outputHtml - Should we output HTML for this resource, or has the parser already done so?
- Throws:
SAXException
IOException
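One way to read the `outputHtml` parameter is that the extractor emits wrapper markup around the embedded content only when the calling parser has not already done so. The sketch below illustrates that reading with the JDK's SAX classes. It is an assumption, not the Tika implementation: `emitEmbedded`, `EventLog`, `render`, the `div` wrapper, and the `embedded-entry` class name are all hypothetical.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class OutputHtmlSketch {

    private static final String XHTML = "http://www.w3.org/1999/xhtml";

    /** Emits the embedded text, adding a wrapper element only when outputHtml is true. */
    static void emitEmbedded(String text, ContentHandler handler, boolean outputHtml)
            throws SAXException {
        if (outputHtml) {
            AttributesImpl attrs = new AttributesImpl();
            // Hypothetical class name, chosen for illustration only.
            attrs.addAttribute("", "class", "class", "CDATA", "embedded-entry");
            handler.startElement(XHTML, "div", "div", attrs);
        }
        char[] chars = text.toCharArray();
        handler.characters(chars, 0, chars.length);
        if (outputHtml) {
            handler.endElement(XHTML, "div", "div");
        }
    }

    /** Records SAX events as a flat string so the two modes can be compared. */
    static class EventLog extends DefaultHandler {
        final StringBuilder log = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            log.append('<').append(localName).append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            log.append("</").append(localName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            log.append(ch, start, length);
        }
    }

    /** Runs emitEmbedded against an EventLog and returns the recorded events. */
    static String render(String text, boolean outputHtml) throws SAXException {
        EventLog handler = new EventLog();
        emitEmbedded(text, handler, outputHtml);
        return handler.log.toString();
    }

    public static void main(String[] args) throws SAXException {
        System.out.println(render("hello", true));  // prints "<div>hello</div>"
        System.out.println(render("hello", false)); // prints "hello"
    }
}
```

Passing `false` leaves the handler's event stream free of extra markup, which suits a caller that has already written its own container element for the entry.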
-
getDelegatingParser
public Parser getDelegatingParser()
-
setWriteFileNameToContent
public void setWriteFileNameToContent(boolean writeFileNameToContent)
-
isWriteFileNameToContent
public boolean isWriteFileNameToContent()
-
-