Package org.apache.tika.extractor
Class ParsingEmbeddedDocumentExtractor
- java.lang.Object
  - org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor
- All Implemented Interfaces: EmbeddedDocumentExtractor
- Direct Known Subclasses: RUnpackExtractor
public class ParsingEmbeddedDocumentExtractor extends Object implements EmbeddedDocumentExtractor
Helper class for parsers of package archives or other compound document formats that support embedded or attached component documents.
- Since: Apache Tika 0.8
Field Summary
- protected ParseContext context
Constructor Summary
- ParsingEmbeddedDocumentExtractor(ParseContext context)
Method Summary
- Parser getDelegatingParser()
- boolean isWriteFileNameToContent()
- void parseEmbedded(InputStream stream, ContentHandler handler, Metadata metadata, boolean outputHtml): Processes the supplied embedded resource, calling the delegating parser with the appropriate details.
- void setWriteFileNameToContent(boolean writeFileNameToContent)
- boolean shouldParseEmbedded(Metadata metadata)
Field Detail
context
protected final ParseContext context
Constructor Detail
ParsingEmbeddedDocumentExtractor
public ParsingEmbeddedDocumentExtractor(ParseContext context)
Method Detail
shouldParseEmbedded
public boolean shouldParseEmbedded(Metadata metadata)
- Specified by: shouldParseEmbedded in interface EmbeddedDocumentExtractor
parseEmbedded
public void parseEmbedded(InputStream stream, ContentHandler handler, Metadata metadata, boolean outputHtml) throws SAXException, IOException
Description copied from interface: EmbeddedDocumentExtractor
Processes the supplied embedded resource, calling the delegating parser with the appropriate details.
- Specified by: parseEmbedded in interface EmbeddedDocumentExtractor
- Parameters:
  stream - The embedded resource
  handler - The handler to use
  metadata - The metadata for the embedded resource
  outputHtml - Should we output HTML for this resource, or has the parser already done so?
- Throws:
  SAXException
  IOException
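The call order implied by shouldParseEmbedded and parseEmbedded can be sketched without Tika on the classpath. This is a minimal illustration only: the Extractor interface, TextOnlyExtractor, extractEntry, and the "resourceName" key are hypothetical stand-ins that borrow the two method names from EmbeddedDocumentExtractor, with a Map in place of Metadata and a StringBuilder in place of ContentHandler. The ".bin" filter is invented for the example and says nothing about how ParsingEmbeddedDocumentExtractor actually decides.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class EmbeddedExtractionSketch {

    /** Hypothetical stand-in for EmbeddedDocumentExtractor; NOT Tika's real parameter types. */
    interface Extractor {
        boolean shouldParseEmbedded(Map<String, String> metadata);

        void parseEmbedded(InputStream stream, StringBuilder handler,
                           Map<String, String> metadata, boolean outputHtml) throws IOException;
    }

    /** Toy extractor (assumption): declines ".bin" entries, copies anything else as UTF-8 text. */
    static class TextOnlyExtractor implements Extractor {
        @Override
        public boolean shouldParseEmbedded(Map<String, String> metadata) {
            String name = metadata.getOrDefault("resourceName", "");
            return !name.endsWith(".bin");
        }

        @Override
        public void parseEmbedded(InputStream stream, StringBuilder handler,
                                  Map<String, String> metadata, boolean outputHtml) throws IOException {
            // The toy ignores outputHtml and simply appends the decoded bytes.
            handler.append(new String(stream.readAllBytes(), StandardCharsets.UTF_8));
        }
    }

    /** What a container parser might do for each embedded entry it encounters. */
    static String extractEntry(Extractor extractor, String name, byte[] content) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("resourceName", name);
        StringBuilder handler = new StringBuilder();
        // Ask first, then hand the stream over only if the extractor wants it.
        if (extractor.shouldParseEmbedded(metadata)) {
            try (InputStream stream = new ByteArrayInputStream(content)) {
                extractor.parseEmbedded(stream, handler, metadata, false);
            } catch (IOException e) {
                // Wrapped so this helper has no checked exceptions.
                throw new UncheckedIOException(e);
            }
        }
        return handler.toString();
    }

    public static void main(String[] args) {
        Extractor extractor = new TextOnlyExtractor();
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(extractEntry(extractor, "note.txt", bytes));           // prints "hello"
        System.out.println(extractEntry(extractor, "blob.bin", bytes).isEmpty()); // prints "true"
    }
}
```

With Tika itself, the same two calls would be made on a ParsingEmbeddedDocumentExtractor constructed from a ParseContext, passing a real Metadata and ContentHandler and handling SAXException as well as IOException.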
getDelegatingParser
public Parser getDelegatingParser()
setWriteFileNameToContent
public void setWriteFileNameToContent(boolean writeFileNameToContent)
isWriteFileNameToContent
public boolean isWriteFileNameToContent()