Package org.apache.tika.extractor
Interface EmbeddedDocumentExtractor
All Known Implementing Classes:
ParsingEmbeddedDocumentExtractor, UnpackExtractor
public interface EmbeddedDocumentExtractor
Method Summary
Modifier and Type   Method and Description
void                parseEmbedded(TikaInputStream stream, ContentHandler handler, Metadata metadata, ParseContext parseContext, boolean outputHtml)
                    Processes the supplied embedded resource, calling the delegating parser with the appropriate details.
boolean             shouldParseEmbedded(Metadata metadata)
                    Determines whether the given embedded document should be parsed.
Method Details
shouldParseEmbedded
boolean shouldParseEmbedded(Metadata metadata)
Determines whether the given embedded document should be parsed.
Note: Implementations may throw EmbeddedLimitReachedException (a RuntimeException) if a limit is exceeded and throwing is configured.
Parameters:
metadata - the metadata for the embedded document
Returns:
true if the embedded document should be parsed
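The limit behaviour described in the note can be sketched as follows. This is a minimal illustration and not Tika's implementation: `EmbeddedFilter`, `CountLimitedFilter`, and `LimitReachedException` are hypothetical stand-ins, a plain `Map` takes the place of `Metadata`, and the count-based limit is an assumption chosen only to show the skip-or-throw choice.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for EmbeddedLimitReachedException; unchecked, as in the note above. */
class LimitReachedException extends RuntimeException {
    LimitReachedException(String message) {
        super(message);
    }
}

/** Simplified stand-in that keeps only the shouldParseEmbedded half of the interface. */
interface EmbeddedFilter {
    // A Map replaces Tika's Metadata so the sketch has no external dependencies.
    boolean shouldParseEmbedded(Map<String, String> metadata);
}

/** Accepts embedded documents until maxCount is reached, then either skips or throws. */
class CountLimitedFilter implements EmbeddedFilter {
    private final int maxCount;
    private final boolean throwOnLimit;
    private int accepted = 0;

    CountLimitedFilter(int maxCount, boolean throwOnLimit) {
        this.maxCount = maxCount;
        this.throwOnLimit = throwOnLimit;
    }

    @Override
    public boolean shouldParseEmbedded(Map<String, String> metadata) {
        if (accepted >= maxCount) {
            if (throwOnLimit) {
                // Throwing is opt-in: the caller decides whether a limit aborts the parse.
                throw new LimitReachedException("embedded document limit of " + maxCount + " reached");
            }
            return false; // quiet mode: skip this embedded document
        }
        accepted++;
        return true;
    }
}

class CountLimitDemo {
    public static void main(String[] args) {
        EmbeddedFilter filter = new CountLimitedFilter(2, false);
        Map<String, String> metadata = new HashMap<>();
        metadata.put("resourceName", "attachment.pdf");
        for (int i = 0; i < 3; i++) {
            System.out.println(filter.shouldParseEmbedded(metadata)); // true, true, false
        }
    }
}
```

Because the exception is unchecked, callers that enable throwing need an explicit `catch` if they want to keep the content extracted before the limit was hit.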
parseEmbedded
void parseEmbedded(TikaInputStream stream, ContentHandler handler, Metadata metadata, ParseContext parseContext, boolean outputHtml) throws SAXException, IOException
Processes the supplied embedded resource, calling the delegating parser with the appropriate details.
Parameters:
stream - The embedded resource
handler - The handler to use
metadata - The metadata for the embedded resource
parseContext - The parse context
outputHtml - Should we output HTML for this resource, or has the parser already done so?
Throws:
SAXException
IOException
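A caller would typically consult shouldParseEmbedded before handing the resource to parseEmbedded. The sketch below illustrates that check-then-parse sequence with simplified stand-ins rather than Tika's own types: `SketchExtractor`, `TextOnlyExtractor`, and `EmbeddedCaller` are hypothetical, an `InputStream` replaces `TikaInputStream`, a `StringBuilder` replaces `ContentHandler`, a `Map` replaces `Metadata`, and the `ParseContext` argument is omitted. Treating outputHtml as "wrap the content in a container element" is an assumption made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in mirroring the two-method shape of the interface; not Tika's types. */
interface SketchExtractor {
    boolean shouldParseEmbedded(Map<String, String> metadata);

    void parseEmbedded(InputStream stream, StringBuilder handler,
                       Map<String, String> metadata, boolean outputHtml) throws IOException;
}

/** Hypothetical extractor that only accepts text resources and copies them to the handler. */
class TextOnlyExtractor implements SketchExtractor {
    @Override
    public boolean shouldParseEmbedded(Map<String, String> metadata) {
        String type = metadata.get("Content-Type");
        return type != null && type.startsWith("text/");
    }

    @Override
    public void parseEmbedded(InputStream stream, StringBuilder handler,
                              Map<String, String> metadata, boolean outputHtml) throws IOException {
        String text = new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        // Assumption: when outputHtml is true the extractor emits the wrapper markup itself.
        // No escaping is done here; this is a sketch, not safe HTML generation.
        if (outputHtml) {
            handler.append("<div>");
        }
        handler.append(text);
        if (outputHtml) {
            handler.append("</div>");
        }
    }
}

class EmbeddedCaller {
    /** Returns true if the resource was parsed, false if the extractor declined it. */
    static boolean handle(SketchExtractor extractor, byte[] content,
                          Map<String, String> metadata, StringBuilder handler) throws IOException {
        if (!extractor.shouldParseEmbedded(metadata)) {
            return false; // skip without opening the stream
        }
        try (InputStream stream = new ByteArrayInputStream(content)) {
            extractor.parseEmbedded(stream, handler, metadata, true);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> metadata = new HashMap<>();
        metadata.put("Content-Type", "text/plain");
        StringBuilder out = new StringBuilder();
        boolean parsed = handle(new TextOnlyExtractor(),
                "hello".getBytes(StandardCharsets.UTF_8), metadata, out);
        System.out.println(parsed + " " + out); // true <div>hello</div>
    }
}
```

Checking first lets the caller avoid opening or buffering a stream for resources the extractor would decline anyway.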