Class ParsingEmbeddedDocumentExtractor

java.lang.Object
org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor
All Implemented Interfaces:
EmbeddedDocumentExtractor
Direct Known Subclasses:
RUnpackExtractor

public class ParsingEmbeddedDocumentExtractor extends Object implements EmbeddedDocumentExtractor
Helper class for parsers of package archives or other compound document formats that support embedded or attached component documents.
Since:
Apache Tika 0.8
  • Field Details

  • Constructor Details

    • ParsingEmbeddedDocumentExtractor

      public ParsingEmbeddedDocumentExtractor(ParseContext context)
  • Method Details

    • shouldParseEmbedded

      public boolean shouldParseEmbedded(Metadata metadata)
      Specified by:
      shouldParseEmbedded in interface EmbeddedDocumentExtractor
    • parseEmbedded

      public void parseEmbedded(InputStream stream, ContentHandler handler, Metadata metadata, boolean outputHtml) throws SAXException, IOException
      Description copied from interface: EmbeddedDocumentExtractor
      Processes the supplied embedded resource, calling the delegating parser with the appropriate details.
      Specified by:
      parseEmbedded in interface EmbeddedDocumentExtractor
      Parameters:
      stream - The embedded resource
      handler - The handler to use
      metadata - The metadata for the embedded resource
      outputHtml - true to output HTML for this resource; false if the parser has already done so
      Throws:
      SAXException
      IOException
    • getDelegatingParser

      public Parser getDelegatingParser()
    • setWriteFileNameToContent

      public void setWriteFileNameToContent(boolean writeFileNameToContent)
    • isWriteFileNameToContent

      public boolean isWriteFileNameToContent()
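
The call order implied by shouldParseEmbedded and parseEmbedded (ask first, then hand over the stream) can be sketched without Tika on the classpath. This is a minimal illustration under stated assumptions, not Tika's implementation: ExtractorSketch and TextOnlyExtractor are hypothetical stand-ins, a plain Map replaces Tika's Metadata, and the "resourceName" key is made up for the example. Only the java.util.zip and org.xml.sax types are real JDK classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EmbeddedExtractionSketch {

    /** Hypothetical stand-in that mirrors the two interface methods; a Map replaces Tika's Metadata. */
    interface ExtractorSketch {
        boolean shouldParseEmbedded(Map<String, String> metadata);

        void parseEmbedded(InputStream stream, ContentHandler handler,
                           Map<String, String> metadata, boolean outputHtml)
                throws SAXException, IOException;
    }

    /** Toy extractor (assumption): accepts only .txt entries and forwards their text to the handler. */
    static class TextOnlyExtractor implements ExtractorSketch {
        @Override
        public boolean shouldParseEmbedded(Map<String, String> metadata) {
            String name = metadata.get("resourceName");
            return name != null && name.endsWith(".txt");
        }

        @Override
        public void parseEmbedded(InputStream stream, ContentHandler handler,
                                  Map<String, String> metadata, boolean outputHtml)
                throws SAXException, IOException {
            // Read to the end of the current entry; the container's stream is left open.
            char[] text = new String(stream.readAllBytes(), StandardCharsets.UTF_8).toCharArray();
            handler.characters(text, 0, text.length);
        }
    }

    /** Container-parser side: walk the archive and offer each entry to the extractor. */
    static String extractAll(byte[] zipBytes, ExtractorSketch extractor)
            throws IOException, SAXException {
        StringBuilder out = new StringBuilder();
        ContentHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        };
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Map<String, String> metadata = new HashMap<>();
                metadata.put("resourceName", entry.getName());
                // Ask first; only hand over the stream when the extractor wants it.
                if (extractor.shouldParseEmbedded(metadata)) {
                    extractor.parseEmbedded(zip, handler, metadata, true);
                }
            }
        }
        return out.toString();
    }

    /** Builds a two-entry archive in memory: one text entry and one binary entry. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("note.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("image.bin"));
            zip.write(new byte[] {1, 2, 3});
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Only note.txt passes the shouldParseEmbedded check, so this prints "hello".
        System.out.println(extractAll(sampleZip(), new TextOnlyExtractor()));
    }
}
```

In real use, the container parser would construct a ParsingEmbeddedDocumentExtractor from its ParseContext and pass a Tika Metadata object in place of the Map; the toy extractor here only stands in for that role so the sketch compiles against the JDK alone.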