Interface EmbeddedDocumentExtractor

All Known Implementing Classes:
ParsingEmbeddedDocumentExtractor, RUnpackExtractor

public interface EmbeddedDocumentExtractor
  • Method Details

    • shouldParseEmbedded

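      boolean shouldParseEmbedded(Metadata metadata)
      Decides whether the embedded resource described by the given metadata should be parsed.
      Parameters:
      metadata - The metadata for the embedded resource
      Returns:
      true if the embedded resource should be parsed, false if it should be skipped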
    • parseEmbedded

      void parseEmbedded(InputStream stream, ContentHandler handler, Metadata metadata, boolean outputHtml) throws SAXException, IOException
      Processes the supplied embedded resource by calling the delegating parser with the given stream, handler, and metadata.
      Parameters:
      stream - The embedded resource
      handler - The handler to use
      metadata - The metadata for the embedded resource
      outputHtml - true if HTML should be output for this resource, false if the parser has already done so
      Throws:
      SAXException
      IOException
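
The two methods above are typically used as a pair: a caller first asks shouldParseEmbedded whether a resource is wanted, and only then hands the stream to parseEmbedded. The following is a minimal sketch of that calling pattern, not the behavior of ParsingEmbeddedDocumentExtractor or RUnpackExtractor. So that it compiles with the JDK alone, it declares a stand-in Metadata class and a local copy of the interface; both are assumptions made for illustration, as are the "name" key, the ".txt" filter, and the TextOnlyExtractor and ExtractorDemo classes. The outputHtml flag is accepted but ignored here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the library's Metadata class, reduced to a
// string map so that this sketch has no external dependencies.
class Metadata {
    private final Map<String, String> values = new HashMap<>();
    void set(String name, String value) { values.put(name, value); }
    String get(String name) { return values.get(name); }
}

// Local copy of the two method signatures documented above.
interface EmbeddedDocumentExtractor {
    boolean shouldParseEmbedded(Metadata metadata);
    void parseEmbedded(InputStream stream, ContentHandler handler,
                       Metadata metadata, boolean outputHtml)
            throws SAXException, IOException;
}

// Illustrative implementation: accepts only resources whose (made-up)
// "name" entry ends in ".txt" and forwards their text to the handler.
class TextOnlyExtractor implements EmbeddedDocumentExtractor {
    @Override
    public boolean shouldParseEmbedded(Metadata metadata) {
        String name = metadata.get("name");
        return name != null && name.endsWith(".txt");
    }

    @Override
    public void parseEmbedded(InputStream stream, ContentHandler handler,
                              Metadata metadata, boolean outputHtml)
            throws SAXException, IOException {
        // Read the whole embedded stream into memory (fine for a sketch).
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int read;
        while ((read = stream.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        // outputHtml is ignored in this sketch; the text is passed through as-is.
        char[] text = new String(buffer.toByteArray(), StandardCharsets.UTF_8).toCharArray();
        handler.characters(text, 0, text.length);
    }
}

class ExtractorDemo {
    // Returns the text forwarded to the handler, or null if the resource was skipped.
    static String run(String name, String body) throws SAXException, IOException {
        Metadata metadata = new Metadata();
        metadata.set("name", name);

        StringBuilder collected = new StringBuilder();
        ContentHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                collected.append(ch, start, length);
            }
        };

        EmbeddedDocumentExtractor extractor = new TextOnlyExtractor();
        if (!extractor.shouldParseEmbedded(metadata)) {
            return null;
        }
        try (InputStream stream =
                 new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8))) {
            extractor.parseEmbedded(stream, handler, metadata, false);
        }
        return collected.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("notes.txt", "hello"));
        System.out.println(run("image.png", "ignored"));
    }
}
```

In real use the Metadata type and the interface would come from the library rather than being redeclared, and the decision in shouldParseEmbedded would rest on whatever metadata the calling parser actually supplies.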