Interface EmbeddedDocumentExtractor

All Known Implementing Classes:
ParsingEmbeddedDocumentExtractor, UnpackExtractor

public interface EmbeddedDocumentExtractor
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    parseEmbedded(TikaInputStream stream, ContentHandler handler, Metadata metadata, ParseContext parseContext, boolean outputHtml)
    Processes the supplied embedded resource, calling the delegating parser with the appropriate details.
    boolean
    shouldParseEmbedded(Metadata metadata)
    Determines whether the given embedded document should be parsed.
  • Method Details

    • shouldParseEmbedded

      boolean shouldParseEmbedded(Metadata metadata)
      Determines whether the given embedded document should be parsed.

      Note: Implementations may throw EmbeddedLimitReachedException (a RuntimeException) when a configured limit is exceeded and the implementation is configured to throw on that condition.

      Parameters:
      metadata - the metadata for the embedded document
      Returns:
      true if the embedded document should be parsed
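
The limit behaviour described in the note can be sketched roughly as follows. This is a minimal, self-contained illustration and not Tika code: `CountLimitedFilter` and `LimitReachedException` are hypothetical names, `Map<String, String>` stands in for `Metadata`, and returning `false` when throwing is not configured is an assumption made for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for EmbeddedLimitReachedException; not the Tika class. */
class LimitReachedException extends RuntimeException {
    LimitReachedException(String message) {
        super(message);
    }
}

/** Hypothetical filter that mirrors only the shape of shouldParseEmbedded(Metadata). */
class CountLimitedFilter {
    private final int maxEmbedded;      // assumed limit: how many embedded documents to accept
    private final boolean throwOnLimit; // assumed switch: throw instead of quietly declining
    private int accepted = 0;

    CountLimitedFilter(int maxEmbedded, boolean throwOnLimit) {
        this.maxEmbedded = maxEmbedded;
        this.throwOnLimit = throwOnLimit;
    }

    /** Map<String, String> stands in for Tika's Metadata in this sketch. */
    boolean shouldParseEmbedded(Map<String, String> metadata) {
        if (accepted >= maxEmbedded) {
            if (throwOnLimit) {
                // Unchecked, so callers are not forced to declare it.
                throw new LimitReachedException(
                        "embedded document limit of " + maxEmbedded + " reached");
            }
            return false; // assumption: decline silently when throwing is off
        }
        accepted++;
        return true;
    }
}
```

Because the exception in this sketch is unchecked, a caller that wants to stop processing cleanly has to catch it explicitly; nothing in the method signature advertises it.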
    • parseEmbedded

      void parseEmbedded(TikaInputStream stream, ContentHandler handler, Metadata metadata, ParseContext parseContext, boolean outputHtml) throws SAXException, IOException
      Processes the supplied embedded resource, calling the delegating parser with the appropriate details.
      Parameters:
      stream - The embedded resource
      handler - The handler to use
      metadata - The metadata for the embedded resource
      parseContext - The parse context
      outputHtml - Whether to output HTML for this resource; pass false if the calling parser has already done so
      Throws:
      SAXException
      IOException
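
The two methods are meant to be used together: a container parser asks shouldParseEmbedded first and calls parseEmbedded only when the answer is true. The sketch below shows that check-then-parse pattern with simplified stand-in types so that it runs without Tika on the classpath. `SketchExtractor`, `TextOnlyExtractor`, and `ContainerParserSketch` are hypothetical names; `InputStream`, `StringBuilder`, and `Map<String, String>` stand in for `TikaInputStream`, `ContentHandler`, and `Metadata`; the `ParseContext` parameter is omitted; and the `<div>` wrapper is purely illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in that mirrors the two-method shape of the interface. */
interface SketchExtractor {
    boolean shouldParseEmbedded(Map<String, String> metadata);

    void parseEmbedded(InputStream stream, StringBuilder handler,
                       Map<String, String> metadata, boolean outputHtml) throws IOException;
}

/** Hypothetical implementation that accepts only text/* resources. */
class TextOnlyExtractor implements SketchExtractor {
    @Override
    public boolean shouldParseEmbedded(Map<String, String> metadata) {
        String type = metadata.get("Content-Type");
        return type != null && type.startsWith("text/");
    }

    @Override
    public void parseEmbedded(InputStream stream, StringBuilder handler,
                              Map<String, String> metadata, boolean outputHtml) throws IOException {
        String text = new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        if (outputHtml) {
            // Illustrative wrapper only; no escaping is done in this sketch.
            handler.append("<div class=\"embedded\">").append(text).append("</div>");
        } else {
            handler.append(text);
        }
    }
}

/** Hypothetical caller showing the check-then-parse pattern. */
class ContainerParserSketch {
    static String handleAttachment(SketchExtractor extractor, byte[] bytes, String contentType)
            throws IOException {
        Map<String, String> metadata = new HashMap<>();
        metadata.put("Content-Type", contentType);
        StringBuilder handler = new StringBuilder();
        // Ask first; open and parse the stream only if the extractor wants it.
        if (extractor.shouldParseEmbedded(metadata)) {
            try (InputStream stream = new ByteArrayInputStream(bytes)) {
                extractor.parseEmbedded(stream, handler, metadata, true);
            }
        }
        return handler.toString();
    }
}
```

Checking before opening the stream keeps the cost of a declined resource low, since the caller never has to read or buffer its bytes.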