Class PipesForkResult

java.lang.Object
org.apache.tika.pipes.fork.PipesForkResult

public class PipesForkResult extends Object
Result from parsing a file with PipesForkParser.

This wraps the PipesResult and provides convenient access to the parsed content and metadata.

Content is available in the metadata via TikaCoreProperties.TIKA_CONTENT.

Important - Accessing Results:

  • RMETA mode (default): Use getMetadataList() to access content and metadata from the container document AND all embedded documents. The convenience methods getContent() and getMetadata() only return the container document's data - embedded document content will be missed!
  • CONCATENATE mode: The result contains a single metadata object holding only the container document's metadata, with content concatenated from the container document and all attachments.
  • Constructor Details

    • PipesForkResult

      public PipesForkResult(PipesResult pipesResult)
  • Method Details

    • getStatus

      public PipesResult.RESULT_STATUS getStatus()
      Get the result status.
      Returns:
      the result status
    • isSuccess

      public boolean isSuccess()
      Check if the parsing was successful.
      Returns:
      true if parsing succeeded
    • isProcessCrash

      public boolean isProcessCrash()
      Check if there was a process crash (OOM, timeout, etc.).
      Returns:
      true if the forked process crashed
    • isFatal

      public boolean isFatal()
      Check if there was a fatal error (failed to initialize pipes system).
      Returns:
      true if there was a fatal error
    • isInitializationFailure

      public boolean isInitializationFailure()
      Check if there was an initialization failure (fetcher/emitter initialization issues).
      Returns:
      true if there was an initialization failure
    • isTaskException

      public boolean isTaskException()
      Check if there was a task exception (fetch/emit/parse issues for a specific request).
      Returns:
      true if there was a task exception
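
The status checks above (isSuccess() through isTaskException()) lend themselves to a simple dispatch. The following is a minimal sketch, not Tika code: ResultFlags is a hypothetical stand-in whose fields mirror the boolean methods documented here, so the example compiles without Tika on the classpath. The order of the checks and the action labels are assumptions chosen for illustration; with the real class you would call the same-named methods on a PipesForkResult.

```java
// Hypothetical stand-in mirroring the boolean status methods of PipesForkResult.
// It is NOT part of Tika; it exists only so this sketch compiles on its own.
final class ResultFlags {
    final boolean success;
    final boolean processCrash;
    final boolean fatal;
    final boolean initializationFailure;
    final boolean taskException;

    ResultFlags(boolean success, boolean processCrash, boolean fatal,
                boolean initializationFailure, boolean taskException) {
        this.success = success;
        this.processCrash = processCrash;
        this.fatal = fatal;
        this.initializationFailure = initializationFailure;
        this.taskException = taskException;
    }
}

final class StatusTriage {
    /** Maps the status flags to a coarse next step. The labels are illustrative only. */
    static String nextStep(ResultFlags r) {
        if (r.success) {
            return "USE_RESULT";
        }
        if (r.fatal) {
            return "ABORT";            // the pipes system itself failed to initialize
        }
        if (r.initializationFailure) {
            return "FIX_CONFIG";       // fetcher/emitter initialization issue
        }
        if (r.processCrash) {
            return "SKIP_FILE";        // forked process died (OOM, timeout, ...)
        }
        if (r.taskException) {
            return "LOG_AND_CONTINUE"; // problem specific to this one request
        }
        return "INSPECT_STATUS";       // none of the flags set: fall back to getStatus()
    }

    public static void main(String[] args) {
        ResultFlags crashed = new ResultFlags(false, true, false, false, false);
        System.out.println(nextStep(crashed));
    }
}
```

Whether a crash should skip the file or trigger a retry depends on the application; the point is only that each failure category can be handled separately, with getMessage() supplying detail for logging.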
    • getMetadataList

      public List<Metadata> getMetadataList()
      Get the list of metadata objects from parsing.

      This is the recommended method for RMETA mode (the default).

      RMETA mode: Returns one metadata object per document - the first is the container document, followed by each embedded document. Each metadata object carries that document's own metadata, with its content available via TikaCoreProperties.TIKA_CONTENT.

      CONCATENATE mode: Returns a single metadata object containing the container's metadata and concatenated content from all documents.

      Returns:
      the list of metadata objects, or empty list if none
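
To make the RMETA behaviour concrete, here is a minimal sketch of walking the list and collecting content from every document. It deliberately avoids Tika types so that it runs on its own: each Metadata object is modelled as a plain Map<String, String>, and CONTENT_KEY is an assumed stand-in for the name behind TikaCoreProperties.TIKA_CONTENT. With the real API, the lookup would presumably be a get of TikaCoreProperties.TIKA_CONTENT on each Metadata object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RmetaContent {
    // Assumed stand-in for the key behind TikaCoreProperties.TIKA_CONTENT.
    static final String CONTENT_KEY = "X-TIKA:content";

    /** Collects the content of every document that has any, container first. */
    static List<String> allContent(List<Map<String, String>> metadataList) {
        List<String> contents = new ArrayList<>();
        for (Map<String, String> metadata : metadataList) {
            String content = metadata.get(CONTENT_KEY);
            if (content != null) {
                contents.add(content);
            }
        }
        return contents;
    }

    /** Models the container-only view: the first entry's content, or null. */
    static String containerContent(List<Map<String, String>> metadataList) {
        if (metadataList.isEmpty()) {
            return null;
        }
        return metadataList.get(0).get(CONTENT_KEY);
    }

    public static void main(String[] args) {
        // Modelled RMETA result: an email (container) with one attachment.
        List<Map<String, String>> docs = List.of(
                Map.of(CONTENT_KEY, "email body"),
                Map.of(CONTENT_KEY, "attachment text"));
        System.out.println(containerContent(docs)); // container only
        System.out.println(allContent(docs));       // container and attachment
    }
}
```

The contrast between the two helper methods is the one this page warns about: reading only the first entry drops everything the embedded documents contributed.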
    • getContent

      public String getContent()
      Get the content from the container document only.

      WARNING - RMETA mode: In RMETA mode, this returns ONLY the container document's content. Content from embedded documents is NOT included. To get all content including embedded documents, iterate over getMetadataList() and retrieve TikaCoreProperties.TIKA_CONTENT from each metadata object.

      CONCATENATE mode: This returns all content (container + embedded), because everything is concatenated into a single metadata object. The method therefore works as expected in this mode.

      Recommendation: For RMETA mode (the default), use getMetadataList() to access content from all documents. This method is most appropriate for CONCATENATE mode or when you only need the container document's content.

      Returns:
      the container document's content, or null if not available
    • getMetadata

      public Metadata getMetadata()
      Get the container document's metadata only.

      WARNING - RMETA mode: In RMETA mode, this returns ONLY the container document's metadata. Metadata from embedded documents (including their content, titles, authors, and any parse exceptions) is NOT included. To access metadata from all documents, use getMetadataList().

      CONCATENATE mode: There is only one metadata object, containing the container's metadata and the concatenated content from all documents. CONCATENATE mode inherently discards metadata from embedded files, and exceptions raised while parsing embedded files are silently swallowed.

      Recommendation: For RMETA mode (the default), use getMetadataList() to access metadata from all documents, including embedded document exceptions (stored in TikaCoreProperties.EMBEDDED_EXCEPTION).

      Returns:
      the container document's metadata, or null if not available
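
Because embedded-document exceptions are only visible through getMetadataList(), a scan over the list is the way to surface them. The sketch below is an illustration under stated assumptions rather than Tika code: Metadata objects are modelled as Map<String, String>, and EMBEDDED_EXCEPTION_KEY is an assumed stand-in for the name behind TikaCoreProperties.EMBEDDED_EXCEPTION.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EmbeddedExceptions {
    // Assumed stand-in for the key behind TikaCoreProperties.EMBEDDED_EXCEPTION.
    static final String EMBEDDED_EXCEPTION_KEY = "X-TIKA:EXCEPTION:embedded_exception";

    /**
     * Returns a map from list position (0 = container) to the recorded
     * exception text, for every entry that carries one.
     */
    static Map<Integer, String> byIndex(List<Map<String, String>> metadataList) {
        Map<Integer, String> found = new LinkedHashMap<>();
        for (int i = 0; i < metadataList.size(); i++) {
            String exception = metadataList.get(i).get(EMBEDDED_EXCEPTION_KEY);
            if (exception != null) {
                found.put(i, exception);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Modelled RMETA result: the second document failed to parse.
        List<Map<String, String>> docs = List.of(
                Map.of("title", "container"),
                Map.of(EMBEDDED_EXCEPTION_KEY, "parse failure in attachment"),
                Map.of("title", "second attachment"));
        System.out.println(byIndex(docs));
    }
}
```

A check limited to the first entry, which is all getMetadata() exposes, would report nothing for this input even though one attachment failed.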
    • getMessage

      public String getMessage()
      Get any error message associated with the result.
      Returns:
      the error message, or null if none
    • getPipesResult

      public PipesResult getPipesResult()
      Get the underlying PipesResult for advanced access.
      Returns:
      the pipes result
    • toString

      public String toString()
      Overrides:
      toString in class Object