Class PipesForkResult
Result of a parse performed by PipesForkParser.
This class wraps the PipesResult and provides convenient access to the parsed content and metadata.
Content is stored in the metadata under TikaCoreProperties.TIKA_CONTENT.
Important - Accessing Results:
- RMETA mode (default): Use getMetadataList() to access content and metadata from the container document AND all embedded documents. The convenience methods getContent() and getMetadata() return only the container document's data, so embedded document content will be missed!
- CONCATENATE mode: The result includes metadata from the container document only, but its content is concatenated from the container document and all attachments.
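The difference between the two access styles in RMETA mode can be sketched without Tika on the classpath. In the sketch below, a `List<Map<String, String>>` is a stand-in for the list returned by getMetadataList(), and `CONTENT_KEY` is a placeholder for TikaCoreProperties.TIKA_CONTENT (it is not the real key string); the class and helper names are hypothetical and only illustrate the behavior described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class RmetaContentSketch {
    // Placeholder for TikaCoreProperties.TIKA_CONTENT (assumption, not the real key string).
    static final String CONTENT_KEY = "TIKA_CONTENT";

    // Mirrors the documented behavior of getContent(): container document only.
    static String containerContent(List<Map<String, String>> metadataList) {
        if (metadataList.isEmpty()) {
            return null;
        }
        return metadataList.get(0).get(CONTENT_KEY);
    }

    // Mirrors iterating over getMetadataList(): container first, then embedded documents.
    static List<String> allContent(List<Map<String, String>> metadataList) {
        List<String> contents = new ArrayList<>();
        for (Map<String, String> metadata : metadataList) {
            String content = metadata.get(CONTENT_KEY);
            if (content != null) {
                contents.add(content);
            }
        }
        return contents;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rmeta = List.of(
                Map.of(CONTENT_KEY, "email body"),
                Map.of(CONTENT_KEY, "attachment text"));
        System.out.println("container only: " + containerContent(rmeta));
        System.out.println("all documents:  " + allContent(rmeta));
    }
}
```

With real results, the same loop would run over the metadata objects themselves and read the content property from each one.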
-
Constructor Summary
Constructors -
Method Summary
- getContent(): Get the content from the container document only.
- getMessage(): Get any error message associated with the result.
- getMetadata(): Get the container document's metadata only.
- getMetadataList(): Get the list of metadata objects from parsing.
- getPipesResult(): Get the underlying PipesResult for advanced access.
- getStatus(): Get the result status.
- boolean isFatal(): Check if there was a fatal error (failed to initialize pipes system).
- boolean isInitializationFailure(): Check if there was an initialization failure (fetcher/emitter initialization issues).
- boolean isProcessCrash(): Check if there was a process crash (OOM, timeout, etc.).
- boolean isSuccess(): Check if the parsing was successful.
- boolean isTaskException(): Check if there was a task exception (fetch/emit/parse issues for a specific request).
- toString()
-
Constructor Details
-
PipesForkResult
-
-
Method Details
-
getStatus
Get the result status.
Returns: the result status
-
isSuccess
public boolean isSuccess()
Check if the parsing was successful.
Returns: true if parsing succeeded
-
isProcessCrash
public boolean isProcessCrash()
Check if there was a process crash (OOM, timeout, etc.).
Returns: true if the forked process crashed
-
isFatal
public boolean isFatal()
Check if there was a fatal error (failed to initialize pipes system).
Returns: true if there was a fatal error
-
isInitializationFailure
public boolean isInitializationFailure()
Check if there was an initialization failure (fetcher/emitter initialization issues).
Returns: true if there was an initialization failure
-
isTaskException
public boolean isTaskException()
Check if there was a task exception (fetch/emit/parse issues for a specific request).
Returns: true if there was a task exception
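One way a caller might combine the status checks above is a small triage routine. The sketch below is self-contained and does not use Tika: `ResultChecks` is a stand-in interface that only copies the five boolean method names documented here, and the `Action` values, the check order, and the handling policy are all assumptions for illustration. Whether more than one flag can be true for a single result is not stated above, so the order chosen here is arbitrary.

```java
class StatusTriageSketch {
    // Stand-in exposing the documented boolean checks; not the real PipesForkResult.
    interface ResultChecks {
        boolean isSuccess();
        boolean isProcessCrash();
        boolean isFatal();
        boolean isInitializationFailure();
        boolean isTaskException();
    }

    // Hypothetical caller-side reactions; the library does not define these.
    enum Action { USE_RESULT, RETRY_LATER, SKIP_FILE, FIX_CONFIGURATION, ABORT, INSPECT_STATUS }

    static Action triage(ResultChecks result) {
        if (result.isSuccess()) {
            return Action.USE_RESULT;
        }
        if (result.isFatal()) {
            return Action.ABORT;              // pipes system failed to initialize
        }
        if (result.isInitializationFailure()) {
            return Action.FIX_CONFIGURATION;  // fetcher/emitter initialization issue
        }
        if (result.isProcessCrash()) {
            return Action.RETRY_LATER;        // OOM, timeout, etc. in the forked process
        }
        if (result.isTaskException()) {
            return Action.SKIP_FILE;          // fetch/emit/parse issue for this request
        }
        return Action.INSPECT_STATUS;         // none of the checks matched
    }

    // Fixed sample results used only to exercise triage().
    enum Sample implements ResultChecks {
        SUCCESS, CRASH, FATAL, INIT_FAILURE, TASK_EXCEPTION, OTHER;

        public boolean isSuccess() { return this == SUCCESS; }
        public boolean isProcessCrash() { return this == CRASH; }
        public boolean isFatal() { return this == FATAL; }
        public boolean isInitializationFailure() { return this == INIT_FAILURE; }
        public boolean isTaskException() { return this == TASK_EXCEPTION; }
    }

    public static void main(String[] args) {
        for (Sample sample : Sample.values()) {
            System.out.println(sample + " -> " + triage(sample));
        }
    }
}
```

For anything other than success, getMessage() and getStatus() would be the natural places to look for detail before acting.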
-
getMetadataList
Get the list of metadata objects from parsing.
This is the recommended method for RMETA mode (the default).
RMETA mode: Returns one metadata object per document; the first is the container document, followed by each embedded document. Each metadata object contains:
- Content via TikaCoreProperties.TIKA_CONTENT
- Document metadata (title, author, dates, etc.)
- Any parse exceptions via TikaCoreProperties.EMBEDDED_EXCEPTION
CONCATENATE mode: Returns a single metadata object containing the container's metadata and concatenated content from all documents.
Returns: the list of metadata objects, or an empty list if none
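Because each metadata object in RMETA mode can carry its own parse exception, a caller may want to find which documents failed. The following is a minimal sketch under stated assumptions: a `Map<String, String>` stands in for a metadata object, `EMBEDDED_EXCEPTION_KEY` is a placeholder for TikaCoreProperties.EMBEDDED_EXCEPTION (not the real key string), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class EmbeddedExceptionSketch {
    // Placeholders for the TikaCoreProperties keys (assumptions, not the real key strings).
    static final String CONTENT_KEY = "TIKA_CONTENT";
    static final String EMBEDDED_EXCEPTION_KEY = "EMBEDDED_EXCEPTION";

    // Returns the list positions of documents that recorded an embedded parse exception.
    // Position 0 is the container document; later positions are embedded documents.
    static List<Integer> documentsWithExceptions(List<Map<String, String>> metadataList) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < metadataList.size(); i++) {
            if (metadataList.get(i).containsKey(EMBEDDED_EXCEPTION_KEY)) {
                failed.add(i);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rmeta = List.of(
                Map.of(CONTENT_KEY, "container text"),
                Map.of(EMBEDDED_EXCEPTION_KEY, "stack trace of the failed attachment"),
                Map.of(CONTENT_KEY, "second attachment text"));
        System.out.println("documents with exceptions: " + documentsWithExceptions(rmeta));
    }
}
```

In CONCATENATE mode the list holds a single metadata object, so a scan like this has nothing per-document to report.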
-
getContent
Get the content from the container document only.
WARNING - RMETA mode: In RMETA mode, this returns ONLY the container document's content. Content from embedded documents is NOT included. To get all content including embedded documents, iterate over getMetadataList() and retrieve TikaCoreProperties.TIKA_CONTENT from each metadata object.
CONCATENATE mode: In CONCATENATE mode, this returns all content (container + embedded) since everything is concatenated into a single metadata object. This method works as expected in CONCATENATE mode.
Recommendation: For RMETA mode (the default), use getMetadataList() to access content from all documents. This method is most appropriate for CONCATENATE mode or when you only need the container document's content.
Returns: the container document's content, or null if not available
-
getMetadata
Get the container document's metadata only.
WARNING - RMETA mode: In RMETA mode, this returns ONLY the container document's metadata. Metadata from embedded documents (including their content, titles, authors, and any parse exceptions) is NOT included. To access metadata from all documents, use getMetadataList().
CONCATENATE mode: In CONCATENATE mode, there is only one metadata object, containing the container's metadata and concatenated content from all documents. CONCATENATE mode inherently discards metadata from embedded files, and Tika silently swallows exceptions raised in embedded files.
Recommendation: For RMETA mode (the default), use getMetadataList() to access metadata from all documents, including embedded document exceptions (stored in TikaCoreProperties.EMBEDDED_EXCEPTION).
Returns: the container document's metadata, or null if not available
-
getMessage
Get any error message associated with the result.
Returns: the error message, or null if none
-
getPipesResult
Get the underlying PipesResult for advanced access.
Returns: the pipes result
-
toString
-