Class ParseRecord

java.lang.Object
org.apache.tika.parser.ParseRecord

public class ParseRecord extends Object
Stores exceptions, warnings, and other information collected during a parse. After the parse completes, the CompositeParser adds this information to the parent's metadata.

This class also tracks the limits on embedded document processing (depth and count), which can be configured via setMaxEmbeddedDepth(int) and setMaxEmbeddedCount(int).

  • Constructor Details

    • ParseRecord

      public ParseRecord()
  • Method Details

    • newInstance

      public static ParseRecord newInstance(ParseContext context)
      Creates a new ParseRecord configured from EmbeddedLimits in the ParseContext.

      If EmbeddedLimits is present in the context, the ParseRecord is configured with those limits. Otherwise, the default values are used, which impose no limit.

      Parameters:
      context - the ParseContext (may be null)
      Returns:
      a new ParseRecord configured from the context
    • getDepth

      public int getDepth()
    • getParsers

      public String[] getParsers()
    • addException

      public void addException(Exception e)
    • addWarning

      public void addWarning(String msg)
    • addMetadata

      public void addMetadata(Metadata metadata)
    • setWriteLimitReached

      public void setWriteLimitReached(boolean writeLimitReached)
    • getExceptions

      public List<Exception> getExceptions()
    • getWarnings

      public List<String> getWarnings()
    • isWriteLimitReached

      public boolean isWriteLimitReached()
    • getMetadataList

      public List<Metadata> getMetadataList()
    • isThrowOnMaxDepth

      public boolean isThrowOnMaxDepth()
      Returns whether an exception should be thrown when the maximum embedded depth is reached.
      Returns:
      true if an exception should be thrown on max depth
    • isThrowOnMaxCount

      public boolean isThrowOnMaxCount()
      Returns whether an exception should be thrown when the maximum embedded count is reached.
      Returns:
      true if an exception should be thrown on max count
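
The two throw-on-limit flags suggest a choice between failing the parse and merely recording that a limit was hit. The standalone sketch below illustrates that choice for the depth limit. It does not use Tika classes: `LimitPolicySketch` and `onDepthLimit` are hypothetical names, and `IllegalStateException` is a placeholder, since the exception type Tika actually throws is not stated here.

```java
// Hypothetical stand-in, not part of Tika: shows "throw" versus "record a flag"
// when an embedded depth limit is hit.
final class LimitPolicySketch {
    private final boolean throwOnMaxDepth;
    private boolean embeddedDepthLimitReached;

    LimitPolicySketch(boolean throwOnMaxDepth) {
        this.throwOnMaxDepth = throwOnMaxDepth;
    }

    // Called by a (hypothetical) extractor when the depth limit is exceeded.
    void onDepthLimit(int maxEmbeddedDepth) {
        if (throwOnMaxDepth) {
            // Placeholder exception type; an assumption for illustration only.
            throw new IllegalStateException(
                    "Max embedded depth reached: " + maxEmbeddedDepth);
        }
        // Lenient mode: remember that the limit was hit and let the parse continue.
        embeddedDepthLimitReached = true;
    }

    boolean isEmbeddedDepthLimitReached() {
        return embeddedDepthLimitReached;
    }
}
```

In the lenient case a caller would inspect the flag after the parse, in the same way isEmbeddedDepthLimitReached() is read on a ParseRecord.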
    • setEmbeddedDepthLimitReached

      public void setEmbeddedDepthLimitReached(boolean embeddedDepthLimitReached)
      Sets the flag indicating the embedded depth limit was reached.
      Parameters:
      embeddedDepthLimitReached - true if depth limit was reached
    • setEmbeddedCountLimitReached

      public void setEmbeddedCountLimitReached(boolean embeddedCountLimitReached)
      Sets the flag indicating the embedded count limit was reached.
      Parameters:
      embeddedCountLimitReached - true if count limit was reached
    • incrementEmbeddedCount

      public void incrementEmbeddedCount()
      Increments the embedded document count. Should be called when an embedded document is about to be parsed.
    • getEmbeddedCount

      public int getEmbeddedCount()
      Gets the current count of embedded documents processed.
      Returns:
      the embedded document count
    • setMaxEmbeddedDepth

      public void setMaxEmbeddedDepth(int maxEmbeddedDepth)
      Sets the maximum depth for parsing embedded documents. A value of -1 means unlimited (the default). A value of 0 means no embedded documents will be parsed. A value of 1 means only first-level embedded documents will be parsed, etc.
      Parameters:
      maxEmbeddedDepth - the maximum embedded depth, or -1 for unlimited
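
The depth values described for setMaxEmbeddedDepth(int) can be illustrated with a small standalone sketch. It does not use Tika classes: `EmbeddedDepthSketch` and `withinDepth` are hypothetical names, and numbering first-level embedded documents as depth 1 is an assumption read from the description above, not confirmed Tika behavior.

```java
// Hypothetical stand-in, not part of Tika: models the documented meaning of
// the maxEmbeddedDepth values (-1 unlimited, 0 none, 1 first level only, ...).
final class EmbeddedDepthSketch {
    static final int UNLIMITED = -1;

    private EmbeddedDepthSketch() {
    }

    /**
     * @param embeddedDepth    depth of the embedded document (assumption: 1 = first level)
     * @param maxEmbeddedDepth configured limit, or -1 for unlimited
     * @return true if a document at this depth would be parsed under the limit
     */
    static boolean withinDepth(int embeddedDepth, int maxEmbeddedDepth) {
        if (maxEmbeddedDepth == UNLIMITED) {
            return true; // no limit configured
        }
        return embeddedDepth <= maxEmbeddedDepth;
    }
}
```

Under these assumptions, a limit of 0 rejects every embedded document, and a limit of 1 accepts attachments of the container document but not attachments nested inside them.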
    • getMaxEmbeddedDepth

      public int getMaxEmbeddedDepth()
      Gets the maximum depth for parsing embedded documents.
      Returns:
      the maximum embedded depth, or -1 if unlimited
    • setMaxEmbeddedCount

      public void setMaxEmbeddedCount(int maxEmbeddedCount)
      Sets the maximum number of embedded documents to parse. A value of -1 means unlimited (the default).
      Parameters:
      maxEmbeddedCount - the maximum embedded count, or -1 for unlimited
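
A count limit interacts with the running embedded count, so a minimal standalone sketch may help. It does not use Tika classes: `EmbeddedCountSketch` and `tryAdmit` are hypothetical names, and checking the limit before incrementing is an assumption, since the order Tika uses is not stated here.

```java
// Hypothetical stand-in, not part of Tika: a counter that admits embedded
// documents until a configured maximum is reached (-1 means unlimited).
final class EmbeddedCountSketch {
    private final int maxEmbeddedCount;
    private int embeddedCount;
    private boolean embeddedCountLimitReached;

    EmbeddedCountSketch(int maxEmbeddedCount) {
        this.maxEmbeddedCount = maxEmbeddedCount;
    }

    // Returns true and counts the document if it may be parsed;
    // otherwise records that the limit was reached and returns false.
    boolean tryAdmit() {
        if (maxEmbeddedCount >= 0 && embeddedCount >= maxEmbeddedCount) {
            embeddedCountLimitReached = true;
            return false;
        }
        embeddedCount++; // counted just before the document would be parsed
        return true;
    }

    int getEmbeddedCount() {
        return embeddedCount;
    }

    boolean isEmbeddedCountLimitReached() {
        return embeddedCountLimitReached;
    }
}
```

With a maximum of 2, the first two calls to `tryAdmit()` succeed and the third is refused, leaving the count at 2 and the limit flag set.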
    • getMaxEmbeddedCount

      public int getMaxEmbeddedCount()
      Gets the maximum number of embedded documents to parse.
      Returns:
      the maximum embedded count, or -1 if unlimited
    • isEmbeddedDepthLimitReached

      public boolean isEmbeddedDepthLimitReached()
      Returns whether the embedded depth limit was reached during parsing.
      Returns:
      true if the depth limit was reached
    • isEmbeddedCountLimitReached

      public boolean isEmbeddedCountLimitReached()
      Returns whether the embedded count limit was reached during parsing.
      Returns:
      true if the count limit was reached