Class DetectHelper

java.lang.Object
org.apache.tika.detect.DetectHelper

public class DetectHelper extends Object
Utility methods for content detection.
  • Constructor Details

    • DetectHelper

      public DetectHelper()
  • Method Details

    • getStreamForDetectionOnly

      public static TikaInputStream getStreamForDetectionOnly(InputStream stream, int maxLength, Metadata metadata) throws IOException
      Creates a TikaInputStream suitable for detection-only purposes by reading up to maxLength bytes from the input stream into a byte array.

      If the input stream contains more bytes than maxLength, the resulting metadata will have TikaCoreProperties.TRUNCATED_CONTENT_FOR_DETECTION set to true, signaling to detectors that they are working with truncated content and should adjust their behavior accordingly.

      This is useful when you want to perform detection on a limited portion of a large file without spooling the entire file to disk.

      NOTE: The downside is that detection on truncated content may be less precise. Use this only when you are performing detection with no subsequent parsing.

      Parameters:
      stream - the input stream to read from (will NOT be closed)
      maxLength - the maximum number of bytes to read
      metadata - the metadata object on which the truncation flag is set, if applicable
      Returns:
      a TikaInputStream backed by the buffered bytes
      Throws:
      IOException - if an I/O error occurs
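The buffering behaviour described above (read at most maxLength bytes, then record whether more were available) can be sketched with plain JDK classes. This is not Tika's implementation: the class DetectionBufferSketch, its Result holder, and the choice to probe one extra byte to detect truncation are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for illustration only; not part of Tika.
class DetectionBufferSketch {

    // Holds the buffered prefix and whether the source had more bytes.
    static final class Result {
        final byte[] bytes;
        final boolean truncated;

        Result(byte[] bytes, boolean truncated) {
            this.bytes = bytes;
            this.truncated = truncated;
        }
    }

    // Reads at most maxLength bytes into memory. If the limit was reached,
    // one extra byte is read (and consumed) to learn whether the stream
    // held more data. The stream is left open for the caller to close.
    static Result buffer(InputStream stream, int maxLength) throws IOException {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must be >= 0");
        }
        byte[] bytes = stream.readNBytes(maxLength); // requires Java 11+
        boolean truncated = bytes.length == maxLength && stream.read() != -1;
        return new Result(bytes, truncated);
    }

    public static void main(String[] args) throws IOException {
        // A 10-byte source with a 4-byte limit is reported as truncated.
        Result r = buffer(new ByteArrayInputStream(new byte[10]), 4);
        System.out.println(r.bytes.length + " " + r.truncated); // prints "4 true"
    }
}
```

In this sketch the truncation flag is returned alongside the bytes. The method documented above instead records it on the supplied Metadata, so detectors that receive only the stream and metadata can still see it.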
    • getStreamForDetectionOnly

      public static TikaInputStream getStreamForDetectionOnly(InputStream stream, int maxLength) throws IOException
      Creates a TikaInputStream suitable for detection-only purposes by reading up to maxLength bytes from the input stream into a byte array.

      This overload creates a new Metadata object internally. If you need to check whether the content was truncated, use getStreamForDetectionOnly(InputStream, int, Metadata) instead.

      Parameters:
      stream - the input stream to read from (will NOT be closed)
      maxLength - the maximum number of bytes to read
      Returns:
      a TikaInputStream backed by the buffered bytes
      Throws:
      IOException - if an I/O error occurs
    • isContentTruncatedForDetection

      public static boolean isContentTruncatedForDetection(Metadata metadata)
      Checks if the given metadata indicates that the content was truncated for detection.
      Parameters:
      metadata - the metadata to check
      Returns:
      true if the content was truncated, false otherwise
    • getDetectionContentLength

      public static int getDetectionContentLength(Metadata metadata)
      Gets the number of bytes buffered for detection.
      Parameters:
      metadata - the metadata to check
      Returns:
      the number of bytes buffered, or -1 if not set
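The two accessors above amount to reading a boolean flag and an integer length from the metadata, with defaults when nothing was recorded. A minimal sketch of that lookup pattern follows, using a plain Map as a stand-in for Metadata. The class name and both key names are hypothetical and are not Tika's property names. Returning -1 for an unparseable value is also an assumption; the documentation above only specifies -1 when the value is not set.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; not part of Tika.
class DetectionMetadataSketch {

    // Hypothetical keys, chosen for this example only.
    static final String TRUNCATED_KEY = "example:truncatedForDetection";
    static final String LENGTH_KEY = "example:detectionContentLength";

    // A missing flag is treated as "not truncated".
    static boolean isTruncated(Map<String, String> metadata) {
        return Boolean.parseBoolean(metadata.get(TRUNCATED_KEY));
    }

    // Returns the recorded length, or -1 if it is not set.
    static int contentLength(Map<String, String> metadata) {
        String value = metadata.get(LENGTH_KEY);
        if (value == null) {
            return -1;
        }
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            return -1; // assumption: unparseable values are treated as not set
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        System.out.println(isTruncated(metadata) + " " + contentLength(metadata)); // prints "false -1"

        metadata.put(TRUNCATED_KEY, "true");
        metadata.put(LENGTH_KEY, "4096");
        System.out.println(isTruncated(metadata) + " " + contentLength(metadata)); // prints "true 4096"
    }
}
```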