Package org.apache.tika.detect
Class DetectHelper
java.lang.Object
org.apache.tika.detect.DetectHelper
Utility methods for content detection.
Constructor Summary
Constructor: DetectHelper()
Method Summary
Modifier and Type   Method   Description
static int   getDetectionContentLength(Metadata metadata)
    Gets the number of bytes buffered for detection.
static TikaInputStream   getStreamForDetectionOnly(InputStream stream, int maxLength)
    Creates a TikaInputStream suitable for detection-only purposes by reading up to maxLength bytes from the input stream into a byte array.
static TikaInputStream   getStreamForDetectionOnly(InputStream stream, int maxLength, Metadata metadata)
    Creates a TikaInputStream suitable for detection-only purposes by reading up to maxLength bytes from the input stream into a byte array.
static boolean   isContentTruncatedForDetection(Metadata metadata)
    Checks if the given metadata indicates that the content was truncated for detection.
Constructor Details
DetectHelper
public DetectHelper()
Method Details
getStreamForDetectionOnly
public static TikaInputStream getStreamForDetectionOnly(InputStream stream, int maxLength, Metadata metadata) throws IOException
Creates a TikaInputStream suitable for detection-only purposes by reading up to maxLength bytes from the input stream into a byte array.
If the input stream contains more bytes than maxLength, the resulting metadata will have TikaCoreProperties.TRUNCATED_CONTENT_FOR_DETECTION set to true, signaling to detectors that they are working with truncated content and should adjust their behavior accordingly.
This is useful when you want to perform detection on a limited portion of a large file without spooling the entire file to disk.
NOTE: The downside is that you may lose precision in detection! This should only be used if you are performing detection only with no parsing.
Parameters:
    stream - the input stream to read from (will NOT be closed)
    maxLength - the maximum number of bytes to read
    metadata - the metadata object where truncation flag will be set if applicable
Returns:
    a TikaInputStream backed by the buffered bytes
Throws:
    IOException - if an I/O error occurs
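The buffering and truncation check described above can be illustrated with a minimal JDK-only sketch. This is not Tika's implementation: the class DetectionBufferSketch, the holder type Buffered, and the method bufferForDetection are hypothetical names introduced for illustration, and probing one extra byte to detect truncation is an assumed technique. It requires Java 11 or later for InputStream.readNBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch; none of these names come from Tika.
final class DetectionBufferSketch {

    /** Hypothetical holder: the buffered prefix plus a truncation flag. */
    static final class Buffered {
        final byte[] bytes;
        final boolean truncated;

        Buffered(byte[] bytes, boolean truncated) {
            this.bytes = bytes;
            this.truncated = truncated;
        }
    }

    /**
     * Reads at most maxLength bytes into memory and reports whether the
     * stream held more. The stream is left open for the caller to close.
     */
    static Buffered bufferForDetection(InputStream stream, int maxLength) throws IOException {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must be >= 0");
        }
        // readNBytes blocks until maxLength bytes are read or end of stream.
        byte[] prefix = stream.readNBytes(maxLength);
        boolean truncated = false;
        if (prefix.length == maxLength) {
            // Assumed approach: probe one more byte to see whether data remains.
            // This consumes that byte from the caller's stream.
            truncated = stream.read() != -1;
        }
        return new Buffered(prefix, truncated);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[10]);
        Buffered b = bufferForDetection(in, 4);
        // prints "4 bytes, truncated=true"
        System.out.println(b.bytes.length + " bytes, truncated=" + b.truncated);
    }
}
```

In this sketch the flag is returned to the caller directly; the documented method instead records it on the supplied Metadata object under TikaCoreProperties.TRUNCATED_CONTENT_FOR_DETECTION.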
getStreamForDetectionOnly
public static TikaInputStream getStreamForDetectionOnly(InputStream stream, int maxLength) throws IOException
Creates a TikaInputStream suitable for detection-only purposes by reading up to maxLength bytes from the input stream into a byte array.
This overload creates a new Metadata object internally. If you need to check whether the content was truncated, use getStreamForDetectionOnly(InputStream, int, Metadata) instead.
Parameters:
    stream - the input stream to read from (will NOT be closed)
    maxLength - the maximum number of bytes to read
Returns:
    a TikaInputStream backed by the buffered bytes
Throws:
    IOException - if an I/O error occurs
isContentTruncatedForDetection
public static boolean isContentTruncatedForDetection(Metadata metadata)
Checks if the given metadata indicates that the content was truncated for detection.
Parameters:
    metadata - the metadata to check
Returns:
    true if the content was truncated, false otherwise
getDetectionContentLength
public static int getDetectionContentLength(Metadata metadata)
Gets the number of bytes buffered for detection.
Parameters:
    metadata - the metadata to check
Returns:
    the number of bytes buffered, or -1 if not set
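The contract of the two metadata accessors (false when the truncation flag is absent, -1 when no length is recorded) can be sketched with a plain Map standing in for Tika's Metadata. Everything here is an assumption made for illustration: the class DetectionMetadataSketch, both key strings, and the choice to return -1 for a malformed value are hypothetical and do not come from the Tika source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a Map<String, String> stands in for Tika's Metadata.
final class DetectionMetadataSketch {

    // Hypothetical keys, not the real Tika property names.
    static final String TRUNCATED_KEY = "example:truncatedContentForDetection";
    static final String LENGTH_KEY = "example:detectionContentLength";

    /** Returns true only if the flag is present and equals "true" (ignoring case). */
    static boolean isTruncated(Map<String, String> metadata) {
        // Boolean.parseBoolean(null) returns false, so a missing key means "not truncated".
        return Boolean.parseBoolean(metadata.get(TRUNCATED_KEY));
    }

    /** Returns the recorded byte count, or -1 if it is not set. */
    static int contentLength(Map<String, String> metadata) {
        String value = metadata.get(LENGTH_KEY);
        if (value == null) {
            return -1;
        }
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            // Assumption: treat a malformed value the same as a missing one.
            return -1;
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        // prints "-1 false"
        System.out.println(contentLength(metadata) + " " + isTruncated(metadata));

        metadata.put(LENGTH_KEY, "4096");
        metadata.put(TRUNCATED_KEY, "true");
        // prints "4096 true"
        System.out.println(contentLength(metadata) + " " + isTruncated(metadata));
    }
}
```

Returning a sentinel of -1 lets callers distinguish "no length recorded" from an empty buffer of 0 bytes.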