Package org.apache.tika.detect
@Version("1.0.0")
package org.apache.tika.detect
Media type detection.
Class descriptions:
- An input stream reader that automatically detects the character encoding to be used for converting bytes to characters.
- Encoding detector that identifies the character set from a byte-order mark (BOM) at the start of the stream.
- Maps detected charsets to safer superset charsets for decoding.
- Content type detector that combines multiple different detection mechanisms.
- A composite encoding detector that runs child detectors.
- A composite detector that orchestrates the detection pipeline:
    1. MimeTypes (magic byte) detection
    2. Container and other detectors loaded via SPI
    3. TextDetector as fallback for unknown types
    4. Returns the most specific type detected
- A composite encoding detector based on all the EncodingDetector implementations available through the service provider mechanism.
- A version of DefaultDetector for probabilistic mime detectors, which use statistical techniques to blend the results of differing underlying detectors when attempting to detect the type of a given file.
- Utility methods for content detection.
- Content type detector.
- Dummy detector that returns application/octet-stream for all documents.
- Character encoding detector.
- Context object that collects encoding detection results from base detectors.
- A single detector's contribution: its ranked list of candidates and its name.
- A charset detection result pairing a Charset with a confidence score and an EncodingResult.ResultType indicating the nature of the evidence.
- The nature of the evidence that produced this result.
- This runs the Linux 'file' command against a file.
- Content type detection based on magic bytes, i.e. type-specific patterns near the beginning of the document input stream.
- Detector for Matroska (MKV and WEBM) files based on the EBML header.
- Encoding detector that extracts a declared charset from Tika metadata without reading any bytes from the stream.
- Marker interface for encoding detectors that arbitrate among candidates collected by base detectors rather than detecting encoding directly from the stream.
- Content type detection based on the resource name.
- Deprecated. After 2.5.0 this functionality was moved to the CompositeDetector.
- Always returns the charset passed in via the initializer.
- Configuration class for JSON deserialization.
- Content type detection of plain text documents.
- Utility class for computing a histogram of the bytes seen in a stream.
- Content type detection based on a content type hint.
- Utility class that uses a SAXParser to determine the namespace URI and local name of the root element of an XML file.
- Detector to identify zero length files as application/x-zerovalue.
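
To illustrate what identifying a character set from a byte-order mark involves, here is a minimal sketch that uses only the JDK. It is not Tika's implementation and does not use any Tika API. The class name BomSniffer and its detect method are hypothetical names introduced for this example, and the set of recognized BOMs (UTF-8, UTF-16, UTF-32) is an assumption about what such a detector would typically cover.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Hypothetical helper (not part of Tika): maps a leading byte-order mark to a charset. */
final class BomSniffer {

    private BomSniffer() {
    }

    /**
     * Returns the charset implied by a BOM at the start of {@code head},
     * or an empty Optional when no known BOM is present.
     */
    static Optional<Charset> detect(byte[] head) {
        // UTF-32 is checked before UTF-16 because the UTF-32LE BOM (FF FE 00 00)
        // begins with the UTF-16LE BOM (FF FE).
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) {
            return Optional.of(Charset.forName("UTF-32BE"));
        }
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) {
            return Optional.of(Charset.forName("UTF-32LE"));
        }
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return Optional.of(StandardCharsets.UTF_8);
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return Optional.of(StandardCharsets.UTF_16BE);
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            return Optional.of(StandardCharsets.UTF_16LE);
        }
        return Optional.empty();
    }

    /** Compares the first bytes of {@code data}, as unsigned values, with {@code prefix}. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A caller would pass the first four bytes of a stream to BomSniffer.detect and fall back to another strategy when the result is empty. That fallback is the situation in which the composite and metadata-based encoding detectors listed above come into play.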