Class DefaultDetector

java.lang.Object
org.apache.tika.detect.CompositeDetector
org.apache.tika.detect.DefaultDetector
All Implemented Interfaces:
Serializable, SelfConfiguring, Detector

public class DefaultDetector extends CompositeDetector
A composite detector that orchestrates the detection pipeline:
  1. MimeTypes (magic byte) detection
  2. Container and other detectors loaded via SPI
  3. TextDetector as fallback for unknown types
  4. Returns the most specific type detected
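
Step 4 can be sketched with plain Java. This is a toy illustration, not Tika's implementation: it assumes that "most specific" means a candidate is kept only when it is a descendant of the current best type in a media type hierarchy. The class name, the `PARENT` map, and both helper methods are hypothetical stand-ins.

```java
import java.util.List;
import java.util.Map;

// Toy illustration only: none of these names are Tika classes.
final class MostSpecificTypeSketch {

    static final String OCTET_STREAM = "application/octet-stream";
    static final String ZIP = "application/zip";
    static final String DOCX =
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

    // Hypothetical hierarchy: each type maps to its direct supertype.
    static final Map<String, String> PARENT = Map.of(
            "text/plain", OCTET_STREAM,
            ZIP, OCTET_STREAM,
            DOCX, ZIP);

    // True if candidate is a strict descendant of base in the toy hierarchy.
    static boolean isSpecializationOf(String candidate, String base) {
        String current = PARENT.get(candidate);
        while (current != null) {
            if (current.equals(base)) {
                return true;
            }
            current = PARENT.get(current);
        }
        return false;
    }

    // Walks the detector results in order and keeps a result only
    // when it refines the best type found so far.
    static String mostSpecific(List<String> results) {
        String best = OCTET_STREAM;
        for (String result : results) {
            if (isSpecializationOf(result, best)) {
                best = result;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // A magic-byte result (zip) refined by a container result (docx).
        System.out.println(mostSpecific(List.of(ZIP, DOCX)));
    }
}
```

Under this assumption a more general result arriving later does not replace a more specific one found earlier, and an empty result list falls back to application/octet-stream.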

Detectors are loaded and returned in a fixed order: user-supplied detectors first, followed by the non-MimeTypes Tika detectors. If you need to control the order of the detectors, construct your own CompositeDetector instead and pass in the list of detectors in the required order.

Individual detectors that need random access (e.g., for container inspection) handle their own spooling by calling TikaInputStream.getFile().
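
For readers unfamiliar with the term, spooling can be sketched with plain java.nio as follows. This only illustrates the general idea of copying a forward-only stream to a temporary file to gain random access. It is not how TikaInputStream is implemented, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustration of spooling with the JDK only; not a Tika class.
final class SpoolSketch {

    // Copies the remaining stream content to a temporary file
    // so that it can be read at arbitrary offsets afterwards.
    static Path spool(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("spool-sketch-", ".bin");
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    // Reads the single byte at the given offset, or -1 past end of file.
    static int byteAt(Path file, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            return raf.read();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = spool(new ByteArrayInputStream(new byte[] {10, 20, 30, 40}));
        try {
            System.out.println(byteAt(tmp, 2));
        } finally {
            // The caller owns the temporary file in this sketch.
            Files.deleteIfExists(tmp);
        }
    }
}
```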

Since:
Apache Tika 0.9
  • Constructor Details

  • Method Details

    • detect

      public MediaType detect(TikaInputStream tis, Metadata metadata, ParseContext parseContext) throws IOException
      Description copied from interface: Detector
      Detects the content type of the given input document. Returns application/octet-stream if the type of the document cannot be detected.

      If the document input stream is not available, then the first argument may be null. Otherwise the detector may read bytes from the start of the stream to help in type detection. The detector is expected to mark the stream before reading any bytes from it, and to reset the stream before returning. The stream must not be closed by the detector.

      The given input metadata is only read, not modified, by the detector.
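
The mark/reset contract described above can be sketched with a plain java.io stream. This is a minimal illustration rather than a Tika detector: the class name, the `sniff` helper, and the choice of the `%PDF-` signature are assumptions made for the example, and it takes an ordinary InputStream instead of a TikaInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration of the mark/reset contract with the JDK only; not a Tika class.
final class MarkResetSniffSketch {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Hypothetical helper: peeks at the leading bytes without consuming them.
    static String sniff(InputStream in) throws IOException {
        if (in == null) {
            // No stream available, so nothing can be inspected.
            return "application/octet-stream";
        }
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(PDF_MAGIC.length); // remember the current position
        try {
            byte[] head = in.readNBytes(PDF_MAGIC.length);
            return Arrays.equals(head, PDF_MAGIC)
                    ? "application/pdf"
                    : "application/octet-stream";
        } finally {
            in.reset(); // restore the position; the stream is never closed here
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "%PDF-1.7".getBytes(StandardCharsets.US_ASCII));
        System.out.println(sniff(in));
        // The next read still starts at the first byte of the document.
        System.out.println((char) in.read());
    }
}
```

Resetting in a finally block keeps the stream position intact for the caller even when the read fails partway through.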

      Specified by:
      detect in interface Detector
      Overrides:
      detect in class CompositeDetector
      Parameters:
      tis - document input stream, or null
      metadata - input metadata for the document
      parseContext - the parse context
      Returns:
      detected media type, or application/octet-stream
      Throws:
      IOException - if the document input stream could not be read
    • getDetectors

      public List<Detector> getDetectors()
      Description copied from class: CompositeDetector
      Returns the component detectors.
      Overrides:
      getDetectors in class CompositeDetector
    • getExcludedClasses

      public Collection<Class<? extends Detector>> getExcludedClasses()
      Returns the classes that were explicitly excluded when constructing this detector. Used for round-trip serialization to preserve exclusion configuration.
      Returns:
      unmodifiable collection of excluded detector classes, never null