Class DefaultDetector
- All Implemented Interfaces:
Serializable, SelfConfiguring, Detector
A composite detector that combines:
- MimeTypes (magic byte) detection
- Container and other detectors loaded via SPI
- TextDetector as fallback for unknown types
- Returns the most specific type detected
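The last point, returning the most specific type, can be sketched without Tika. In this JDK-only model, `SimpleDetector`, `MostSpecificComposite` and the hand-written parent map are hypothetical stand-ins for Tika's detectors and type registry. The selection rule shown (start from application/octet-stream and accept a result only if it refines the current answer) is an assumed simplification, not Tika's source.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a detector: maps raw input bytes to a media type string.
interface SimpleDetector {
    String detect(byte[] input);
}

// Sketch of a composite that keeps the most specific answer from its component detectors.
class MostSpecificComposite {
    static final String OCTET_STREAM = "application/octet-stream";

    private final Map<String, String> parents; // hypothetical hierarchy: child type -> parent type
    private final List<SimpleDetector> detectors;

    MostSpecificComposite(Map<String, String> parents, List<SimpleDetector> detectors) {
        this.parents = parents;
        this.detectors = detectors;
    }

    // True if candidate equals base or descends from it by following parent links.
    boolean isSpecializationOf(String candidate, String base) {
        for (String t = candidate; t != null; t = parents.get(t)) {
            if (t.equals(base)) {
                return true;
            }
        }
        return false;
    }

    // Start from the generic fallback; replace the answer only with a refinement of it.
    String detect(byte[] input) {
        String type = OCTET_STREAM;
        for (SimpleDetector detector : detectors) {
            String detected = detector.detect(input);
            if (isSpecializationOf(detected, type)) {
                type = detected;
            }
        }
        return type;
    }
}

class MostSpecificDemo {
    public static void main(String[] args) {
        Map<String, String> parents = Map.of(
                "application/zip", MostSpecificComposite.OCTET_STREAM,
                "application/java-archive", "application/zip");
        List<SimpleDetector> detectors = List.of(
                input -> "application/zip",                   // e.g. a magic-byte match
                input -> "application/java-archive",          // e.g. a container inspection
                input -> MostSpecificComposite.OCTET_STREAM); // a detector that found nothing
        System.out.println(new MostSpecificComposite(parents, detectors).detect(new byte[0]));
    }
}
```

Run as written, the demo prints application/java-archive: the container result refines the ZIP match, and the trailing generic answer cannot undo it. Because this sketch only accepts refinements, two unrelated answers are resolved in favor of whichever detector ran first, so detector order matters.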
Detectors are loaded and returned in a fixed order: user-supplied detectors first,
followed by the non-MimeType Tika detectors.
If you need to control the order of the Detectors, you should instead
construct your own CompositeDetector and pass in the list
of Detectors in the required order.
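The ordering rule can be modelled with plain class names. `DetectorOrderSketch` is hypothetical, and treating "user supplied" as "a class outside the org.apache.tika package" is an assumption made for the sketch. Passing an explicit list to CompositeDetector, as described above, sidesteps this ordering entirely.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the documented ordering, using class names as stand-ins for detector instances.
class DetectorOrderSketch {
    static final String MIME_TYPES = "org.apache.tika.mime.MimeTypes";

    // Assumption: "user supplied" means a class outside the org.apache.tika package.
    static boolean isUserSupplied(String className) {
        return !className.startsWith("org.apache.tika.");
    }

    // Drop MimeTypes, then stable-sort so user-supplied detectors precede Tika's own;
    // within each group the original loading order is kept.
    static List<String> order(List<String> loaded) {
        List<String> result = new ArrayList<>();
        for (String name : loaded) {
            if (!name.equals(MIME_TYPES)) {
                result.add(name);
            }
        }
        result.sort(Comparator.comparingInt((String n) -> isUserSupplied(n) ? 0 : 1));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical class names, in the order a service loader might report them.
        List<String> loaded = List.of(
                "org.apache.tika.detect.ExampleTikaDetector",
                "com.example.MyDetector",
                "org.apache.tika.mime.MimeTypes",
                "com.example.OtherDetector");
        System.out.println(order(loaded));
    }
}
```

The demo prints [com.example.MyDetector, com.example.OtherDetector, org.apache.tika.detect.ExampleTikaDetector]. `List.sort` is stable, so detectors within each group keep their loading order.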
Individual detectors that need random access (e.g., for container inspection)
handle their own spooling by calling TikaInputStream.getFile().
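In Tika, TikaInputStream.getFile() does the spooling. The JDK-only sketch below shows what a detector gains from it: once the bytes sit in a file, the detector can seek straight to the end of the input, which a forward-only stream cannot do. `spool` and `hasZipEndRecord` are hypothetical helpers, and the ZIP check assumes an archive with no trailing comment.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// JDK-only sketch of spooling: copy a forward-only stream to a temp file, then seek freely.
class SpoolingSketch {
    // Hypothetical helper: drains the stream into a new temporary file.
    static Path spool(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("spool-", ".bin");
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    // Hypothetical random-access check: a ZIP archive with no trailing comment ends with a
    // 22-byte end-of-central-directory record starting with the bytes 'P', 'K', 5, 6.
    static boolean hasZipEndRecord(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            if (length < 22) {
                return false;
            }
            raf.seek(length - 22); // jump straight to where the record must start
            byte[] sig = new byte[4];
            raf.readFully(sig);
            return sig[0] == 'P' && sig[1] == 'K' && sig[2] == 5 && sig[3] == 6;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] emptyZip = new byte[22]; // an empty ZIP archive is just the end record
        emptyZip[0] = 'P';
        emptyZip[1] = 'K';
        emptyZip[2] = 5;
        emptyZip[3] = 6;
        Path file = spool(new ByteArrayInputStream(emptyZip));
        try {
            System.out.println(hasZipEndRecord(file)); // true
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```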
- Since:
- Apache Tika 0.9
Constructor Summary
- DefaultDetector()
- DefaultDetector(ClassLoader loader)
- DefaultDetector(MimeTypes types)
- DefaultDetector(MimeTypes types, ClassLoader loader)
- DefaultDetector(MimeTypes types, ServiceLoader loader)
- DefaultDetector(MimeTypes types, ServiceLoader loader, Collection<Class<? extends Detector>> excludeDetectors)
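The last constructor takes a collection of detector classes to leave out. The sketch below models that filtering step with hypothetical `ToyDetector` classes; matching on the exact runtime class is an assumption of the sketch, not documented behavior.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-ins for detector implementations.
interface ToyDetector {}
class ToyMagicDetector implements ToyDetector {}
class ToyContainerDetector implements ToyDetector {}
class ToyTextDetector implements ToyDetector {}

class ExclusionSketch {
    // Keep every loaded detector whose exact runtime class is not in the excluded set.
    static List<ToyDetector> withoutExcluded(List<ToyDetector> loaded,
                                             Collection<Class<? extends ToyDetector>> excluded) {
        List<ToyDetector> kept = new ArrayList<>();
        for (ToyDetector d : loaded) {
            if (!excluded.contains(d.getClass())) {
                kept.add(d);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ToyDetector> loaded = List.of(
                new ToyMagicDetector(), new ToyContainerDetector(), new ToyTextDetector());
        Collection<Class<? extends ToyDetector>> excluded = List.of(ToyContainerDetector.class);
        for (ToyDetector d : withoutExcluded(loaded, excluded)) {
            System.out.println(d.getClass().getSimpleName()); // ToyMagicDetector, then ToyTextDetector
        }
    }
}
```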
Method Summary
- MediaType detect(TikaInputStream tis, Metadata metadata, ParseContext parseContext): Detects the content type of the given input document.
- List<Detector> getDetectors(): Returns the component detectors.
- Collection<Class<? extends Detector>> getExcludedClasses(): Returns the classes that were explicitly excluded when constructing this detector.
Constructor Details
DefaultDetector
public DefaultDetector(MimeTypes types, ServiceLoader loader, Collection<Class<? extends Detector>> excludeDetectors)
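DefaultDetector
public DefaultDetector(MimeTypes types, ServiceLoader loader)
DefaultDetector
public DefaultDetector(MimeTypes types, ClassLoader loader)
DefaultDetector
public DefaultDetector(ClassLoader loader)
DefaultDetector
public DefaultDetector(MimeTypes types)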
DefaultDetector
public DefaultDetector()
Method Details
detect
public MediaType detect(TikaInputStream tis, Metadata metadata, ParseContext parseContext) throws IOException
Description copied from interface: Detector
Detects the content type of the given input document. Returns application/octet-stream if the type of the document cannot be detected.
If the document input stream is not available, then the first argument may be null. Otherwise the detector may read bytes from the start of the stream to help in type detection. The detector is expected to mark the stream before reading any bytes from it, and to reset the stream before returning. The stream must not be closed by the detector.
The given input metadata is only read, not modified, by the detector.
- Specified by:
detect in interface Detector
- Overrides:
detect in class CompositeDetector
- Parameters:
tis - document input stream, or null
metadata - input metadata for the document
parseContext - the parse context
- Returns:
detected media type, or application/octet-stream
- Throws:
IOException - if the document input stream could not be read
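The stream contract above (a null stream is allowed, mark before reading, reset before returning, never close) can be shown with a JDK-only magic-byte check. `MagicByteSketch` is hypothetical: it takes a plain InputStream instead of a TikaInputStream, omits the metadata and parse context, and recognizes only the PDF signature `%PDF-`.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of a magic-byte check that honors the detect(...) stream contract.
// Metadata and ParseContext are omitted; this is not Tika's implementation.
class MagicByteSketch {
    static final String OCTET_STREAM = "application/octet-stream";
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    static String detect(InputStream in) throws IOException {
        if (in == null) {
            return OCTET_STREAM; // no stream available: nothing to inspect
        }
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(PDF_MAGIC.length); // mark before reading any bytes
        try {
            byte[] head = in.readNBytes(PDF_MAGIC.length);
            return Arrays.equals(head, PDF_MAGIC) ? "application/pdf" : OCTET_STREAM;
        } finally {
            in.reset(); // rewind before returning; the stream is never closed here
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new BufferedInputStream(
                new ByteArrayInputStream("%PDF-1.7 ...".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(detect(in));       // application/pdf
        System.out.println((char) in.read()); // % (the caller still sees the first byte)
    }
}
```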
getDetectors
Description copied from class: CompositeDetector
Returns the component detectors.
- Overrides:
getDetectors in class CompositeDetector
getExcludedClasses
Returns the classes that were explicitly excluded when constructing this detector. Used for round-trip serialization to preserve the exclusion configuration.
- Returns:
unmodifiable collection of excluded detector classes, never null
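The getter's guarantees (unmodifiable, never null) and its role in a round trip can be sketched with the JDK alone. `ExclusionConfigSketch` and its class-name serialized form are hypothetical, with JDK classes standing in for detector classes; how Tika itself serializes this configuration is not shown here.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

// Sketch of an exclusion list that is never null, cannot be modified by callers,
// and survives a round trip through plain class names.
class ExclusionConfigSketch {
    private final Collection<Class<?>> excluded;

    ExclusionConfigSketch(Collection<Class<?>> excluded) {
        if (excluded == null) {
            this.excluded = Collections.emptySet(); // null input becomes "nothing excluded"
        } else {
            // Defensive copy, then wrap so callers cannot change it.
            this.excluded = Collections.unmodifiableSet(new LinkedHashSet<Class<?>>(excluded));
        }
    }

    Collection<Class<?>> getExcludedClasses() {
        return excluded;
    }

    // Hypothetical serialized form: the fully qualified class names, in insertion order.
    List<String> toClassNames() {
        List<String> names = new ArrayList<>();
        for (Class<?> c : excluded) {
            names.add(c.getName());
        }
        return names;
    }

    // Rebuild the configuration by resolving each saved name back to a class.
    static ExclusionConfigSketch fromClassNames(List<String> names) throws ClassNotFoundException {
        List<Class<?>> classes = new ArrayList<>();
        for (String name : names) {
            classes.add(Class.forName(name));
        }
        return new ExclusionConfigSketch(classes);
    }

    public static void main(String[] args) throws ClassNotFoundException {
        Collection<Class<?>> original = List.of(String.class, Integer.class); // stand-ins for detector classes
        ExclusionConfigSketch config = new ExclusionConfigSketch(original);
        List<String> saved = config.toClassNames();
        System.out.println(saved); // [java.lang.String, java.lang.Integer]
        ExclusionConfigSketch restored = fromClassNames(saved);
        System.out.println(restored.getExcludedClasses().equals(config.getExcludedClasses())); // true
    }
}
```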