Package org.apache.tika.detect.siegfried
Class SiegfriedDetector
java.lang.Object
org.apache.tika.detect.siegfried.SiegfriedDetector
- All Implemented Interfaces:
Serializable, org.apache.tika.detect.Detector
Simple wrapper around Siegfried (https://github.com/richardlehane/siegfried).
By default, this detector runs detection, reports the results in the
metadata, and then returns null so that other detectors will be used.
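The pass-through behavior described above can be sketched in plain Java. This is a minimal model and not Tika's implementation: the `SimpleDetector` interface, the `AnnotatingDetector`, `FallbackDetector`, and `DetectorChain` classes, and the `siegfried:mime` key are all hypothetical names, a `Map` stands in for Tika's `Metadata`, and a magic-byte check stands in for running Siegfried.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a detector interface; not Tika's Detector. */
interface SimpleDetector {
    /** Returns a MIME type, or null to defer to the next detector. */
    String detect(byte[] content, Map<String, String> metadata);
}

/** Records its finding in the metadata but leaves the decision to others. */
final class AnnotatingDetector implements SimpleDetector {
    @Override
    public String detect(byte[] content, Map<String, String> metadata) {
        // Assumption: a trivial magic-byte check replaces the external tool.
        String mime = startsWithPdfMagic(content)
                ? "application/pdf"
                : "application/octet-stream";
        metadata.put("siegfried:mime", mime); // hypothetical metadata key
        return null; // defer so that later detectors are consulted
    }

    private static boolean startsWithPdfMagic(byte[] c) {
        return c.length >= 4 && c[0] == '%' && c[1] == 'P' && c[2] == 'D' && c[3] == 'F';
    }
}

/** A detector that always makes a decision. */
final class FallbackDetector implements SimpleDetector {
    @Override
    public String detect(byte[] content, Map<String, String> metadata) {
        return "text/plain";
    }
}

/** Asks each detector in turn and uses the first non-null answer. */
final class DetectorChain {
    static String detect(List<SimpleDetector> detectors, byte[] content,
                         Map<String, String> metadata) {
        for (SimpleDetector detector : detectors) {
            String mime = detector.detect(content, metadata);
            if (mime != null) {
                return mime;
            }
        }
        return "application/octet-stream";
    }
}
```

In this model the annotating detector's result is available in the metadata map even though the fallback detector's answer is the one returned.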
-
Field Summary
Fields
Modifier and Type, Field
static String BASIS
static String ERRORS
static String FORMAT
static String ID
static String MIME
static org.apache.tika.metadata.Property SIEGFRIED_ERRORS
static org.apache.tika.metadata.Property SIEGFRIED_IDENTIFIERS_DETAILS
static org.apache.tika.metadata.Property SIEGFRIED_IDENTIFIERS_NAME
static final String SIEGFRIED_PREFIX
static org.apache.tika.metadata.Property SIEGFRIED_SIGNATURE
static org.apache.tika.metadata.Property SIEGFRIED_STATUS
static org.apache.tika.metadata.Property SIEGFRIED_VERSION
static String VERSION
static String WARNING
-
Constructor Summary
Constructors
SiegfriedDetector()
-
Method Summary
Modifier and Type, Method, Description
static boolean checkHasSiegfried(String siegfriedCommandPath)
org.apache.tika.mime.MediaType detect(InputStream input, org.apache.tika.metadata.Metadata metadata)
boolean isUseMime()
protected static org.apache.tika.mime.MediaType processResult(org.apache.tika.utils.FileProcessResult result, org.apache.tika.metadata.Metadata metadata, boolean returnMime)
void setMaxBytes(int maxBytes)
If the detector is not called with a TikaInputStream, it spools up to this many bytes to a file for Siegfried to examine.
void setSiegfriedPath(String fileCommandPath)
void setTimeoutMs(long timeoutMs)
void setUseMime(boolean useMime)
By default, Tika runs Siegfried to add its detection results to the metadata but does NOT use them to select parsers.
-
Field Details
-
SIEGFRIED_PREFIX
-
SIEGFRIED_STATUS
public static org.apache.tika.metadata.Property SIEGFRIED_STATUS
-
SIEGFRIED_VERSION
public static org.apache.tika.metadata.Property SIEGFRIED_VERSION
-
SIEGFRIED_SIGNATURE
public static org.apache.tika.metadata.Property SIEGFRIED_SIGNATURE
-
SIEGFRIED_IDENTIFIERS_NAME
public static org.apache.tika.metadata.Property SIEGFRIED_IDENTIFIERS_NAME
-
SIEGFRIED_IDENTIFIERS_DETAILS
public static org.apache.tika.metadata.Property SIEGFRIED_IDENTIFIERS_DETAILS
-
SIEGFRIED_ERRORS
public static org.apache.tika.metadata.Property SIEGFRIED_ERRORS
-
ID
-
FORMAT
-
VERSION
-
MIME
-
WARNING
-
BASIS
-
ERRORS
-
-
Constructor Details
-
SiegfriedDetector
public SiegfriedDetector()
-
-
Method Details
-
checkHasSiegfried
-
detect
public org.apache.tika.mime.MediaType detect(InputStream input, org.apache.tika.metadata.Metadata metadata) throws IOException
- Specified by:
detect in interface org.apache.tika.detect.Detector
- Parameters:
input - document input stream, or null
metadata - input metadata for the document
- Returns:
the MIME type identified by Siegfried, or application/octet-stream otherwise
- Throws:
IOException
-
setUseMime
@Field public void setUseMime(boolean useMime)
By default, Tika runs Siegfried to add its detection results to the metadata but does NOT use them to select parsers. If this is set to true, this detector returns the first MIME type detected by Siegfried, and the AutoDetectParser uses that MIME type to select the appropriate parser.
- Parameters:
useMime
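The effect of this flag can be illustrated with a small sketch. It assumes the external tool's output has already been reduced to an ordered list of MIME strings; the `UseMimeSketch` class and its `chooseMime` method are hypothetical and not part of Tika. An empty `Optional` here simply means "no decision", without modeling what the real detector returns in that case.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical illustration of a "use the detected MIME type or not" switch. */
final class UseMimeSketch {

    /**
     * Returns the first detected MIME type when useMime is true and at least
     * one MIME type was detected; otherwise returns an empty Optional.
     */
    static Optional<String> chooseMime(List<String> detectedMimes, boolean useMime) {
        if (!useMime || detectedMimes.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(detectedMimes.get(0));
    }
}
```

With the flag off, the detected values would still be recorded elsewhere (in the metadata), but they would not influence parser selection.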
-
isUseMime
public boolean isUseMime()
-
processResult
protected static org.apache.tika.mime.MediaType processResult(org.apache.tika.utils.FileProcessResult result, org.apache.tika.metadata.Metadata metadata, boolean returnMime)
-
setSiegfriedPath
-
setMaxBytes
@Field public void setMaxBytes(int maxBytes)
If the detector is not called with a TikaInputStream, it spools up to this many bytes to a file for Siegfried to examine.
- Parameters:
maxBytes
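Spooling a bounded prefix of a stream to a file can be sketched with the standard library alone. This is an illustration under stated assumptions and not Tika's code: the `SpoolSketch` class and `spoolPrefix` method are hypothetical, and the sketch requires a stream that supports mark/reset so the caller can still read the content afterwards.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that copies a bounded prefix of a stream to a temp file. */
final class SpoolSketch {

    /**
     * Copies at most maxBytes from the stream into a new temporary file and
     * then rewinds the stream. The caller is responsible for deleting the file.
     */
    static Path spoolPrefix(InputStream in, int maxBytes) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        Path tmp = Files.createTempFile("spool-", ".bin");
        in.mark(maxBytes); // remember the current position
        try (OutputStream out = Files.newOutputStream(tmp)) {
            byte[] buffer = new byte[8192];
            int remaining = maxBytes;
            while (remaining > 0) {
                int n = in.read(buffer, 0, Math.min(buffer.length, remaining));
                if (n < 0) {
                    break; // end of stream before the limit was reached
                }
                out.write(buffer, 0, n);
                remaining -= n;
            }
        } finally {
            in.reset(); // rewind so the caller sees the stream from the start
        }
        return tmp;
    }
}
```

A larger limit gives the external tool more content to examine at the cost of more disk I/O per document.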
-
setTimeoutMs
@Field public void setTimeoutMs(long timeoutMs)
-
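The page gives no description for timeoutMs; the name suggests it bounds how long the external Siegfried process may run. Under that assumption, a timeout on an external command can be sketched with `ProcessBuilder` and `Process.waitFor`. The `TimeoutSketch` class, its `Outcome` enum, and the `run` method are hypothetical and do not reflect how Tika launches the process.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that runs an external command with a time limit. */
final class TimeoutSketch {

    enum Outcome { COMPLETED, TIMED_OUT, FAILED_TO_START }

    /** Runs the command, discarding its output, and waits up to timeoutMs. */
    static Outcome run(List<String> command, long timeoutMs) throws InterruptedException {
        Process process;
        try {
            process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
        } catch (IOException e) {
            // The executable is missing or cannot be launched.
            return Outcome.FAILED_TO_START;
        }
        if (process.waitFor(timeoutMs, TimeUnit.MILLISECONDS)) {
            return Outcome.COMPLETED;
        }
        process.destroyForcibly(); // do not leave a runaway process behind
        return Outcome.TIMED_OUT;
    }
}
```

The `FAILED_TO_START` outcome also shows one way a caller could check whether a command is available at a given path before relying on it.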