org.apache.tika.mime
Class MimeTypes

java.lang.Object
  extended by org.apache.tika.mime.MimeTypes
All Implemented Interfaces:
Serializable, Detector

public final class MimeTypes
extends Object
implements Detector, Serializable

This class is a MimeType repository. It gathers a set of MimeTypes and makes it possible to retrieve a content type from its name, from a file name, or from a magic character sequence.

The MIME type detection methods that take an InputStream as an argument never read more than getMinLength() bytes from the stream. The given stream is also never closed, marked, or reset by these methods. A client can therefore use the mark feature of the stream (if available) to restore the stream to the state it was in before type detection, and then process the stream based on the detected type.
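The client-side mark/reset pattern described above can be sketched with the standard library alone. This is a minimal illustration, not Tika code: MarkResetSketch, consumePrefix and readAllAfterDetection are hypothetical names, and consumePrefix merely stands in for a detection call that reads a bounded prefix (in real use the bound would come from getMinLength()).

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class MarkResetSketch {

    // Hypothetical stand-in for a detection call: reads at most 'limit' bytes
    // and never marks, resets, or closes the stream itself.
    static int consumePrefix(InputStream in, int limit) throws IOException {
        byte[] buf = new byte[limit];
        int total = 0;
        while (total < limit) {
            int n = in.read(buf, total, limit - total);
            if (n == -1) {
                break; // end of stream reached before 'limit' bytes
            }
            total += n;
        }
        return total;
    }

    // Client-side pattern: mark, let the detector read, reset, then process.
    static String readAllAfterDetection(byte[] data, int limit) {
        // BufferedInputStream adds mark/reset support to any underlying stream.
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        try {
            in.mark(limit);           // remember the current position
            consumePrefix(in, limit); // "detection" consumes at most 'limit' bytes
            in.reset();               // rewind to the marked position

            // The whole document is still available for processing.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing the same bound to mark() that the reader is allowed to consume keeps the mark valid, since the stream is only required to remember that many bytes.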

See Also:
Serialized Form

Field Summary
static String OCTET_STREAM
          Name of the root type, application/octet-stream.
static String PLAIN_TEXT
          Name of the text type, text/plain.
static String XML
          Name of the xml type, application/xml.
 
Constructor Summary
MimeTypes()
           
 
Method Summary
 void addPattern(MimeType type, String pattern)
          Adds a file name pattern for the given media type.
 void addPattern(MimeType type, String pattern, boolean isRegex)
          Adds a file name pattern for the given media type.
 MediaType detect(InputStream input, Metadata metadata)
          Automatically detects the MIME type of a document based on magic markers in the stream prefix and any given metadata hints.
 MimeType forName(String name)
          Returns the registered media type with the given name (or alias).
static MimeTypes getDefaultMimeTypes()
          Get the default MimeTypes.
 MediaTypeRegistry getMediaTypeRegistry()
           
 MimeType getMimeType(File file)
          Deprecated. Use Tika.detect(File) instead
 MimeType getMimeType(String name)
          Deprecated. Use Tika.detect(String) instead
 int getMinLength()
          Returns the minimum number of bytes that content-based analysis methods need in order to check the data against all known MimeTypes.
 void setSuperType(MimeType type, MediaType parent)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

OCTET_STREAM

public static final String OCTET_STREAM
Name of the root type, application/octet-stream.

See Also:
Constant Field Values

PLAIN_TEXT

public static final String PLAIN_TEXT
Name of the text type, text/plain.

See Also:
Constant Field Values

XML

public static final String XML
Name of the xml type, application/xml.

See Also:
Constant Field Values

Constructor Detail

MimeTypes

public MimeTypes()

Method Detail

getMimeType

public MimeType getMimeType(String name)
Deprecated. Use Tika.detect(String) instead

Find the Mime Content Type of a document from its name. Returns application/octet-stream if no better match is found.

Parameters:
name - name of the document to analyze
Returns:
the Mime Content Type of the specified document name

getMimeType

public MimeType getMimeType(File file)
                     throws MimeTypeException,
                            IOException
Deprecated. Use Tika.detect(File) instead

Find the Mime Content Type of a document stored in the given file. Returns application/octet-stream if no better match is found.

Parameters:
file - file to analyze
Returns:
the Mime Content Type of the specified document
Throws:
MimeTypeException - if the type can't be detected
IOException - if the file can't be read

forName

public MimeType forName(String name)
                 throws MimeTypeException
Returns the registered media type with the given name (or alias). The named media type is automatically registered (and returned) if it doesn't already exist.

Parameters:
name - media type name (case-insensitive)
Returns:
the registered media type with the given name or alias
Throws:
MimeTypeException - if the given media type name is invalid

setSuperType

public void setSuperType(MimeType type,
                         MediaType parent)

addPattern

public void addPattern(MimeType type,
                       String pattern)
                throws MimeTypeException
Adds a file name pattern for the given media type. Assumes that the pattern being added is not a JDK standard regular expression.

Parameters:
type - media type
pattern - file name pattern
Throws:
MimeTypeException - if the pattern conflicts with existing ones

addPattern

public void addPattern(MimeType type,
                       String pattern,
                       boolean isRegex)
                throws MimeTypeException
Adds a file name pattern for the given media type. The caller specifies via the isRegex parameter whether the pattern being added is a JDK standard regular expression. If isRegex is true, a JDK standard regex is assumed; otherwise a freedesktop glob is assumed.

Parameters:
type - media type
pattern - file name pattern
isRegex - true if the pattern is a JDK standard regular expression, false if it is a freedesktop glob
Throws:
MimeTypeException - if the pattern conflicts with existing ones.
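
The difference between the two pattern kinds can be illustrated with java.util.regex alone. The sketch below is not Tika's matching logic: PatternSketch, globToRegex and matches are hypothetical names, and only the `*` and `?` glob wildcards are handled.

```java
import java.util.regex.Pattern;

class PatternSketch {

    // Translates a simple glob ('*' and '?' wildcards only) into a JDK regex.
    // Every other character is quoted so that it matches literally.
    static Pattern globToRegex(String glob) {
        StringBuilder sb = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                sb.append(".*");   // any run of characters
            } else if (c == '?') {
                sb.append('.');    // exactly one character
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // Mirrors the meaning of the isRegex flag: true treats the pattern as a
    // JDK regex, false treats it as a glob.
    static boolean matches(String pattern, boolean isRegex, String fileName) {
        Pattern p = isRegex ? Pattern.compile(pattern) : globToRegex(pattern);
        return p.matcher(fileName).matches();
    }
}
```

Under these assumptions, the glob `*.txt` and the regex `.*\.txt` accept the same file names, while `a.c` matches `abc` only when it is read as a regex, because a dot is a literal character in a glob. Read as a JDK regex, `*.txt` is not even valid, since the leading `*` has nothing to repeat.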

getMediaTypeRegistry

public MediaTypeRegistry getMediaTypeRegistry()

getMinLength

public int getMinLength()
Returns the minimum number of bytes that content-based analysis methods need in order to check the data against all known MimeTypes.

Returns:
the minimum length of data to provide.
See Also:
getMimeType(byte[]), getMimeType(String, byte[])

detect

public MediaType detect(InputStream input,
                        Metadata metadata)
                 throws IOException
Automatically detects the MIME type of a document based on magic markers in the stream prefix and any given metadata hints.

The given stream is expected to support marks, so that this method can reset the stream to the position it was in before this method was called.

Specified by:
detect in interface Detector
Parameters:
input - document stream, or null
metadata - metadata hints
Returns:
MIME type of the document
Throws:
IOException - if the document stream could not be read
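
As a rough illustration of prefix-based detection with mark and reset, the following standard-library sketch checks two well-known signatures and restores the stream position afterwards. It is not Tika's implementation: MagicSketch and its methods are hypothetical, the signature table is limited to PDF and PNG, metadata hints are ignored, and the fallback for a null or unrecognized stream is assumed to be application/octet-stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class MagicSketch {

    static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    static final byte[] PNG_MAGIC = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

    // Returns a media type name based on the first bytes of the stream and
    // leaves the stream at the position it had before the call.
    static String detect(InputStream input) {
        if (input == null) {
            return "application/octet-stream"; // nothing to inspect
        }
        if (!input.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        // Longest signature we need to see; plays the role of a minimum length.
        int minLength = Math.max(PDF_MAGIC.length, PNG_MAGIC.length);
        byte[] prefix = new byte[minLength];
        int total = 0;
        try {
            input.mark(minLength);
            try {
                while (total < minLength) {
                    int n = input.read(prefix, total, minLength - total);
                    if (n == -1) {
                        break; // stream is shorter than the longest signature
                    }
                    total += n;
                }
            } finally {
                input.reset(); // restore the caller's position
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (startsWith(prefix, total, PDF_MAGIC)) {
            return "application/pdf";
        }
        if (startsWith(prefix, total, PNG_MAGIC)) {
            return "image/png";
        }
        return "application/octet-stream";
    }

    // True if the first 'length' valid bytes of 'data' begin with 'magic'.
    static boolean startsWith(byte[] data, int length, byte[] magic) {
        if (length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A stream without mark support can be wrapped in a java.io.BufferedInputStream before it is passed to a method written this way.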

getDefaultMimeTypes

public static MimeTypes getDefaultMimeTypes()
Get the default MimeTypes. This includes all the built-in media types and any custom overrides that are present.

Returns:
MimeTypes default type registry


Copyright © 2007-2012 The Apache Software Foundation. All Rights Reserved.