org.apache.tika.utils
Class ParseUtils

java.lang.Object
  extended by org.apache.tika.utils.ParseUtils
All Implemented Interfaces:
TikaMimeKeys

public class ParseUtils
extends java.lang.Object
implements TikaMimeKeys

Contains utility methods for parsing documents. Intended to provide simple entry points into the Tika framework.
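With the real class, a typical call is `ParseUtils.getStringContent(documentFile, config)`. As far as we know, Tika releases of this period provide a default `TikaConfig` through `TikaConfig.getDefaultConfig()`, but check this against the version you use. Those jars may not be on hand, so the JDK-only sketch below models the flow the class wraps: guess a MIME type, look up a parser for it, and extract text. `MiniParseUtils`, its parser table, and the tag-stripping HTML "parser" are hypothetical stand-ins, not Tika code.

```java
import java.net.URLConnection;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical miniature of the flow ParseUtils wraps: guess a MIME type,
// look up a parser for it, and extract text. Not Tika code.
class MiniParseUtils {
    // Stand-in "parsers": each turns raw document text into plain text.
    static final Map<String, UnaryOperator<String>> PARSERS =
        Map.<String, UnaryOperator<String>>of(
            "text/plain", s -> s,
            // Naive tag stripping, for illustration only.
            "text/html", s -> s.replaceAll("<[^>]*>", ""));

    // Plays the role of getParser(String mimeType, TikaConfig).
    static UnaryOperator<String> getParser(String mimeType) {
        // Map.of rejects null keys, so check before looking up.
        UnaryOperator<String> parser = (mimeType == null) ? null : PARSERS.get(mimeType);
        if (parser == null) {
            throw new IllegalArgumentException("No parser for MIME type " + mimeType);
        }
        return parser;
    }

    // Plays the role of getStringContent(File, TikaConfig): the type is guessed
    // from the file name, then the matching parser is applied.
    static String getStringContent(String fileName, String rawContent) {
        String mimeType = URLConnection.guessContentTypeFromName(fileName);
        return getParser(mimeType).apply(rawContent);
    }

    public static void main(String[] args) {
        System.out.println(getStringContent("page.html", "<p>Hello <b>Tika</b></p>"));
    }
}
```

Running `main` prints `Hello Tika`.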


Field Summary

Fields inherited from interface org.apache.tika.metadata.TikaMimeKeys
MIME_TYPE_MAGIC, TIKA_MIME_FILE

Constructor Summary
ParseUtils()

Method Summary
static Parser getParser(java.io.File documentFile, TikaConfig config)
          Returns a parser that can handle the specified file and is set to receive input from a stream opened from that file.
static Parser getParser(java.lang.String mimeType, TikaConfig config)
          Returns a parser that can handle the specified MIME type.
static Parser getParser(java.net.URL documentUrl, TikaConfig config)
          Returns a parser that can handle the document at the specified URL and is set to receive input from a stream opened from that URL.
static java.lang.String getStringContent(java.io.File documentFile, TikaConfig config)
          Gets the string content of the specified file.
static java.lang.String getStringContent(java.io.File documentFile, TikaConfig config, java.lang.String mimeType)
          Gets the string content of the specified file, parsed as the given MIME type.
static java.lang.String getStringContent(java.io.InputStream stream, TikaConfig config, java.lang.String mimeType)
          Gets the string content of a document read from an input stream.
static java.lang.String getStringContent(java.net.URL documentUrl, TikaConfig config)
          Gets the string content of the document at the specified URL.
static java.lang.String getStringContent(java.net.URL documentUrl, TikaConfig config, java.lang.String mimeType)
          Gets the string content of the document at the specified URL, parsed as the given MIME type.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

ParseUtils

public ParseUtils()

Method Detail

getParser

public static Parser getParser(java.lang.String mimeType,
                               TikaConfig config)
                        throws TikaException
Returns a parser that can handle the specified MIME type.

Parameters:
config - the Tika configuration from which the parser is obtained
mimeType - the document's MIME type
Returns:
a parser appropriate to this MIME type
Throws:
TikaException
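The `config` argument suggests that the parser is looked up in the supplied configuration rather than in a global table. The sketch below models that under stated assumptions: `SketchConfig` stands in for `TikaConfig` and `SketchTikaException` for `TikaException` (both hypothetical). This page does not say whether an unknown type produces a `TikaException` or a `null` result; the sketch assumes an exception.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for TikaException: a checked failure from the lookup.
class SketchTikaException extends Exception {
    SketchTikaException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for TikaConfig: it owns the MIME type -> parser table.
final class SketchConfig {
    private final Map<String, UnaryOperator<String>> parsers;

    SketchConfig(Map<String, UnaryOperator<String>> parsers) {
        this.parsers = Map.copyOf(parsers); // defensive, immutable copy
    }

    UnaryOperator<String> lookup(String mimeType) {
        return parsers.get(mimeType);
    }
}

class GetParserSketch {
    // Models getParser(String mimeType, TikaConfig): the answer depends on the
    // configuration passed in, and a missing parser surfaces as a checked exception.
    static UnaryOperator<String> getParser(String mimeType, SketchConfig config)
            throws SketchTikaException {
        UnaryOperator<String> parser = (mimeType == null) ? null : config.lookup(mimeType);
        if (parser == null) {
            throw new SketchTikaException("No parser configured for " + mimeType);
        }
        return parser;
    }
}
```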

getParser

public static Parser getParser(java.net.URL documentUrl,
                               TikaConfig config)
                        throws TikaException
Returns a parser that can handle the document at the specified URL and is set to receive input from a stream opened from that URL. The MIME type is determined automatically. NB: Close the input stream when it is no longer needed!

Parameters:
documentUrl - URL pointing to the document to parse
config - the Tika configuration from which the parser is obtained
Returns:
a parser appropriate to the document's MIME type and ready to read input from the specified document
Throws:
TikaException
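This page does not describe how the MIME type is "determined automatically". The `TikaMimeKeys` field names (`MIME_TYPE_MAGIC`, `TIKA_MIME_FILE`) hint at Tika's own MIME repository, which may also inspect content. As a rough, hypothetical approximation, the sketch below uses only the JDK's `URLConnection.guessContentTypeFromName` on the URL path. It therefore sees nothing but the file extension and never opens a connection.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical sketch of automatic MIME type detection for a URL.
// Only the extension in the URL path is inspected.
class UrlTypeSketch {
    static final String FALLBACK = "application/octet-stream";

    static String detectMimeType(URL documentUrl) {
        // guessContentTypeFromName consults the JDK's built-in extension table
        // and returns null when the extension is unknown.
        String guessed = URLConnection.guessContentTypeFromName(documentUrl.getPath());
        return (guessed != null) ? guessed : FALLBACK;
    }

    public static void main(String[] args) throws MalformedURLException {
        URL url = URI.create("http://example.com/docs/report.html").toURL();
        System.out.println(detectMimeType(url));
    }
}
```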

getParser

public static Parser getParser(java.io.File documentFile,
                               TikaConfig config)
                        throws TikaException
Returns a parser that can handle the specified file and is set to receive input from a stream opened from that file. The MIME type is determined automatically. NB: Close the input stream when it is no longer needed!

Parameters:
documentFile - File object pointing to the document to parse
config - the Tika configuration from which the parser is obtained
Returns:
a parser appropriate to the document's MIME type and ready to read input from the specified document
Throws:
TikaException
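A `File` overload can reuse URL-based logic by converting the file to a URL. Whether `ParseUtils` does exactly this is an assumption. The hypothetical `FileOverloadSketch` below only shows the conversion and a single shared detection path, using an extension-based guess from the JDK.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical sketch: a File overload that converts to a URL and reuses
// the URL overload, so both share one detection path.
class FileOverloadSketch {
    static String detectMimeType(URL documentUrl) {
        String guessed = URLConnection.guessContentTypeFromName(documentUrl.getPath());
        return (guessed != null) ? guessed : "application/octet-stream";
    }

    static String detectMimeType(File documentFile) throws MalformedURLException {
        // File.toURI() yields an absolute file: URI and percent-encodes
        // characters such as spaces; toURL() then converts it.
        return detectMimeType(documentFile.toURI().toURL());
    }
}
```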

getStringContent

public static java.lang.String getStringContent(java.io.InputStream stream,
                                                TikaConfig config,
                                                java.lang.String mimeType)
                                         throws TikaException,
                                                java.io.IOException
Gets the string content of a document read from an input stream.

Parameters:
stream - the stream from which to read document data
config - the Tika configuration used to select a parser for the MIME type
mimeType - MIME type of the data
Returns:
the string content parsed from the document
Throws:
TikaException
java.io.IOException
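This overload takes the MIME type from the caller, so no detection is needed. This page does not say who closes the stream. The hypothetical sketch below assumes the usual Java convention that the code which opened a stream also closes it, and shows that with try-with-resources. It also assumes UTF-8 text, which real documents need not be.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StreamContentSketch {
    // Hypothetical model of getStringContent(InputStream, TikaConfig, String).
    // The caller supplies the MIME type, so no detection happens here, and the
    // stream is read but not closed: whoever opened it closes it.
    static String getStringContent(InputStream stream, String mimeType) throws IOException {
        // Assumes UTF-8 text; real documents may need charset detection.
        String raw = new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        switch (mimeType) {
            case "text/plain":
                return raw;
            case "text/html":
                return raw.replaceAll("<[^>]*>", ""); // naive tag stripping
            default:
                throw new IllegalArgumentException("Unsupported in this sketch: " + mimeType);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "<h1>Title</h1>".getBytes(StandardCharsets.UTF_8);
        // try-with-resources closes the stream even if extraction fails.
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            System.out.println(getStringContent(in, "text/html"));
        }
    }
}
```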

getStringContent

public static java.lang.String getStringContent(java.net.URL documentUrl,
                                                TikaConfig config)
                                         throws TikaException,
                                                java.io.IOException
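Gets the string content of the document at the specified URL. The MIME type is determined automatically.

Parameters:
documentUrl - URL pointing to the document to parse
config - the Tika configuration used to select a parser for the document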
Returns:
the string content parsed from the document
Throws:
TikaException
java.io.IOException
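Unlike the stream overload, the URL overload has to open the document itself, so it is natural for it to close the stream too. The hypothetical `UrlContentSketch` below models that with the JDK's `URL.openStream()` and an extension-based MIME guess via `URLConnection.guessContentTypeFromName`. It works with `file:` URLs, so no network access is needed to try it; the tag stripping is a placeholder for real parsing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class UrlContentSketch {
    // Hypothetical model of getStringContent(URL, TikaConfig): guess the type
    // from the URL path, open the stream, extract, and close the stream.
    static String getStringContent(URL documentUrl) throws IOException {
        String mimeType = URLConnection.guessContentTypeFromName(documentUrl.getPath());
        if (mimeType == null) {
            throw new IOException("Cannot determine MIME type of " + documentUrl);
        }
        // This method opened the stream, so it also closes it before returning.
        try (InputStream in = documentUrl.openStream()) {
            String raw = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            return mimeType.equals("text/html") ? raw.replaceAll("<[^>]*>", "") : raw;
        }
    }
}
```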

getStringContent

public static java.lang.String getStringContent(java.net.URL documentUrl,
                                                TikaConfig config,
                                                java.lang.String mimeType)
                                         throws TikaException,
                                                java.io.IOException
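Gets the string content of the document at the specified URL, parsed as the given MIME type.

Parameters:
documentUrl - URL pointing to the document to parse
config - the Tika configuration used to select a parser for the MIME type
mimeType - MIME type of the data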
mimeType - MIME type of the data
Returns:
the string content parsed from the document
Throws:
TikaException
java.io.IOException
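Passing `mimeType` explicitly helps when the URL carries no telling extension, such as a query-driven export endpoint. The precedence rule in the hypothetical sketch below (a non-empty explicit type wins, otherwise the type is guessed from the path) is an assumption; this page does not specify how a `null` argument is handled.

```java
import java.net.URL;
import java.net.URLConnection;

class MimeTypeChoiceSketch {
    // Hypothetical rule for overloads that accept a mimeType argument: a
    // non-empty explicit type is used as-is; otherwise the type is guessed from
    // the URL path, falling back to a generic binary type.
    static String resolveMimeType(URL documentUrl, String explicitMimeType) {
        if (explicitMimeType != null && !explicitMimeType.isEmpty()) {
            return explicitMimeType;
        }
        String guessed = URLConnection.guessContentTypeFromName(documentUrl.getPath());
        return (guessed != null) ? guessed : "application/octet-stream";
    }
}
```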

getStringContent

public static java.lang.String getStringContent(java.io.File documentFile,
                                                TikaConfig config,
                                                java.lang.String mimeType)
                                         throws TikaException,
                                                java.io.IOException
Gets the string content of the specified file, parsed as the given MIME type.

Parameters:
documentFile - File object pointing to the document to parse
config - the Tika configuration used to select a parser for the MIME type
mimeType - MIME type of the data
Returns:
the string content parsed from the document
Throws:
TikaException
java.io.IOException
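For the `File` overloads, one straightforward structure is to open the file, delegate to a stream-level method, and close the file when done. The hypothetical `FileContentSketch` below shows that shape with an explicit MIME type. The tag-stripping step and UTF-8 decoding are placeholders for real parsing, not Tika behavior.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class FileContentSketch {
    // Stream-level worker, standing in for
    // getStringContent(InputStream, TikaConfig, String). It does not close the stream.
    static String getStringContent(InputStream stream, String mimeType) throws IOException {
        // Assumes UTF-8 text; naive tag stripping stands in for HTML parsing.
        String raw = new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        return "text/html".equals(mimeType) ? raw.replaceAll("<[^>]*>", "") : raw;
    }

    // File-level overload with an explicit MIME type: open, delegate, close.
    static String getStringContent(File documentFile, String mimeType) throws IOException {
        try (InputStream in = new FileInputStream(documentFile)) {
            return getStringContent(in, mimeType);
        }
    }
}
```

Because the explicit type is trusted, a file with an uninformative extension such as `.dat` is still parsed as the caller says.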

getStringContent

public static java.lang.String getStringContent(java.io.File documentFile,
                                                TikaConfig config)
                                         throws TikaException,
                                                java.io.IOException
Gets the string content of the specified file. The MIME type is determined automatically.

Parameters:
documentFile - File object pointing to the document to parse
config - the Tika configuration used to select a parser for the document
Returns:
the string content parsed from the document
Throws:
TikaException
java.io.IOException


Copyright © 2007-2010 The Apache Software Foundation. All Rights Reserved.