java.lang.Object
  org.apache.tika.Tika
public class Tika
Facade class for accessing Tika functionality. This class hides much of the underlying complexity of the lower level Tika classes and provides simple methods for many common parsing and type detection operations.
See Also: Parser, Detector
Constructor Summary | |
---|---|
Tika() | Creates a Tika facade using the default configuration. |
Tika(Detector detector) | Creates a Tika facade using the given detector instance and the default parser configuration. |
Tika(Detector detector, Parser parser) | Creates a Tika facade using the given detector and parser instances. |
Tika(TikaConfig config) | Creates a Tika facade using the given configuration. |
Method Summary | |
---|---|
java.lang.String | detect(byte[] prefix): Detects the media type of the given document. |
java.lang.String | detect(byte[] prefix, java.lang.String name): Detects the media type of the given document. |
java.lang.String | detect(java.io.File file): Detects the media type of the given file. |
java.lang.String | detect(java.io.InputStream stream): Detects the media type of the given document. |
java.lang.String | detect(java.io.InputStream stream, Metadata metadata): Detects the media type of the given document. |
java.lang.String | detect(java.io.InputStream stream, java.lang.String name): Detects the media type of the given document. |
java.lang.String | detect(java.lang.String name): Detects the media type of a document with the given file name. |
java.lang.String | detect(java.net.URL url): Detects the media type of the resource at the given URL. |
int | getMaxStringLength(): Returns the maximum length of strings returned by the parseToString methods. |
java.io.Reader | parse(java.io.File file): Parses the given file and returns the extracted text content. |
java.io.Reader | parse(java.io.InputStream stream): Parses the given document and returns the extracted text content. |
java.io.Reader | parse(java.io.InputStream stream, Metadata metadata): Parses the given document and returns the extracted text content. |
java.io.Reader | parse(java.net.URL url): Parses the resource at the given URL and returns the extracted text content. |
java.lang.String | parseToString(java.io.File file): Parses the given file and returns the extracted text content. |
java.lang.String | parseToString(java.io.InputStream stream): Parses the given document and returns the extracted text content. |
java.lang.String | parseToString(java.io.InputStream stream, Metadata metadata): Parses the given document and returns the extracted text content. |
java.lang.String | parseToString(java.net.URL url): Parses the resource at the given URL and returns the extracted text content. |
void | setMaxStringLength(int maxStringLength): Sets the maximum length of strings returned by the parseToString methods. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
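public Tika(Detector detector, Parser parser)
Creates a Tika facade using the given detector and parser instances.
Parameters:
detector - type detector
parser - document parser

public Tika(TikaConfig config)
Creates a Tika facade using the given configuration.
Parameters:
config - Tika configuration

public Tika()
Creates a Tika facade using the default configuration.

public Tika(Detector detector)
Creates a Tika facade using the given detector instance and the default parser configuration.
Parameters:
detector - type detector

Method Detail |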
---|
public java.lang.String detect(java.io.InputStream stream, Metadata metadata) throws java.io.IOException
Detects the media type of the given document. The document stream can be null, in which case only the given document metadata is used for type detection.
If the document stream supports the mark feature, then the stream is marked and reset to the original position before this method returns. Only a limited number of bytes are read from the stream.
The given document stream is not closed by this method.
Unlike in the parse(InputStream, Metadata) method, the given document metadata is not modified by this method.
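
The mark-and-reset behavior described above can be illustrated with plain java.io classes. The following is a minimal sketch, not Tika's implementation: the class name MarkResetPeek and the method peek are hypothetical, and the sketch only shows how a limited prefix can be inspected while leaving a markable stream at its original position.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not part of Tika.
final class MarkResetPeek {

    /**
     * Fills the buffer with as many leading bytes as are available, then
     * rewinds the stream if it supports mark/reset. Returns the byte count.
     */
    static int peek(InputStream stream, byte[] buffer) throws IOException {
        boolean rewindable = stream.markSupported();
        if (rewindable) {
            stream.mark(buffer.length); // remember the current position
        }
        try {
            int total = 0;
            while (total < buffer.length) {
                int n = stream.read(buffer, total, buffer.length - total);
                if (n == -1) {
                    break; // end of stream before the buffer was full
                }
                total += n;
            }
            return total;
        } finally {
            if (rewindable) {
                stream.reset(); // restore the original position
            }
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "%PDF-1.4 example".getBytes(StandardCharsets.US_ASCII));
        byte[] prefix = new byte[5];
        int n = peek(in, prefix);
        // The stream was rewound, so the next read returns the first byte again.
        System.out.println(n + " bytes peeked, next byte: " + (char) in.read());
    }
}
```

Note that the helper never closes the stream, which mirrors the contract stated above: the caller remains responsible for closing it.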
Parameters:
stream - the document stream, or null
metadata - document metadata
Throws:
java.io.IOException - if the stream cannot be read

public java.lang.String detect(java.io.InputStream stream, java.lang.String name) throws java.io.IOException
Detects the media type of the given document.
If the document stream supports the mark feature, then the stream is marked and reset to the original position before this method returns. Only a limited number of bytes are read from the stream.
The given document stream is not closed by this method.
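
The reset guarantee applies only to streams that support the mark feature. A caller that wants to keep reading from the start after detection could wrap a non-markable stream in a java.io.BufferedInputStream first. The sketch below assumes hypothetical names (MarkableStreams, ensureMarkSupported, nonMarkable) and uses only the standard library.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, for illustration only; not part of Tika.
final class MarkableStreams {

    /** Returns the stream itself if it supports mark/reset, else a buffered wrapper. */
    static InputStream ensureMarkSupported(InputStream stream) {
        return stream.markSupported() ? stream : new BufferedInputStream(stream);
    }

    /** Stand-in for a source that cannot rewind, such as a network stream. */
    static InputStream nonMarkable(byte[] data) {
        InputStream delegate = new ByteArrayInputStream(data);
        return new InputStream() {
            @Override
            public int read() throws IOException {
                return delegate.read();
            }
        };
    }

    public static void main(String[] args) {
        InputStream raw = nonMarkable(new byte[] {1, 2, 3});
        InputStream safe = ensureMarkSupported(raw);
        System.out.println(raw.markSupported() + " -> " + safe.markSupported());
    }
}
```

The wrapped stream, not the original, is the one to pass on and to keep reading from afterwards, because the buffered wrapper may already have consumed bytes from the underlying stream.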
Parameters:
stream - the document stream
name - document name
Throws:
java.io.IOException - if the stream cannot be read

public java.lang.String detect(java.io.InputStream stream) throws java.io.IOException
Detects the media type of the given document.
If the document stream supports the mark feature, then the stream is marked and reset to the original position before this method returns. Only a limited number of bytes are read from the stream.
The given document stream is not closed by this method.
Parameters:
stream - the document stream
Throws:
java.io.IOException - if the stream cannot be read

public java.lang.String detect(byte[] prefix, java.lang.String name)
Detects the media type of the given document. For best results, at least a few kilobytes of the document data are needed. See also the other detect() methods for better alternatives when you have more than just the document prefix available for type detection.
Parameters:
prefix - first few bytes of the document
name - document name
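
To make the idea of prefix-plus-name detection concrete, here is a deliberately tiny sketch that checks two well-known file signatures and falls back to the file extension. It is not Tika's detection algorithm: the class PrefixDetectionSketch, its rules, and the handful of media types it knows are assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Illustrative toy detector; Tika's real detection logic is far more complete.
final class PrefixDetectionSketch {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PNG_MAGIC =
            {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

    /** Prefers evidence from the leading bytes, then falls back to the name. */
    static String detect(byte[] prefix, String name) {
        if (startsWith(prefix, PDF_MAGIC)) {
            return "application/pdf";
        }
        if (startsWith(prefix, PNG_MAGIC)) {
            return "image/png";
        }
        if (name != null) {
            String lower = name.toLowerCase(Locale.ROOT);
            if (lower.endsWith(".pdf")) return "application/pdf";
            if (lower.endsWith(".png")) return "image/png";
            if (lower.endsWith(".txt")) return "text/plain";
        }
        return "application/octet-stream"; // generic fallback for unknown data
    }

    private static boolean startsWith(byte[] data, byte[] magic) {
        if (data == null || data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch the content signature wins over a misleading name, so a PDF prefix paired with the name "unknown.bin" is still reported as application/pdf.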
public java.lang.String detect(byte[] prefix)
Detects the media type of the given document. For best results, at least a few kilobytes of the document data are needed. See also the other detect() methods for better alternatives when you have more than just the document prefix available for type detection.
Parameters:
prefix - first few bytes of the document
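
When only a byte[] prefix can be passed on, the caller has to collect those leading bytes itself. The sketch below shows one way to read up to a fixed number of bytes from a stream with the standard library. The class PrefixReader, the method readPrefix, and the 8192-byte figure in the usage note are assumptions, not values taken from Tika.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, for illustration only; not part of Tika.
final class PrefixReader {

    /**
     * Reads at most maxBytes from the stream and returns exactly the bytes
     * that were read. A shorter array means the stream ended early.
     */
    static byte[] readPrefix(InputStream stream, int maxBytes) throws IOException {
        byte[] buffer = new byte[maxBytes];
        int total = 0;
        while (total < maxBytes) {
            // A single read may return fewer bytes than requested, so loop.
            int n = stream.read(buffer, total, maxBytes - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        return Arrays.copyOf(buffer, total);
    }
}
```

A call such as readPrefix(stream, 8192) would gather a few kilobytes, in line with the recommendation above, and the resulting array could then be handed to a prefix-based detect method.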
public java.lang.String detect(java.io.File file) throws java.io.IOException
Detects the media type of the given file. Use the detect(String) method when you want to detect the type of the document without actually accessing the file.
Parameters:
file - the file
Throws:
java.io.IOException - if the file cannot be read

public java.lang.String detect(java.net.URL url) throws java.io.IOException
Detects the media type of the resource at the given URL. Use the detect(String) method when you want to detect the type of the document without actually accessing the URL.
Parameters:
url - the URL of the resource
Throws:
java.io.IOException - if the resource cannot be read

public java.lang.String detect(java.lang.String name)
Detects the media type of a document with the given file name. The given name can also be a URL or a full file path. In such cases only the file name part of the string is used for type detection.
Parameters:
name - the file name of the document
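
As a rough illustration of what "the file name part" of a URL or path means, the sketch below drops any query string or fragment and keeps the text after the last path separator. The exact rules Tika applies are not given here, so the class FileNamePart and its logic are assumptions.

```java
// Hypothetical helper, for illustration only; Tika's own rules may differ.
final class FileNamePart {

    /** Returns the last path segment of a URL, file path, or bare file name. */
    static String of(String name) {
        // Cut off a URL query string or fragment, whichever comes first.
        int end = name.length();
        int query = name.indexOf('?');
        int fragment = name.indexOf('#');
        if (query >= 0) {
            end = Math.min(end, query);
        }
        if (fragment >= 0) {
            end = Math.min(end, fragment);
        }
        String path = name.substring(0, end);

        // Accept both forward slashes and Windows-style backslashes.
        int separator = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        return path.substring(separator + 1);
    }
}
```

Under these assumptions, "http://example.com/docs/report.pdf?download=1" and "/var/tmp/report.pdf" both reduce to "report.pdf", so name-based detection would treat them alike.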
public java.io.Reader parse(java.io.InputStream stream, Metadata metadata) throws java.io.IOException
Parses the given document and returns the extracted text content.
Parameters:
stream - the document to be parsed
Throws:
java.io.IOException - if the document cannot be read or parsed

public java.io.Reader parse(java.io.InputStream stream) throws java.io.IOException
Parses the given document and returns the extracted text content.
Parameters:
stream - the document to be parsed
Throws:
java.io.IOException - if the document cannot be read or parsed

public java.io.Reader parse(java.io.File file) throws java.io.IOException
Parses the given file and returns the extracted text content.
Parameters:
file - the file to be parsed
Throws:
java.io.IOException - if the file cannot be read or parsed

public java.io.Reader parse(java.net.URL url) throws java.io.IOException
Parses the resource at the given URL and returns the extracted text content.
Parameters:
url - the URL of the resource to be parsed
Throws:
java.io.IOException - if the resource cannot be read or parsed

public java.lang.String parseToString(java.io.InputStream stream, Metadata metadata) throws java.io.IOException, TikaException
Parses the given document and returns the extracted text content. To avoid unpredictable excess memory use, the returned string contains at most the first getMaxStringLength() characters extracted from the input document. Use the setMaxStringLength(int) method to adjust this limit.
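
The effect of the length limit can be pictured as reading extracted text only until a character budget is used up. The sketch below is not how Tika enforces the limit internally. It only mimics the documented contract with hypothetical names (CappedText, read), including -1 as the value that disables the cap.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper, for illustration only; not part of Tika.
final class CappedText {

    /**
     * Collects at most maxLength characters from the reader.
     * A negative maxLength disables the cap and reads everything.
     */
    static String read(Reader reader, int maxLength) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] chunk = new char[4096];
        while (maxLength < 0 || text.length() < maxLength) {
            // Never request more characters than the remaining budget.
            int wanted = maxLength < 0
                    ? chunk.length
                    : Math.min(chunk.length, maxLength - text.length());
            int n = reader.read(chunk, 0, wanted);
            if (n == -1) {
                break; // end of input
            }
            text.append(chunk, 0, n);
        }
        return text.toString();
    }
}
```

With a cap of 3, the input "abcdefgh" yields "abc". With a cap of -1, the whole input is returned, at the cost of unbounded memory use for very large documents.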
Parameters:
stream - the document to be parsed
metadata - document metadata
Throws:
java.io.IOException - if the document cannot be read
TikaException - if the document cannot be parsed

public java.lang.String parseToString(java.io.InputStream stream) throws java.io.IOException, TikaException
Parses the given document and returns the extracted text content. To avoid unpredictable excess memory use, the returned string contains at most the first getMaxStringLength() characters extracted from the input document. Use the setMaxStringLength(int) method to adjust this limit.
Parameters:
stream - the document to be parsed
Throws:
java.io.IOException - if the document cannot be read
TikaException - if the document cannot be parsed

public java.lang.String parseToString(java.io.File file) throws java.io.IOException, TikaException
Parses the given file and returns the extracted text content. To avoid unpredictable excess memory use, the returned string contains at most the first getMaxStringLength() characters extracted from the input document. Use the setMaxStringLength(int) method to adjust this limit.
Parameters:
file - the file to be parsed
Throws:
java.io.IOException - if the file cannot be read
TikaException - if the file cannot be parsed

public java.lang.String parseToString(java.net.URL url) throws java.io.IOException, TikaException
Parses the resource at the given URL and returns the extracted text content. To avoid unpredictable excess memory use, the returned string contains at most the first getMaxStringLength() characters extracted from the input document. Use the setMaxStringLength(int) method to adjust this limit.
Parameters:
url - the URL of the resource to be parsed
Throws:
java.io.IOException - if the resource cannot be read
TikaException - if the resource cannot be parsed

public int getMaxStringLength()
Returns the maximum length of strings returned by the parseToString methods.

public void setMaxStringLength(int maxStringLength)
Sets the maximum length of strings returned by the parseToString methods.
Parameters:
maxStringLength - maximum string length, or -1 to disable this limit