Constructor and Description |
---|
Tika(): Creates a Tika facade using the default configuration. |
Tika(Detector detector): Creates a Tika facade using the given detector instance, the default parser configuration, and the default Translator. |
Tika(Detector detector, Parser parser): Creates a Tika facade using the given detector and parser instances, but the default Translator. |
Tika(Detector detector, Parser parser, Translator translator): Creates a Tika facade using the given detector, parser, and translator instances. |
Tika(TikaConfig config): Creates a Tika facade using the given configuration. |

Modifier and Type | Method and Description |
---|---|
String | detect(byte[] prefix): Detects the media type of the given document. |
String | detect(byte[] prefix, String name): Detects the media type of the given document. |
String | detect(File file): Detects the media type of the given file. |
String | detect(InputStream stream): Detects the media type of the given document. |
String | detect(InputStream stream, Metadata metadata): Detects the media type of the given document. |
String | detect(InputStream stream, String name): Detects the media type of the given document. |
String | detect(Path path): Detects the media type of the file at the given path. |
String | detect(String name): Detects the media type of a document with the given file name. |
String | detect(URL url): Detects the media type of the resource at the given URL. |
Detector | getDetector(): Returns the detector instance used by this facade. |
int | getMaxStringLength(): Returns the maximum length of strings returned by the parseToString methods. |
Parser | getParser(): Returns the parser instance used by this facade. |
Translator | getTranslator(): Returns the translator instance used by this facade. |
Reader | parse(File file): Parses the given file and returns the extracted text content. |
Reader | parse(File file, Metadata metadata): Parses the given file and returns the extracted text content. |
Reader | parse(InputStream stream): Parses the given document and returns the extracted text content. |
Reader | parse(InputStream stream, Metadata metadata): Parses the given document and returns the extracted text content. |
Reader | parse(Path path): Parses the file at the given path and returns the extracted text content. |
Reader | parse(Path path, Metadata metadata): Parses the file at the given path and returns the extracted text content. |
Reader | parse(URL url): Parses the resource at the given URL and returns the extracted text content. |
String | parseToString(File file): Parses the given file and returns the extracted text content. |
String | parseToString(InputStream stream): Parses the given document and returns the extracted text content. |
String | parseToString(InputStream stream, Metadata metadata): Parses the given document and returns the extracted text content. |
String | parseToString(InputStream stream, Metadata metadata, int maxLength): Parses the given document and returns the extracted text content. |
String | parseToString(Path path): Parses the file at the given path and returns the extracted text content. |
String | parseToString(URL url): Parses the resource at the given URL and returns the extracted text content. |
void | setMaxStringLength(int maxStringLength): Sets the maximum length of strings returned by the parseToString methods. |
String | toString() |
String | translate(String text, String targetLanguage): Translate the given text String to the given language, attempting to auto-detect the source language. |
String | translate(String text, String sourceLanguage, String targetLanguage): Translate the given text String to and from the given languages. |

public Tika(Detector detector, Parser parser)
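Creates a Tika facade using the given detector and parser instances, but the default Translator.
Parameters:
detector - type detector
parser - document parser

public Tika(Detector detector, Parser parser, Translator translator)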
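Creates a Tika facade using the given detector, parser, and translator instances.
Parameters:
detector - type detector
parser - document parser
translator - text translator

public Tika(TikaConfig config)
Creates a Tika facade using the given configuration.
Parameters:
config - Tika configuration

public Tika()
Creates a Tika facade using the default configuration.

public Tika(Detector detector)
Creates a Tika facade using the given detector instance, the default parser configuration, and the default Translator.
Parameters:
detector - type detector

public String detect(InputStream stream, Metadata metadata) throws IOException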
Detects the media type of the given document. The document stream can be null, in which case only the given document metadata is used for type detection.
If the document stream supports the mark feature, then the stream is marked and reset to the original position before this method returns. Only a limited number of bytes are read from the stream.
The given document stream is not closed by this method.
Unlike in the parse(InputStream, Metadata) method, the given document metadata is not modified by this method.
Parameters:
stream - the document stream, or null
metadata - document metadata
Throws:
IOException - if the stream cannot be read

public String detect(InputStream stream, String name) throws IOException
Detects the media type of the given document.
If the document stream supports the mark feature, then the stream is marked and reset to the original position before this method returns. Only a limited number of bytes are read from the stream.
The given document stream is not closed by this method.
Parameters:
stream - the document stream
name - document name
Throws:
IOException - if the stream cannot be read

public String detect(InputStream stream) throws IOException
Detects the media type of the given document.
If the document stream supports the mark feature, then the stream is marked and reset to the original position before this method returns. Only a limited number of bytes are read from the stream.
The given document stream is not closed by this method.
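The mark/reset contract described above can be illustrated without Tika. The following is a minimal JDK-only sketch, not Tika's implementation: the class MarkResetPeek and its peek method are hypothetical names, and the sketch reports I/O failures as UncheckedIOException for brevity. It reads a bounded prefix from a stream that supports mark/reset and then rewinds the stream, leaving it open.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not part of Tika.
final class MarkResetPeek {

    /**
     * Reads up to limit bytes from the stream and then rewinds it, so the
     * caller sees the stream at its original position afterwards.
     * The stream must support mark/reset; it is not closed here.
     */
    static byte[] peek(InputStream in, int limit) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream does not support mark/reset");
        }
        try {
            in.mark(limit);                 // remember the current position
            byte[] buffer = new byte[limit];
            int total = 0;
            while (total < limit) {
                int n = in.read(buffer, total, limit - total);
                if (n < 0) {
                    break;                  // end of stream before the limit
                }
                total += n;
            }
            in.reset();                     // rewind to the marked position
            return Arrays.copyOf(buffer, total);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.7 example body".getBytes(StandardCharsets.US_ASCII);
        // BufferedInputStream adds mark/reset support to any stream.
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        byte[] prefix = peek(in, 5);
        System.out.println(new String(prefix, StandardCharsets.US_ASCII));
        // The first byte is still available after the peek.
        System.out.println(in.read() == '%');
    }
}
```

Wrapping a stream in BufferedInputStream, as in main, is one common JDK way to give a stream mark/reset support before handing it to code that relies on that feature.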
Parameters:
stream - the document stream
Throws:
IOException - if the stream cannot be read

public String detect(byte[] prefix, String name)
Detects the media type of the given document.
For best results, at least a few kilobytes of the document data are needed. See also the other detect() methods for better alternatives when you have more than just the document prefix available for type detection.
Parameters:
prefix - first few bytes of the document
name - document name

public String detect(byte[] prefix)
Detects the media type of the given document.
For best results, at least a few kilobytes of the document data are needed. See also the other detect() methods for better alternatives when you have more than just the document prefix available for type detection.
Parameters:
prefix - first few bytes of the document

public String detect(Path path) throws IOException
Detects the media type of the file at the given path.
Use the detect(String) method when you want to detect the type of the document without actually accessing the file.
Parameters:
path - the path of the file
Throws:
IOException - if the file cannot be read

public String detect(File file) throws IOException
Detects the media type of the given file.
Use the detect(String) method when you want to detect the type of the document without actually accessing the file.
Parameters:
file - the file
Throws:
IOException - if the file cannot be read
See Also:
detect(Path)

public String detect(URL url) throws IOException
Detects the media type of the resource at the given URL.
Use the detect(String) method when you want to detect the type of the document without actually accessing the URL.
Parameters:
url - the URL of the resource
Throws:
IOException - if the resource cannot be read

public String detect(String name)
Detects the media type of a document with the given file name.
The given name can also be a URL or a full file path. In such cases, only the file name part of the string is used for type detection.
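As a rough illustration of what "the file name part" means, the sketch below strips a URL query string or fragment and then keeps the text after the last path separator. The helper fileNamePart is hypothetical and is not Tika's code; how Tika itself splits names may differ in edge cases.

```java
// Hypothetical illustration class; not part of Tika.
final class FileNamePart {

    /**
     * Returns the last path segment of a file path or URL string,
     * with any URL query string or fragment removed first.
     */
    static String fileNamePart(String name) {
        // Cut at the first '?' or '#', whichever comes first.
        int end = name.length();
        int query = name.indexOf('?');
        if (query >= 0) {
            end = Math.min(end, query);
        }
        int fragment = name.indexOf('#');
        if (fragment >= 0) {
            end = Math.min(end, fragment);
        }
        String path = name.substring(0, end);

        // Keep everything after the last '/' or '\' separator.
        int separator = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        return path.substring(separator + 1);
    }

    public static void main(String[] args) {
        System.out.println(fileNamePart("http://example.com/docs/report.pdf?download=1"));
        System.out.println(fileNamePart("C:\\docs\\notes.txt"));
        System.out.println(fileNamePart("plain.doc"));
    }
}
```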
Parameters:
name - the file name of the document

public String translate(String text, String sourceLanguage, String targetLanguage)
Translate the given text String to and from the given languages.
Parameters:
text - The text to translate.
sourceLanguage - The input text language (for example, "hi").
targetLanguage - The desired output language (for example, "fr").
See Also:
Translator

public String translate(String text, String targetLanguage)
Translate the given text String to the given language, attempting to auto-detect the source language.
Parameters:
text - The text to translate.
targetLanguage - The desired output language (for example, "en").
See Also:
Translator

public Reader parse(InputStream stream, Metadata metadata) throws IOException
Parses the given document and returns the extracted text content.
The returned reader will be responsible for closing the given stream. The stream and any associated resources will be closed at or before the time when the Reader.close() method is called.
Parameters:
stream - the document to be parsed
metadata - where the document's metadata will be populated
Throws:
IOException - if the document cannot be read or parsed

public Reader parse(InputStream stream) throws IOException
Parses the given document and returns the extracted text content.
The returned reader will be responsible for closing the given stream. The stream and any associated resources will be closed at or before the time when the Reader.close() method is called.
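The ownership rule above mirrors how JDK reader wrappers behave. The sketch below is a JDK-only analogy in which InputStreamReader stands in for the Reader returned by parse; TrackingStream and readAll are hypothetical names. It shows that closing the reader in a try-with-resources block also closes the wrapped stream, so the caller does not close the stream separately.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Tika.
final class ReaderOwnsStream {

    /** A stream that records whether close() has been called. */
    static final class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Reads all text through a Reader wrapper and closes only the Reader. */
    static String readAll(InputStream stream) {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[256];
        // Closing the reader also closes the stream it wraps.
        try (Reader reader = new InputStreamReader(stream, StandardCharsets.UTF_8)) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        TrackingStream stream =
                new TrackingStream("extracted text".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(stream));
        System.out.println("stream closed: " + stream.closed);
    }
}
```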
Parameters:
stream - the document to be parsed
Throws:
IOException - if the document cannot be read or parsed

public Reader parse(Path path, Metadata metadata) throws IOException
Parses the file at the given path and returns the extracted text content.
Metadata information extracted from the document is returned in the supplied metadata instance.
Parameters:
path - the path of the file to be parsed
metadata - where the document's metadata will be populated
Throws:
IOException - if the file cannot be read or parsed

public Reader parse(Path path) throws IOException
Parses the file at the given path and returns the extracted text content.
Parameters:
path - the path of the file to be parsed
Throws:
IOException - if the file cannot be read or parsed

public Reader parse(File file, Metadata metadata) throws IOException
Parses the given file and returns the extracted text content.
Metadata information extracted from the document is returned in the supplied metadata instance.
Parameters:
file - the file to be parsed
metadata - where the document's metadata will be populated
Throws:
IOException - if the file cannot be read or parsed
See Also:
parse(Path)

public Reader parse(File file) throws IOException
Parses the given file and returns the extracted text content.
Parameters:
file - the file to be parsed
Throws:
IOException - if the file cannot be read or parsed
See Also:
parse(Path)

public Reader parse(URL url) throws IOException
Parses the resource at the given URL and returns the extracted text content.
Parameters:
url - the URL of the resource to be parsed
Throws:
IOException - if the resource cannot be read or parsed

public String parseToString(InputStream stream, Metadata metadata) throws IOException, TikaException
Parses the given document and returns the extracted text content.
To avoid unpredictable excess memory use, the returned string contains at most the first getMaxStringLength() characters extracted from the input document. Use the setMaxStringLength(int) method to adjust this limit.
NOTE: Unlike most other Tika methods that take an InputStream, this method will close the given stream for you as a convenience. With other methods you are still responsible for closing the stream or a wrapper instance returned by Tika.
Parameters:
stream - the document to be parsed
metadata - document metadata
Throws:
IOException - if the document cannot be read
TikaException - if the document cannot be parsed

public String parseToString(InputStream stream, Metadata metadata, int maxLength) throws IOException, TikaException
Parses the given document and returns the extracted text content.
To avoid unpredictable excess memory use, the returned string contains at most the first maxLength characters extracted from the input document.
NOTE: Unlike most other Tika methods that take an InputStream, this method will close the given stream for you as a convenience. With other methods you are still responsible for closing the stream or a wrapper instance returned by Tika.
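The effect of a length cap can be sketched with plain JDK classes. BoundedText.readUpTo below is a hypothetical helper, not Tika's implementation: it stops collecting characters once maxLength is reached and closes its source, similar in spirit to the behavior described above. Treating a negative maxLength as "no limit" follows the -1 convention documented for setMaxStringLength(int); whether the maxLength parameter of parseToString accepts -1 in the same way is an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration class; not part of Tika.
final class BoundedText {

    /**
     * Collects characters from the source, stopping once maxLength characters
     * have been collected. A negative maxLength means no limit (assumption).
     * The source is closed before this method returns.
     */
    static String readUpTo(Reader source, int maxLength) {
        StringBuilder out = new StringBuilder();
        char[] buffer = new char[1024];
        try (Reader reader = source) {
            while (maxLength < 0 || out.length() < maxLength) {
                // Never request more characters than the cap still allows.
                int want = maxLength < 0
                        ? buffer.length
                        : Math.min(buffer.length, maxLength - out.length());
                int n = reader.read(buffer, 0, want);
                if (n < 0) {
                    break;              // end of input before the cap
                }
                out.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(readUpTo(new StringReader("hello world"), 5));
        System.out.println(readUpTo(new StringReader("hello world"), -1));
    }
}
```

Capping the output this way bounds memory use by the cap rather than by the size of the input document, which is the motivation the text above gives for the limit.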
Parameters:
stream - the document to be parsed
metadata - document metadata
maxLength - maximum length of the returned string
Throws:
IOException - if the document cannot be read
TikaException - if the document cannot be parsed

public String parseToString(InputStream stream) throws IOException, TikaException
Parses the given document and returns the extracted text content.
To avoid unpredictable excess memory use, the returned string contains at most the first getMaxStringLength() characters extracted from the input document. Use the setMaxStringLength(int) method to adjust this limit.
NOTE: Unlike most other Tika methods that take an InputStream, this method will close the given stream for you as a convenience. With other methods you are still responsible for closing the stream or a wrapper instance returned by Tika.
Parameters:
stream - the document to be parsed
Throws:
IOException - if the document cannot be read
TikaException - if the document cannot be parsed

public String parseToString(Path path) throws IOException, TikaException
Parses the file at the given path and returns the extracted text content.
To avoid unpredictable excess memory use, the returned string contains at most the first getMaxStringLength() characters extracted from the input document. Use the setMaxStringLength(int) method to adjust this limit.
Parameters:
path - the path of the file to be parsed
Throws:
IOException - if the file cannot be read
TikaException - if the file cannot be parsed

public String parseToString(File file) throws IOException, TikaException
Parses the given file and returns the extracted text content.
To avoid unpredictable excess memory use, the returned string contains at most the first getMaxStringLength() characters extracted from the input document. Use the setMaxStringLength(int) method to adjust this limit.
Parameters:
file - the file to be parsed
Throws:
IOException - if the file cannot be read
TikaException - if the file cannot be parsed
See Also:
parseToString(Path)

public String parseToString(URL url) throws IOException, TikaException
Parses the resource at the given URL and returns the extracted text content.
To avoid unpredictable excess memory use, the returned string contains at most the first getMaxStringLength() characters extracted from the input document. Use the setMaxStringLength(int) method to adjust this limit.
Parameters:
url - the URL of the resource to be parsed
Throws:
IOException - if the resource cannot be read
TikaException - if the resource cannot be parsed

public int getMaxStringLength()
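Returns the maximum length of strings returned by the parseToString methods.

public void setMaxStringLength(int maxStringLength)
Sets the maximum length of strings returned by the parseToString methods.
Parameters:
maxStringLength - maximum string length, or -1 to disable this limit

public Parser getParser()
Returns the parser instance used by this facade.

public Detector getDetector()
Returns the detector instance used by this facade.

public Translator getTranslator()
Returns the translator instance used by this facade.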
Copyright © 2007–2023 The Apache Software Foundation. All rights reserved.