public class TikaInputStream
extends org.apache.commons.io.input.TaggedInputStream
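Input stream with extended capabilities. It allows files and other resources to be associated with the InputStream instance passed through the Parser interface and other similar APIs.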
TikaInputStream instances can be created using the various static get() factory methods. Most of these methods take an optional Metadata argument that is then filled with the available input metadata from the given resource. The created TikaInputStream instance keeps track of the original resource used to create it, while otherwise behaving just like a normal, buffered InputStream.
A TikaInputStream instance is also guaranteed to support the mark(int) feature.
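Guaranteed mark support is what lets a detector look at the first bytes of a stream and then hand the stream on untouched. The following is a minimal sketch of that idea using only JDK classes. MarkResetSketch and peekHeader are hypothetical names for illustration and are not part of Tika; the sketch assumes Java 11 or later for readNBytes.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of mark/reset; not Tika code. */
class MarkResetSketch {

    /**
     * Reads up to n bytes and then rewinds, so the caller still sees the
     * stream from its original position.
     */
    static byte[] peekHeader(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(n);                  // remember the current position
        try {
            return in.readNBytes(n); // may return fewer bytes at end of stream
        } finally {
            in.reset();              // rewind to the marked position
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.7 rest of document".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data))) {
            String header = new String(peekHeader(in, 5), StandardCharsets.US_ASCII);
            System.out.println(header);           // prints "%PDF-"
            System.out.println((char) in.read()); // prints "%", the position is unchanged
        }
    }
}
```

A plain InputStream may report markSupported() as false, which is why the sketch wraps the source in a BufferedInputStream first.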
Code that wants to access the underlying file or other resources associated with a TikaInputStream should first use the get(InputStream) factory method to cast or wrap a given InputStream into a TikaInputStream instance.
TikaInputStream includes a few safety features to protect against parsers that may fail to check for an EOF or may incorrectly rely on the unreliable value returned from FileInputStream.skip(long). These parser failures can lead to infinite loops. We strongly encourage the use of TikaInputStream.
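To illustrate why the return value of skip(long) cannot be trusted on its own, here is a minimal JDK-only sketch of a skip loop that verifies its progress and fails loudly at end of stream instead of spinning. It is not Tika's implementation: SkipFullySketch, skipFully and stingy are hypothetical names introduced for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical illustration of a verified skip; not Tika code. */
class SkipFullySketch {

    /** Skips exactly n bytes or throws; never trusts a single skip() call. */
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() == -1) {
                // skip() made no progress and read() confirms end of stream
                throw new EOFException(remaining + " bytes left to skip at end of stream");
            } else {
                remaining--; // the single read() consumed one byte
            }
        }
    }

    /** Test double: a stream whose skip() advances at most one byte per call. */
    static InputStream stingy(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public long skip(long n) throws IOException {
                return super.skip(Math.min(n, 1));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        InputStream in = stingy(new byte[] {0, 1, 2, 3, 4, 5});
        skipFully(in, 4);
        System.out.println(in.read()); // prints 4
    }
}
```

Code that calls skip(n) once and assumes n bytes were consumed would be off by three bytes against the stingy stream above.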
Modifier and Type | Method and Description |
---|---|
void | addCloseableResource(Closeable closeable) |
protected void | afterRead(int n) |
static TikaInputStream | cast(InputStream stream): Returns the given stream cast to a TikaInputStream, or null if the stream is not a TikaInputStream. |
void | close() |
static TikaInputStream | get(Blob blob): Creates a TikaInputStream from the given database BLOB. |
static TikaInputStream | get(Blob blob, Metadata metadata): Creates a TikaInputStream from the given database BLOB. |
static TikaInputStream | get(byte[] data): Creates a TikaInputStream from the given array of bytes. |
static TikaInputStream | get(byte[] data, Metadata metadata): Creates a TikaInputStream from the given array of bytes. |
static TikaInputStream | get(File file): Deprecated. Use get(Path). In Tika 2.0, this will be removed or modified to throw an IOException. |
static TikaInputStream | get(File file, Metadata metadata): Deprecated. Use get(Path, Metadata). In Tika 2.0, this will be removed or modified to throw an IOException. |
static TikaInputStream | get(InputStream stream): Casts or wraps the given stream to a TikaInputStream instance. |
static TikaInputStream | get(InputStreamFactory factory): Creates a TikaInputStream from a Factory which can create fresh InputStreams for the same resource multiple times. |
static TikaInputStream | get(InputStreamFactory factory, TemporaryResources tmp): Creates a TikaInputStream from a Factory which can create fresh InputStreams for the same resource multiple times. |
static TikaInputStream | get(InputStream stream, TemporaryResources tmp): Deprecated. |
static TikaInputStream | get(InputStream stream, TemporaryResources tmp, Metadata metadata): Casts or wraps the given stream to a TikaInputStream instance. |
static TikaInputStream | get(Path path): Creates a TikaInputStream from the file at the given path. |
static TikaInputStream | get(Path path, Metadata metadata): Creates a TikaInputStream from the file at the given path. |
static TikaInputStream | get(Path path, Metadata metadata, TemporaryResources tmp) |
static TikaInputStream | get(URI uri): Creates a TikaInputStream from the resource at the given URI. |
static TikaInputStream | get(URI uri, Metadata metadata): Creates a TikaInputStream from the resource at the given URI. |
static TikaInputStream | get(URL url): Creates a TikaInputStream from the resource at the given URL. |
static TikaInputStream | get(URL url, Metadata metadata): Creates a TikaInputStream from the resource at the given URL. |
File | getFile() |
FileChannel | getFileChannel() |
InputStreamFactory | getInputStreamFactory(): If the stream was created from an InputStreamFactory, returns that; otherwise returns null. |
long | getLength(): Returns the length (in bytes) of this stream. |
Object | getOpenContainer(): Returns the open container object if any, such as a POIFS FileSystem in the event of an OLE2 document being detected and processed by the OLE2 detector. |
Path | getPath(): If the user created this TikaInputStream with a file, the original file will be returned. |
Path | getPath(int maxBytes) |
long | getPosition(): Returns the current position within the stream. |
boolean | hasFile() |
boolean | hasInputStreamFactory() |
boolean | hasLength() |
static boolean | isTikaInputStream(InputStream stream): Checks whether the given stream is a TikaInputStream instance. |
void | mark(int readlimit) |
boolean | markSupported() |
int | peek(byte[] buffer): Fills the given buffer with upcoming bytes from this stream without advancing the current stream position. |
void | reset() |
void | setOpenContainer(Object container): Stores the open container object against the stream, e.g. after a Zip contents detector has loaded the file to decide what it contains. |
long | skip(long ln): This relies on IOUtils.skip(InputStream, long, byte[]) to ensure that the alleged bytes skipped were actually skipped. |
String | toString() |
Methods inherited from class org.apache.commons.io.input.TaggedInputStream: handleIOException, isCauseOf, throwIfCauseOf
public static boolean isTikaInputStream(InputStream stream)
Checks whether the given stream is a TikaInputStream instance. The given stream can be null, in which case the return value is false.
Parameters:
stream - input stream, possibly null
Returns:
true if the stream is a TikaInputStream instance, false otherwise

public static TikaInputStream get(InputStream stream, TemporaryResources tmp, Metadata metadata)
Casts or wraps the given stream to a TikaInputStream instance.
The given temporary file provider is used for any temporary files, and should be disposed when the returned stream is no longer used.
Use this method instead of the get(InputStream) alternative when you don't explicitly close the returned stream. The recommended access pattern is:
    try (TemporaryResources tmp = new TemporaryResources()) {
        TikaInputStream stream = TikaInputStream.get(..., tmp);
        // process stream but don't close it
    }
The given stream instance will not be closed when the TemporaryResources.close() method is called by the try-with-resources statement. The caller is expected to explicitly close the original stream when it's no longer used.
Parameters:
stream - normal input stream

@Deprecated public static TikaInputStream get(InputStream stream, TemporaryResources tmp)
Deprecated. Use get(InputStream, TemporaryResources, Metadata) instead.
Parameters:
stream
tmp

public static TikaInputStream get(InputStream stream)
Casts or wraps the given stream to a TikaInputStream instance.
Use this method instead of the get(InputStream, TemporaryResources, Metadata) alternative when you do explicitly close the returned stream. The recommended access pattern is:
    try (TikaInputStream stream = TikaInputStream.get(...)) {
        // process stream
    }
The given stream instance will be closed along with any other resources associated with the returned TikaInputStream instance when the close() method is called by the try-with-resources statement.
Parameters:
stream - normal input stream

public static TikaInputStream cast(InputStream stream)
Returns the given stream cast to a TikaInputStream, or null if the stream is not a TikaInputStream.
Parameters:
stream - normal input stream

public static TikaInputStream get(byte[] data)
Creates a TikaInputStream from the given array of bytes.
Note that you must always explicitly close the returned stream, as in some cases it may end up writing the given data to a temporary file.
Parameters:
data - input data

public static TikaInputStream get(byte[] data, Metadata metadata)
Creates a TikaInputStream from the given array of bytes.
Note that you must always explicitly close the returned stream, as in some cases it may end up writing the given data to a temporary file.
Parameters:
data - input data
metadata - metadata instance
Throws:
IOException
public static TikaInputStream get(Path path) throws IOException
Creates a TikaInputStream from the file at the given path.
Note that you must always explicitly close the returned stream to prevent leaking open file handles.
Parameters:
path - input file
Throws:
IOException - if an I/O error occurs

public static TikaInputStream get(Path path, Metadata metadata) throws IOException
Creates a TikaInputStream from the file at the given path.
If there is a TikaCoreProperties.RESOURCE_NAME_KEY in the metadata object, this will not overwrite that value with the path's name.
Note that you must always explicitly close the returned stream to prevent leaking open file handles.
Parameters:
path - input file
metadata - metadata instance
Throws:
IOException - if an I/O error occurs

public static TikaInputStream get(Path path, Metadata metadata, TemporaryResources tmp) throws IOException
Throws:
IOException
@Deprecated public static TikaInputStream get(File file) throws FileNotFoundException
Deprecated. Use get(Path). In Tika 2.0, this will be removed or modified to throw an IOException.
Note that you must always explicitly close the returned stream to prevent leaking open file handles.
Parameters:
file - input file
Throws:
FileNotFoundException - if the file does not exist

@Deprecated public static TikaInputStream get(File file, Metadata metadata) throws FileNotFoundException
Deprecated. Use get(Path, Metadata). In Tika 2.0, this will be removed or modified to throw an IOException.
Note that you must always explicitly close the returned stream to prevent leaking open file handles.
Parameters:
file - input file
metadata - metadata instance
Throws:
FileNotFoundException - if the file does not exist or cannot be opened for reading

public static TikaInputStream get(InputStreamFactory factory) throws IOException
Creates a TikaInputStream from a Factory which can create fresh InputStreams for the same resource multiple times.
This is typically desired when working with Parsers that need to re-read the stream multiple times, where other forms of buffering (e.g. File) are slower than just getting a fresh new stream each time.
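The idea of reopening a resource instead of buffering it can be sketched with plain JDK classes. This is only an illustration of the pattern under stated assumptions: StreamSource is a hypothetical stand-in interface defined here, not Tika's InputStreamFactory, and FreshStreamSketch, firstByte and countBytes are names invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of re-reading via fresh streams; not Tika code. */
class FreshStreamSketch {

    /** Stand-in for a stream factory: each call opens a new stream on the same resource. */
    @FunctionalInterface
    interface StreamSource {
        InputStream open() throws IOException;
    }

    /** First pass: look at the first byte only. */
    static int firstByte(StreamSource source) throws IOException {
        try (InputStream in = source.open()) {
            return in.read();
        }
    }

    /** Second pass: read everything again from a fresh stream. */
    static long countBytes(StreamSource source) throws IOException {
        try (InputStream in = source.open()) {
            long total = 0;
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes(StandardCharsets.US_ASCII);
        StreamSource source = () -> new ByteArrayInputStream(data);
        System.out.println(firstByte(source));  // prints 104, the code of 'h'
        System.out.println(countBytes(source)); // prints 5, the first pass consumed nothing
    }
}
```

Each pass opens and closes its own stream, so no pass depends on mark/reset or on a temporary file.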
Throws:
IOException

public static TikaInputStream get(InputStreamFactory factory, TemporaryResources tmp) throws IOException
Creates a TikaInputStream from a Factory which can create fresh InputStreams for the same resource multiple times.
This is typically desired when working with Parsers that need to re-read the stream multiple times, where other forms of buffering (e.g. File) are slower than just getting a fresh new stream each time.
Throws:
IOException
public static TikaInputStream get(Blob blob) throws SQLException
Creates a TikaInputStream from the given database BLOB.
Note that the result set containing the BLOB may need to be kept open until the returned TikaInputStream has been processed and closed. You must also always explicitly close the returned stream, as in some cases it may end up writing the blob data to a temporary file.
Parameters:
blob - database BLOB
Throws:
SQLException - if BLOB data cannot be accessed

public static TikaInputStream get(Blob blob, Metadata metadata) throws SQLException
Creates a TikaInputStream from the given database BLOB.
Note that the result set containing the BLOB may need to be kept open until the returned TikaInputStream has been processed and closed. You must also always explicitly close the returned stream, as in some cases it may end up writing the blob data to a temporary file.
Parameters:
blob - database BLOB
metadata - metadata instance
Throws:
SQLException - if BLOB data cannot be accessed

public static TikaInputStream get(URI uri) throws IOException
Creates a TikaInputStream from the resource at the given URI.
Note that you must always explicitly close the returned stream, as in some cases it may end up writing the resource to a temporary file.
Parameters:
uri - resource URI
Throws:
IOException - if the resource cannot be accessed

public static TikaInputStream get(URI uri, Metadata metadata) throws IOException
Creates a TikaInputStream from the resource at the given URI.
Note that you must always explicitly close the returned stream, as in some cases it may end up writing the resource to a temporary file.
Parameters:
uri - resource URI
metadata - metadata instance
Throws:
IOException - if the resource cannot be accessed

public static TikaInputStream get(URL url) throws IOException
Creates a TikaInputStream from the resource at the given URL.
Note that you must always explicitly close the returned stream, as in some cases it may end up writing the resource to a temporary file.
Parameters:
url - resource URL
Throws:
IOException - if the resource cannot be accessed

public static TikaInputStream get(URL url, Metadata metadata) throws IOException
Creates a TikaInputStream from the resource at the given URL.
Note that you must always explicitly close the returned stream, as in some cases it may end up writing the resource to a temporary file.
Parameters:
url - resource URL
metadata - metadata instance
Throws:
IOException - if the resource cannot be accessed

public int peek(byte[] buffer) throws IOException
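Fills the given buffer with upcoming bytes from this stream without advancing the current stream position.
Parameters:
buffer - byte buffer
Throws:
IOException - if the stream cannot be read

public Object getOpenContainer()
Returns the open container object if any, such as a POIFS FileSystem in the event of an OLE2 document being detected and processed by the OLE2 detector.
Returns:
the open container object, or null if none

public void setOpenContainer(Object container)
Stores the open container object against the stream, e.g. after a Zip contents detector has loaded the file to decide what it contains.

public void addCloseableResource(Closeable closeable)
Parameters:
closeable

public boolean hasInputStreamFactory()

public InputStreamFactory getInputStreamFactory()
If the stream was created from an InputStreamFactory, returns that; otherwise returns null.

public boolean hasFile()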
public Path getPath() throws IOException
If the user created this TikaInputStream with a file, the original file will be returned.
Throws:
IOException

public Path getPath(int maxBytes) throws IOException
Parameters:
maxBytes - if this is less than 0 and if an underlying file doesn't already exist, the full file will be spooled to disk
Returns:
a path, or null if the underlying stream was longer than maxBytes
Throws:
IOException

public File getFile() throws IOException
Throws:
IOException
See Also:
getPath()

public FileChannel getFileChannel() throws IOException
Throws:
IOException
public boolean hasLength()
public long getLength() throws IOException
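Returns the length (in bytes) of this stream. If the length is not already known, this method uses the getPath() method to buffer the entire stream to a temporary file in order to calculate the stream length. This case will only work if the stream has not yet been consumed.
Throws:
IOException - if the length cannot be determined

public long getPosition()
Returns the current position within the stream.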
public long skip(long ln) throws IOException
This relies on IOUtils.skip(InputStream, long, byte[]) to ensure that the alleged bytes skipped were actually skipped.
Overrides:
skip in class org.apache.commons.io.input.ProxyInputStream
Parameters:
ln - the number of bytes to skip
Throws:
IOException - if the number of bytes requested to be skipped does not match the number of bytes skipped or if there's an IOException during the read.

public void mark(int readlimit)
Overrides:
mark in class org.apache.commons.io.input.ProxyInputStream

public boolean markSupported()
Overrides:
markSupported in class org.apache.commons.io.input.ProxyInputStream

public void reset() throws IOException
Overrides:
reset in class org.apache.commons.io.input.ProxyInputStream
Throws:
IOException

public void close() throws IOException
Specified by:
close in interface Closeable
Specified by:
close in interface AutoCloseable
Overrides:
close in class org.apache.commons.io.input.ProxyInputStream
Throws:
IOException

protected void afterRead(int n) throws IOException
Overrides:
afterRead in class org.apache.commons.io.input.ProxyInputStream
Throws:
IOException
Copyright © 2007–2022 The Apache Software Foundation. All rights reserved.