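public class TikaInputStream extends TaggedInputStream
Input stream with extended capabilities. This class allows files and other resources and information to be associated with the InputStream instance passed through the Parser interface and other similar APIs.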
TikaInputStream instances can be created using the various static get() factory methods. Most of these methods take an optional Metadata argument that is then filled with the available input metadata from the given resource. The created TikaInputStream instance keeps track of the original resource used to create it, while otherwise behaving just like a normal, buffered InputStream.
A TikaInputStream instance is also guaranteed to support the mark(int) feature.
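Guaranteed mark support is what makes lookahead operations such as peek(byte[]) possible. The following is a minimal sketch of that idea using only java.io; the class name PeekSketch and its helper are hypothetical and are not part of Tika, and the actual TikaInputStream implementation may differ.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not Tika code.
final class PeekSketch {

    /**
     * Copies upcoming bytes into the buffer without advancing the stream.
     * Returns the number of bytes copied, which is less than buffer.length
     * only if the end of the stream is reached first.
     */
    static int peek(InputStream in, byte[] buffer) throws IOException {
        if (!in.markSupported()) {
            throw new IOException("mark/reset is not supported by this stream");
        }
        in.mark(buffer.length);          // remember the current position
        int total = 0;
        try {
            while (total < buffer.length) {
                int n = in.read(buffer, total, buffer.length - total);
                if (n == -1) {
                    break;               // end of stream
                }
                total += n;
            }
        } finally {
            in.reset();                  // rewind to the marked position
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(
                "%PDF-1.7".getBytes(StandardCharsets.US_ASCII)));
        byte[] magic = new byte[4];
        int n = peek(in, magic);
        // prints "4 bytes peeked: %PDF"
        System.out.println(n + " bytes peeked: "
                + new String(magic, 0, n, StandardCharsets.US_ASCII));
        // prints "next byte read: %" because the position did not move
        System.out.println("next byte read: " + (char) in.read());
    }
}
```

A detector can use this kind of lookahead to inspect magic bytes and still hand the untouched stream to a parser.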
Code that wants to access the underlying file or other resources associated with a TikaInputStream should first use the get(InputStream) factory method to cast or wrap a given InputStream into a TikaInputStream instance.
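The cast-or-wrap behaviour can be illustrated with a small stand-in class. This sketch assumes nothing about Tika internals: WrappedStream is a hypothetical type built on java.io.FilterInputStream, and it only shows the pattern of returning the same instance when the argument is already of the right type and wrapping it otherwise.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;
import java.util.Objects;

// Hypothetical stand-in for a stream type with cast-or-wrap factories; not Tika code.
final class WrappedStream extends FilterInputStream {

    private WrappedStream(InputStream in) {
        super(in);
    }

    /** Returns the stream itself if it is already a WrappedStream, otherwise wraps it. */
    static WrappedStream get(InputStream stream) {
        Objects.requireNonNull(stream, "stream");
        if (stream instanceof WrappedStream) {
            return (WrappedStream) stream;      // cast: no second wrapper is created
        }
        // Buffer the stream if needed so that mark(int) is always available.
        InputStream buffered = stream.markSupported()
                ? stream
                : new BufferedInputStream(stream);
        return new WrappedStream(buffered);     // wrap
    }

    /** Returns the stream cast to WrappedStream, or null if it is some other type. */
    static WrappedStream cast(InputStream stream) {
        return stream instanceof WrappedStream ? (WrappedStream) stream : null;
    }

    public static void main(String[] args) {
        InputStream raw = new ByteArrayInputStream(new byte[] {1, 2, 3});
        WrappedStream wrapped = WrappedStream.get(raw);
        System.out.println(WrappedStream.get(wrapped) == wrapped); // prints "true"
        System.out.println(WrappedStream.cast(raw) == null);       // prints "true"
    }
}
```

Because get is idempotent in this pattern, code deep inside a parser can call it on whatever stream it receives without stacking wrappers.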
TikaInputStream includes a few safety features to protect against parsers that may fail to check for an EOF or may incorrectly rely on the unreliable value returned from FileInputStream.skip(long). These parser failures can lead to infinite loops. We strongly encourage the use of TikaInputStream.
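To see why an unchecked skip is risky, consider a skip that reads and discards bytes so that the count it reports is the count actually consumed. This is a sketch under stated assumptions: SkipSketch and its methods are hypothetical names, and the code is not the IOUtils.skip(InputStream, long) implementation that TikaInputStream relies on.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration; not Tika code.
final class SkipSketch {

    /** Skips by reading and discarding; returns the number of bytes actually consumed. */
    static long skipByReading(InputStream in, long toSkip) throws IOException {
        if (toSkip < 0) {
            throw new IllegalArgumentException("toSkip must not be negative: " + toSkip);
        }
        byte[] scratch = new byte[4096];
        long remaining = toSkip;
        while (remaining > 0) {
            int n = in.read(scratch, 0, (int) Math.min(remaining, scratch.length));
            if (n < 0) {
                break;                   // end of stream: stop instead of looping forever
            }
            remaining -= n;
        }
        return toSkip - remaining;
    }

    /** Fails loudly when fewer bytes than requested could be skipped. */
    static void skipFully(InputStream in, long toSkip) throws IOException {
        long skipped = skipByReading(in, toSkip);
        if (skipped != toSkip) {
            throw new EOFException("requested " + toSkip + " bytes, skipped " + skipped);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        InputStream in = new ByteArrayInputStream(data);
        System.out.println(skipByReading(in, 4));   // prints "4"
        System.out.println(in.read());              // prints "4"
        System.out.println(skipByReading(in, 100)); // prints "5"
    }
}
```

The explicit end-of-stream check is the part that prevents the infinite loop: a caller that keeps retrying a skip until a target offset is reached never terminates on a truncated file unless something reports the shortfall.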
Modifier and Type | Method | Description |
---|---|---|
protected void | afterRead(int n) | Invoked by the read methods after the proxied call has returned successfully. |
static TikaInputStream | cast(InputStream stream) | Returns the given stream cast to a TikaInputStream, or null if the stream is not a TikaInputStream. |
void | close() | Invokes the delegate's close() method. |
static TikaInputStream | get(Blob blob) | Creates a TikaInputStream from the given database BLOB. |
static TikaInputStream | get(Blob blob, Metadata metadata) | Creates a TikaInputStream from the given database BLOB. |
static TikaInputStream | get(byte[] data) | Creates a TikaInputStream from the given array of bytes. |
static TikaInputStream | get(byte[] data, Metadata metadata) | Creates a TikaInputStream from the given array of bytes. |
static TikaInputStream | get(File file) | Deprecated. Use get(Path). In Tika 2.0, this will be removed or modified to throw an IOException. |
static TikaInputStream | get(File file, Metadata metadata) | Deprecated. Use get(Path, Metadata). In Tika 2.0, this will be removed or modified to throw an IOException. |
static TikaInputStream | get(InputStream stream) | Casts or wraps the given stream to a TikaInputStream instance. |
static TikaInputStream | get(InputStream stream, TemporaryResources tmp) | Casts or wraps the given stream to a TikaInputStream instance. |
static TikaInputStream | get(Path path) | Creates a TikaInputStream from the file at the given path. |
static TikaInputStream | get(Path path, Metadata metadata) | Creates a TikaInputStream from the file at the given path. |
static TikaInputStream | get(URI uri) | Creates a TikaInputStream from the resource at the given URI. |
static TikaInputStream | get(URI uri, Metadata metadata) | Creates a TikaInputStream from the resource at the given URI. |
static TikaInputStream | get(URL url) | Creates a TikaInputStream from the resource at the given URL. |
static TikaInputStream | get(URL url, Metadata metadata) | Creates a TikaInputStream from the resource at the given URL. |
File | getFile() | |
FileChannel | getFileChannel() | |
long | getLength() | Returns the length (in bytes) of this stream. |
Object | getOpenContainer() | Returns the open container object, such as a POIFS FileSystem in the event of an OLE2 document being detected and processed by the OLE2 detector. |
Path | getPath() | If the user created this TikaInputStream with a file, the original file will be returned. |
Path | getPath(int maxBytes) | |
long | getPosition() | Returns the current position within the stream. |
boolean | hasFile() | |
boolean | hasLength() | |
static boolean | isTikaInputStream(InputStream stream) | Checks whether the given stream is a TikaInputStream instance. |
void | mark(int readlimit) | Invokes the delegate's mark(int) method. |
boolean | markSupported() | Invokes the delegate's markSupported() method. |
int | peek(byte[] buffer) | Fills the given buffer with upcoming bytes from this stream without advancing the current stream position. |
void | reset() | Invokes the delegate's reset() method. |
void | setOpenContainer(Object container) | Stores the open container object against the stream, e.g. after a Zip contents detector has loaded the file to decide what it contains. |
long | skip(long ln) | This relies on IOUtils.skip(InputStream, long) to ensure that the alleged bytes skipped were actually skipped. |
String | toString() | |
Methods inherited from class TaggedInputStream: handleIOException, isCauseOf, throwIfCauseOf
Methods inherited from class ProxyInputStream: available, beforeRead, read, read, read
public static boolean isTikaInputStream(InputStream stream)
Checks whether the given stream is a TikaInputStream instance. The given stream can be null, in which case the return value is false.
Parameters:
stream - input stream, possibly null
Returns:
true if the stream is a TikaInputStream instance, false otherwise
public static TikaInputStream get(InputStream stream, TemporaryResources tmp)
Casts or wraps the given stream to a TikaInputStream instance. The given temporary file provider is used for any temporary files, and should be disposed of when the returned stream is no longer used.
Use this method instead of the get(InputStream) alternative when you don't explicitly close the returned stream. The recommended access pattern is:
    try (TemporaryResources tmp = new TemporaryResources()) {
        TikaInputStream stream = TikaInputStream.get(..., tmp);
        // process stream but don't close it
    }
The given stream instance will not be closed when the TemporaryResources.close() method is called by the try-with-resources statement. The caller is expected to explicitly close the original stream when it's no longer used.
Parameters:
stream - normal input stream
public static TikaInputStream get(InputStream stream)
Casts or wraps the given stream to a TikaInputStream instance. Use this method instead of the get(InputStream, TemporaryResources) alternative when you do explicitly close the returned stream. The recommended access pattern is:
    try (TikaInputStream stream = TikaInputStream.get(...)) {
        // process stream
    }
The given stream instance will be closed along with any other resources associated with the returned TikaInputStream instance when the close() method is called by the try-with-resources statement.
Parameters:
stream - normal input stream
public static TikaInputStream cast(InputStream stream)
Returns the given stream cast to a TikaInputStream, or null if the stream is not a TikaInputStream.
Parameters:
stream - normal input stream
public static TikaInputStream get(byte[] data)
Creates a TikaInputStream from the given array of bytes. Note that you must always explicitly close the returned stream, as in some cases it may end up writing the given data to a temporary file.
Parameters:
data - input data
public static TikaInputStream get(byte[] data, Metadata metadata)
Creates a TikaInputStream from the given array of bytes. Note that you must always explicitly close the returned stream, as in some cases it may end up writing the given data to a temporary file.
Parameters:
data - input data
metadata - metadata instance
Throws:
IOException
public static TikaInputStream get(Path path) throws IOException
Creates a TikaInputStream from the file at the given path. Note that you must always explicitly close the returned stream to prevent leaking open file handles.
Parameters:
path - input file
Throws:
IOException - if an I/O error occurs
public static TikaInputStream get(Path path, Metadata metadata) throws IOException
Creates a TikaInputStream from the file at the given path. Note that you must always explicitly close the returned stream to prevent leaking open file handles.
Parameters:
path - input file
metadata - metadata instance
Throws:
IOException - if an I/O error occurs
@Deprecated
public static TikaInputStream get(File file) throws FileNotFoundException
Deprecated. Use get(Path). In Tika 2.0, this will be removed or modified to throw an IOException.
Note that you must always explicitly close the returned stream to prevent leaking open file handles.
Parameters:
file - input file
Throws:
FileNotFoundException - if the file does not exist
@Deprecated
public static TikaInputStream get(File file, Metadata metadata) throws FileNotFoundException
Deprecated. Use get(Path, Metadata). In Tika 2.0, this will be removed or modified to throw an IOException.
Note that you must always explicitly close the returned stream to prevent leaking open file handles.
Parameters:
file - input file
metadata - metadata instance
Throws:
FileNotFoundException - if the file does not exist or cannot be opened for reading
public static TikaInputStream get(Blob blob) throws SQLException
Creates a TikaInputStream from the given database BLOB. Note that the result set containing the BLOB may need to be kept open until the returned TikaInputStream has been processed and closed. You must also always explicitly close the returned stream, as in some cases it may end up writing the blob data to a temporary file.
Parameters:
blob - database BLOB
Throws:
SQLException - if BLOB data cannot be accessed
public static TikaInputStream get(Blob blob, Metadata metadata) throws SQLException
Creates a TikaInputStream from the given database BLOB. Note that the result set containing the BLOB may need to be kept open until the returned TikaInputStream has been processed and closed. You must also always explicitly close the returned stream, as in some cases it may end up writing the blob data to a temporary file.
Parameters:
blob - database BLOB
metadata - metadata instance
Throws:
SQLException - if BLOB data cannot be accessed
public static TikaInputStream get(URI uri) throws IOException
Creates a TikaInputStream from the resource at the given URI. Note that you must always explicitly close the returned stream, as in some cases it may end up writing the resource to a temporary file.
Parameters:
uri - resource URI
Throws:
IOException - if the resource cannot be accessed
public static TikaInputStream get(URI uri, Metadata metadata) throws IOException
Creates a TikaInputStream from the resource at the given URI. Note that you must always explicitly close the returned stream, as in some cases it may end up writing the resource to a temporary file.
Parameters:
uri - resource URI
metadata - metadata instance
Throws:
IOException - if the resource cannot be accessed
public static TikaInputStream get(URL url) throws IOException
Creates a TikaInputStream from the resource at the given URL. Note that you must always explicitly close the returned stream, as in some cases it may end up writing the resource to a temporary file.
Parameters:
url - resource URL
Throws:
IOException - if the resource cannot be accessed
public static TikaInputStream get(URL url, Metadata metadata) throws IOException
Creates a TikaInputStream from the resource at the given URL. Note that you must always explicitly close the returned stream, as in some cases it may end up writing the resource to a temporary file.
Parameters:
url - resource URL
metadata - metadata instance
Throws:
IOException - if the resource cannot be accessed
public int peek(byte[] buffer) throws IOException
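Fills the given buffer with upcoming bytes from this stream without advancing the current stream position.
Parameters:
buffer - byte buffer
Throws:
IOException - if the stream cannot be read
public Object getOpenContainer()
Returns the open container object, such as a POIFS FileSystem in the event of an OLE2 document being detected and processed by the OLE2 detector.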
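public void setOpenContainer(Object container)
Stores the open container object against the stream, e.g. after a Zip contents detector has loaded the file to decide what it contains.
public boolean hasFile()
public Path getPath() throws IOException
If the user created this TikaInputStream with a file, the original file will be returned.
Throws:
IOException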
public Path getPath(int maxBytes) throws IOException
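Parameters:
maxBytes - if this is less than 0 and if an underlying file doesn't already exist, the full file will be spooled to disk
Returns:
the original path used to create this TikaInputStream, a temporary file if the stream is smaller than maxBytes, or null if the underlying stream was longer than maxBytes.
Throws:
IOException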
public File getFile() throws IOException
Throws:
IOException
See Also:
getPath()
public FileChannel getFileChannel() throws IOException
Throws:
IOException
public boolean hasLength()
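public long getLength() throws IOException
Returns the length (in bytes) of this stream. If the length is not already known, this method uses the getPath() method to buffer the entire stream to a temporary file in order to calculate the stream length. This case will only work if the stream has not yet been consumed.
Throws:
IOException - if the length cannot be determined
public long getPosition()
Returns the current position within the stream.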
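The spooling behaviour described for getLength() can be illustrated with the standard library alone. This is a rough sketch, not Tika's implementation: SpoolSketch and spool are hypothetical names, and the temporary file is cleaned up by hand here, whereas TikaInputStream manages its own temporary resources.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration; not Tika code.
final class SpoolSketch {

    /** Copies the remaining bytes of the stream to a new temporary file and returns its path. */
    static Path spool(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("spool-", ".tmp");
        try {
            // The temp file already exists, so the copy must be allowed to replace it.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(tmp);   // do not leave a partial file behind
            throw e;
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[1234]);
        Path tmp = spool(in);
        try {
            System.out.println(Files.size(tmp)); // prints "1234"
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Only the bytes still unread at the time of the copy reach the file, which is why a length computed this way is meaningful only if the stream has not yet been consumed.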
public long skip(long ln) throws IOException
This relies on IOUtils.skip(InputStream, long) to ensure that the alleged bytes skipped were actually skipped.
Overrides:
skip in class ProxyInputStream
Parameters:
ln - the number of bytes to skip
Throws:
IOException - if the number of bytes requested to be skipped does not match the number of bytes skipped, or if there's an IOException during the read.
public void mark(int readlimit)
Description copied from class: ProxyInputStream
Invokes the delegate's mark(int) method.
Overrides:
mark in class ProxyInputStream
Parameters:
readlimit - read ahead limit
public boolean markSupported()
Description copied from class: ProxyInputStream
Invokes the delegate's markSupported() method.
Overrides:
markSupported in class ProxyInputStream
public void reset() throws IOException
Description copied from class: ProxyInputStream
Invokes the delegate's reset() method.
Overrides:
reset in class ProxyInputStream
Throws:
IOException - if an I/O error occurs
public void close() throws IOException
Description copied from class: ProxyInputStream
Invokes the delegate's close() method.
Specified by:
close in interface Closeable
close in interface AutoCloseable
Overrides:
close in class ProxyInputStream
Throws:
IOException - if an I/O error occurs
protected void afterRead(int n) throws IOException
Description copied from class: ProxyInputStream
Invoked by the read methods after the proxied call has returned successfully. Subclasses can override this method to add common post-processing functionality without having to override all the read methods. The default implementation does nothing.
Note that this method is not called from ProxyInputStream.skip(long) or ProxyInputStream.reset(). You need to explicitly override those methods if you also want to add post-processing steps to them.
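The hook pattern can be sketched without Commons IO. In the example below, HookedInputStream is a hypothetical stand-in for ProxyInputStream built on java.io.FilterInputStream, and CountingStream is a hypothetical subclass that uses afterRead to track a read position; neither is how TikaInputStream is actually implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration; not Tika or Commons IO code.
final class AfterReadSketch {

    /** Minimal proxy stream that calls afterRead(int) from every read method. */
    static class HookedInputStream extends FilterInputStream {
        HookedInputStream(InputStream in) {
            super(in);
        }

        /** Hook for subclasses; n is the number of bytes read, or -1 at end of stream. */
        protected void afterRead(int n) throws IOException {
            // default implementation does nothing
        }

        @Override
        public int read() throws IOException {
            int b = in.read();
            afterRead(b != -1 ? 1 : -1);
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = in.read(buf, off, len);
            afterRead(n);
            return n;
        }
        // read(byte[]) is inherited and delegates to read(byte[], int, int).
        // skip(long) and reset() are inherited unchanged and bypass the hook.
    }

    /** Tracks how many bytes were consumed through the read methods. */
    static final class CountingStream extends HookedInputStream {
        private long position;

        CountingStream(InputStream in) {
            super(in);
        }

        @Override
        protected void afterRead(int n) {
            if (n > 0) {
                position += n;
            }
        }

        long getPosition() {
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        CountingStream in = new CountingStream(new ByteArrayInputStream(new byte[10]));
        in.read();               // 1 byte
        in.read(new byte[4]);    // 4 bytes
        in.skip(2);              // not seen by afterRead
        // prints "position after reads and skip: 5"
        System.out.println("position after reads and skip: " + in.getPosition());
    }
}
```

The final count of 5 rather than 7 shows the caveat from the note above: a subclass that needs skip or reset reflected in its bookkeeping has to override those methods itself.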
Overrides:
afterRead in class ProxyInputStream
Parameters:
n - number of bytes read, or -1 if the end of stream was reached
Throws:
IOException - if the post-processing fails
public String toString()
Overrides:
toString in class TaggedInputStream
Copyright © 2007–2020 The Apache Software Foundation. All rights reserved.