Class TikaInputStream
- All Implemented Interfaces:
Closeable, AutoCloseable
This implementation uses a backing strategy chosen according to the input type:
- ByteArraySource for byte[] inputs: no caching is needed.
- FileSource for Path/File inputs: the file is accessed directly.
- CachingSource for InputStream inputs: bytes are cached as they are read.
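The backing source is selected by whichever factory method the caller uses. The sketch below feeds the same bytes through get(byte[]) and get(InputStream) and reads them back. It assumes tika-core is on the classpath with TikaInputStream in the usual org.apache.tika.io package; the demo class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.io.TikaInputStream;

// Hypothetical demo class: the same bytes reach TikaInputStream through two
// factory methods. The library, not the caller, picks the backing source.
final class BackingStrategyDemo {

    // byte[] input: ByteArraySource per the class description (no caching needed).
    static byte[] readViaByteArray(byte[] data) throws IOException {
        try (TikaInputStream tis = TikaInputStream.get(data)) {
            return tis.readAllBytes();
        }
    }

    // Raw InputStream input: CachingSource per the class description.
    static byte[] readViaInputStream(byte[] data) throws IOException {
        InputStream raw = new ByteArrayInputStream(data);
        try (TikaInputStream tis = TikaInputStream.get(raw)) {
            return tis.readAllBytes();
        }
    }
}
```

Both calls return the original bytes; the difference lies in what each backing source must do to support rewinding later.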
- Since:
- Apache Tika 0.8
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.commons.io.input.ProxyInputStream
org.apache.commons.io.input.ProxyInputStream.AbstractBuilder<T extends Object, B extends org.apache.commons.io.build.AbstractStreamBuilder<T, B>>
-
Field Summary
Fields
tmp
Fields inherited from class java.io.FilterInputStream
in
-
Constructor Summary
Constructors
Modifier | Constructor | Description
protected | TikaInputStream(InputStream stream, long length) | Protected constructor for subclasses.
-
Method Summary
Modifier and Type | Method | Description
void | addCloseableResource(Closeable closeable) |
protected void | afterRead(int n) |
void | close() |
void | enableRewind() | Enables full rewind capability for this stream.
static TikaInputStream | get(byte[] data) |
static TikaInputStream | get(...) | Further overloads.
static TikaInputStream | get(InputStream stream) |
static TikaInputStream | get(InputStream stream, TemporaryResources tmp, Metadata metadata) |
static TikaInputStream | get(InputStream stream, Metadata metadata) |
static TikaInputStream | get(...) | Further overloads.
static TikaInputStream | get(Path path, Metadata metadata, TemporaryResources tmp) |
static TikaInputStream | get(...) | Further overloads.
File | getFile() |
FileChannel | getFileChannel() |
static TikaInputStream | getFromContainer(Object openContainer, long length, Metadata metadata) |
long | getLength() |
Object | getOpenContainer() |
Path | getPath() |
long | getPosition() |
boolean | hasFile() |
boolean | hasLength() |
boolean | isCloseShield() |
void | mark(int readlimit) |
boolean | markSupported() |
int | peek(byte[] buffer) |
void | removeCloseShield() |
void | reset() |
void | rewind() | Rewind the stream to the beginning.
void | setCloseShield() |
void | setOpenContainer(Object container) |
protected void | setPosition(long position) |
long | skip(long n) | Skips up to n bytes.
String | toString() |
Methods inherited from class org.apache.commons.io.input.TaggedInputStream
handleIOException, isCauseOf, throwIfCauseOf
Methods inherited from class org.apache.commons.io.input.ProxyInputStream
available, beforeRead, read, read, read, setReference, unwrap
Methods inherited from class java.io.InputStream
nullInputStream, readAllBytes, readNBytes, readNBytes, skipNBytes, transferTo
-
Field Details
-
tmp
-
-
Constructor Details
-
TikaInputStream
Protected constructor for subclasses.
-
-
Method Details
-
get
-
get
-
get
-
get
- Throws:
IOException
-
get
- Throws:
IOException
-
get
- Throws:
IOException
-
get
- Throws:
IOException
-
get
public static TikaInputStream get(Path path, Metadata metadata, TemporaryResources tmp) throws IOException
- Throws:
IOException
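A file-backed stream can be opened from a Path together with a caller-supplied Metadata and TemporaryResources. The sketch below assumes tika-core is on the classpath with TemporaryResources and TikaInputStream in org.apache.tika.io and Metadata in org.apache.tika.metadata; the demo class is hypothetical. Both resources are closed by try-with-resources, the stream first.

```java
import java.io.IOException;
import java.nio.file.Path;
import org.apache.tika.io.TemporaryResources;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;

// Hypothetical demo class: reads a file through get(Path, Metadata, TemporaryResources).
final class PathInputDemo {

    static byte[] readFile(Path path) throws IOException {
        Metadata metadata = new Metadata();
        // Resources are closed in reverse order: tis first, then tmp.
        try (TemporaryResources tmp = new TemporaryResources();
             TikaInputStream tis = TikaInputStream.get(path, metadata, tmp)) {
            // A Path input is read directly from the file (FileSource).
            return tis.readAllBytes();
        }
    }
}
```

Reading through the stream does not remove the file itself; only temporary resources registered with tmp are cleaned up on close.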
-
get
- Throws:
IOException
-
get
- Throws:
IOException
-
get
- Throws:
SQLException, IOException
-
get
- Throws:
SQLException, IOException
-
get
- Throws:
IOException
-
get
- Throws:
IOException
-
get
- Throws:
IOException
-
get
- Throws:
IOException
-
getFromContainer
public static TikaInputStream getFromContainer(Object openContainer, long length, Metadata metadata) throws IOException
- Throws:
IOException
-
skip
Skips up to n bytes. Returns the actual number of bytes skipped, which may be less than requested if the end of the stream is reached.
This method does NOT throw EOFException if fewer bytes are available. Callers must check the return value to determine how many bytes were actually skipped.
- Overrides:
skip in class org.apache.commons.io.input.ProxyInputStream
- Parameters:
n - the number of bytes to skip
- Returns:
the actual number of bytes skipped (may be less than n)
- Throws:
IOException
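Because skip() may stop short, a caller that needs a specific count has to loop and check each return value. The helper below is a hypothetical sketch (skipUpTo is not part of Tika) that works on any InputStream, TikaInputStream included. When skip() returns 0 it probes with a single read() so that a stalled skip is not mistaken for the end of the stream. Unlike InputStream.skipNBytes (listed among the inherited methods), which throws EOFException when the stream ends early, it reports the count instead.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class, not part of Tika.
final class SkipHelper {

    // Skips up to n bytes and returns how many were actually skipped.
    static long skipUpTo(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() == -1) {
                // skip() made no progress and read() reports end of stream.
                break;
            } else {
                // skip() made no progress, but one more byte was consumed by read().
                remaining--;
            }
        }
        return n - remaining;
    }
}
```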
-
mark
public void mark(int readlimit)
- Overrides:
mark in class org.apache.commons.io.input.ProxyInputStream
-
markSupported
public boolean markSupported()
- Overrides:
markSupported in class org.apache.commons.io.input.ProxyInputStream
-
reset
- Overrides:
reset in class org.apache.commons.io.input.ProxyInputStream
- Throws:
IOException
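According to the enableRewind() description, mark(int)/reset() on a stream built from a raw InputStream requires enableRewind() first, while byte[]-backed streams are inherently rewindable. The sketch below therefore uses get(byte[]) to read a prefix, return to the mark, and read the same prefix again. It assumes tika-core on the classpath; the demo class is hypothetical.

```java
import java.io.IOException;
import org.apache.tika.io.TikaInputStream;

// Hypothetical demo class: reads the first n bytes twice using mark()/reset().
final class MarkResetDemo {

    static byte[][] readPrefixTwice(byte[] data, int n) throws IOException {
        try (TikaInputStream tis = TikaInputStream.get(data)) {
            if (!tis.markSupported()) {
                throw new IllegalStateException("mark/reset not supported");
            }
            tis.mark(n);                      // remember the current position
            byte[] first = tis.readNBytes(n);
            tis.reset();                      // return to the marked position
            byte[] second = tis.readNBytes(n);
            return new byte[][] {first, second};
        }
    }
}
```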
-
close
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Overrides:
close in class org.apache.commons.io.input.ProxyInputStream
- Throws:
IOException
-
afterRead
- Overrides:
afterRead in class org.apache.commons.io.input.ProxyInputStream
- Throws:
IOException
-
peek
- Throws:
IOException
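peek(byte[]) carries no description on this page. Assuming it copies up to buffer.length upcoming bytes into the buffer, returns how many it copied, and leaves the read position unchanged, a magic-number check might look like the sketch below. That behavior is an assumption, not documented here; tika-core on the classpath is also assumed, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.util.Arrays;
import org.apache.tika.io.TikaInputStream;

// Hypothetical demo class built on the assumed peek() semantics described above.
final class PeekDemo {

    // True if the stream begins with the given magic bytes.
    static boolean startsWith(byte[] data, byte[] magic) throws IOException {
        try (TikaInputStream tis = TikaInputStream.get(data)) {
            byte[] head = new byte[magic.length];
            int n = tis.peek(head);           // assumed: number of bytes copied
            return n == magic.length && Arrays.equals(head, magic);
        }
    }

    // Reads one byte after peeking; if peek() did not advance, this is data[0].
    static int firstByteAfterPeek(byte[] data, int peekLength) throws IOException {
        try (TikaInputStream tis = TikaInputStream.get(data)) {
            tis.peek(new byte[peekLength]);
            return tis.read();
        }
    }
}
```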
-
getOpenContainer
-
setOpenContainer
-
addCloseableResource
-
hasFile
public boolean hasFile()
-
getPath
- Throws:
IOException
-
getFile
- Throws:
IOException
-
getFileChannel
- Throws:
IOException
-
hasLength
public boolean hasLength()
-
getLength
- Throws:
IOException
-
getPosition
public long getPosition()
-
setPosition
protected void setPosition(long position)
-
setCloseShield
public void setCloseShield()
-
removeCloseShield
public void removeCloseShield()
-
isCloseShield
public boolean isCloseShield()
-
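The close-shield methods have no description on this page. The sketch below only observes the flag reported by isCloseShield() as a shield is set and removed; it deliberately makes no assumption about what close() does while a shield is in place. It assumes tika-core on the classpath; the demo class is hypothetical.

```java
import java.io.IOException;
import org.apache.tika.io.TikaInputStream;

// Hypothetical demo class: reports isCloseShield() before, during, and after a shield.
final class CloseShieldDemo {

    static boolean[] shieldStates(byte[] data) throws IOException {
        try (TikaInputStream tis = TikaInputStream.get(data)) {
            boolean before = tis.isCloseShield();
            tis.setCloseShield();
            boolean during = tis.isCloseShield();
            tis.removeCloseShield();
            boolean after = tis.isCloseShield();
            return new boolean[] {before, during, after};
        }
    }
}
```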
rewind
Rewind the stream to the beginning.
For streams created from byte arrays or files, this always works. For streams created from raw InputStreams, this requires enableRewind() to have been called first.
- Throws:
IOException
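For a byte[]-backed stream, rewind() needs no preparation. The sketch below reads such a stream to the end, rewinds it, checks that getPosition() is back at 0, and reads everything again. It assumes tika-core on the classpath; the demo class is hypothetical.

```java
import java.io.IOException;
import org.apache.tika.io.TikaInputStream;

// Hypothetical demo class: two full passes over the same byte[]-backed stream.
final class RewindDemo {

    // Returns {first pass, second pass}.
    static byte[][] readTwice(byte[] data) throws IOException {
        try (TikaInputStream tis = TikaInputStream.get(data)) {
            byte[] first = tis.readAllBytes();
            tis.rewind();                     // always works for byte[] and file inputs
            long position = tis.getPosition();
            if (position != 0) {
                throw new IllegalStateException("expected position 0, got " + position);
            }
            byte[] second = tis.readAllBytes();
            return new byte[][] {first, second};
        }
    }
}
```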
-
enableRewind
public void enableRewind()
Enables full rewind capability for this stream.
For streams backed by byte arrays or files, this is a no-op, since they are inherently rewindable. For streams backed by raw InputStreams, this switches from passthrough mode to caching mode, enabling subsequent rewind(), mark(int)/reset(), and random access.
Must be called while the position is 0 (before any reading); otherwise it throws IllegalStateException.
Use this method when you know you will need to rewind the stream later (for example, detection followed by parsing, or digest calculation). For streaming-only operations (for example, HTML parsing), skip this call to avoid unnecessary caching overhead.
- Throws:
IllegalStateException - if the position is not 0
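The detection-then-parsing pattern above can be sketched as follows. A raw InputStream is wrapped, enableRewind() is called before the first read, a short header is consumed (standing in for detection), and the stream is rewound and read in full (standing in for parsing). A second method checks the documented IllegalStateException when enableRewind() comes too late. It assumes tika-core on the classpath; the demo class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.io.TikaInputStream;

// Hypothetical demo class for enableRewind() on a raw InputStream.
final class EnableRewindDemo {

    static byte[] headerThenFull(InputStream raw, int headerLength) throws IOException {
        try (TikaInputStream tis = TikaInputStream.get(raw)) {
            tis.enableRewind();               // must happen while the position is 0
            tis.readNBytes(headerLength);     // e.g. bytes a detector would inspect
            tis.rewind();                     // back to the first byte
            return tis.readAllBytes();        // full pass, e.g. for a parser
        }
    }

    // True if enableRewind() is rejected after reading has started.
    static boolean rejectsLateEnable(InputStream raw) throws IOException {
        try (TikaInputStream tis = TikaInputStream.get(raw)) {
            tis.read();                       // position is now past 0
            try {
                tis.enableRewind();
                return false;
            } catch (IllegalStateException expected) {
                return true;
            }
        }
    }
}
```

Skipping enableRewind() keeps a raw stream in passthrough mode, which avoids the caching cost when a single forward pass is enough.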
-
toString
-