Class TikaArchiveStreamFactory
- java.lang.Object
-
- org.apache.tika.detect.zip.TikaArchiveStreamFactory
-
- All Implemented Interfaces:
org.apache.commons.compress.archivers.ArchiveStreamProvider
public class TikaArchiveStreamFactory extends Object implements org.apache.commons.compress.archivers.ArchiveStreamProvider
Factory to create Archive[In|Out]putStreams from names or the first bytes of the InputStream. In order to add other implementations, you should extend ArchiveStreamFactory and override the appropriate methods (and call their implementation from super of course).
Compressing a ZIP-File:
    final OutputStream out = Files.newOutputStream(output.toPath());
    ArchiveOutputStream os = new ArchiveStreamFactory().createArchiveOutputStream(ArchiveStreamFactory.ZIP, out);
    os.putArchiveEntry(new ZipArchiveEntry("testdata/test1.xml"));
    IOUtils.copy(Files.newInputStream(file1.toPath()), os);
    os.closeArchiveEntry();
    os.putArchiveEntry(new ZipArchiveEntry("testdata/test2.xml"));
    IOUtils.copy(Files.newInputStream(file2.toPath()), os);
    os.closeArchiveEntry();
    os.close();
Decompressing a ZIP-File:
    final InputStream is = Files.newInputStream(input.toPath());
    ArchiveInputStream in = new ArchiveStreamFactory().createArchiveInputStream(ArchiveStreamFactory.ZIP, is);
    ZipArchiveEntry entry = (ZipArchiveEntry) in.getNextEntry();
    OutputStream out = Files.newOutputStream(dir.toPath().resolve(entry.getName()));
    IOUtils.copy(in, out);
    out.close();
    in.close();
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
static String APK
Constant (value "apk") used to identify the APK archive format.
static String APKM
Constant (value "apkm") used to identify the APKM archive format.
static String APKS
Constant (value "apks") used to identify the APKS archive format.
static String AR
Constant (value "ar") used to identify the AR archive format.
static String ARJ
Constant (value "arj") used to identify the ARJ archive format.
static String CPIO
Constant (value "cpio") used to identify the CPIO archive format.
static TikaArchiveStreamFactory DEFAULT
The singleton instance using the platform default encoding.
static String DUMP
Constant (value "dump") used to identify the Unix DUMP archive format.
static String JAR
Constant (value "jar") used to identify the JAR archive format.
static String SEVEN_Z
Constant (value "7z") used to identify the 7z archive format.
static String TAR
Constant used to identify the TAR archive format.
static String XAPK
Constant (value "xapk") used to identify the XAPK archive format.
static String ZIP
Constant (value "zip") used to identify the ZIP archive format.
-
Constructor Summary
Constructors (Constructor, Description):
TikaArchiveStreamFactory()
Constructs an instance using the platform default encoding.
TikaArchiveStreamFactory(String encoding)
Constructs an instance using the specified encoding.
-
Method Summary
Methods (Modifier and Type, Method, Description):
<I extends org.apache.commons.compress.archivers.ArchiveInputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> I createArchiveInputStream(InputStream in)
Creates an archive input stream from an input stream, autodetecting the archive type from the first few bytes of the stream.
<I extends org.apache.commons.compress.archivers.ArchiveInputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> I createArchiveInputStream(String archiverName, InputStream in)
Creates an archive input stream from an archiver name and an input stream.
<I extends org.apache.commons.compress.archivers.ArchiveInputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> I createArchiveInputStream(String archiverName, InputStream in, String actualEncoding)
<O extends org.apache.commons.compress.archivers.ArchiveOutputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> O createArchiveOutputStream(String archiverName, OutputStream out)
Creates an archive output stream from an archiver name and an output stream.
<O extends org.apache.commons.compress.archivers.ArchiveOutputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> O createArchiveOutputStream(String archiverName, OutputStream out, String actualEncoding)
static String detect(InputStream in)
Tries to determine the type of archiver.
static SortedMap<String,org.apache.commons.compress.archivers.ArchiveStreamProvider> findAvailableArchiveInputStreamProviders()
Constructs a new sorted map from input stream provider names to provider objects.
static SortedMap<String,org.apache.commons.compress.archivers.ArchiveStreamProvider> findAvailableArchiveOutputStreamProviders()
Constructs a new sorted map from output stream provider names to provider objects.
SortedMap<String,org.apache.commons.compress.archivers.ArchiveStreamProvider> getArchiveInputStreamProviders()
SortedMap<String,org.apache.commons.compress.archivers.ArchiveStreamProvider> getArchiveOutputStreamProviders()
String getEntryEncoding()
Gets the encoding to use for arj, jar, ZIP, dump, cpio and tar files, or null for the archiver default.
Set<String> getInputStreamArchiveNames()
Set<String> getOutputStreamArchiveNames()
void setEntryEncoding(String entryEncoding)
Deprecated. As of 1.10, use the TikaArchiveStreamFactory(String) constructor to specify the encoding.
-
-
-
Field Detail
-
DEFAULT
public static final TikaArchiveStreamFactory DEFAULT
The singleton instance using the platform default encoding.
- Since:
- 1.21
-
APK
public static final String APK
Constant (value "apk") used to identify the APK archive format.
APK file extensions are .apk, .xapk, .apks, and .apkm.
- Since:
- Since:
- 1.22
- See Also:
- Constant Field Values
-
XAPK
public static final String XAPK
Constant (value "xapk") used to identify the XAPK archive format.
APK file extensions are .apk, .xapk, .apks, and .apkm.
- Since:
- Since:
- 1.22
- See Also:
- Constant Field Values
-
APKS
public static final String APKS
Constant (value "apks") used to identify the APKS archive format.
APK file extensions are .apk, .xapk, .apks, and .apkm.
- Since:
- Since:
- 1.22
- See Also:
- Constant Field Values
-
APKM
public static final String APKM
Constant (value "apkm") used to identify the APKM archive format.
APK file extensions are .apk, .xapk, .apks, and .apkm.
- Since:
- Since:
- 1.22
- See Also:
- Constant Field Values
-
AR
public static final String AR
Constant (value "ar") used to identify the AR archive format.
- Since:
- 1.1
- See Also:
- Constant Field Values
-
ARJ
public static final String ARJ
Constant (value "arj") used to identify the ARJ archive format. Not supported as an output stream type.
- Since:
- 1.6
- See Also:
- Constant Field Values
-
CPIO
public static final String CPIO
Constant (value "cpio") used to identify the CPIO archive format.
- Since:
- 1.1
- See Also:
- Constant Field Values
-
DUMP
public static final String DUMP
Constant (value "dump") used to identify the Unix DUMP archive format. Not supported as an output stream type.
- Since:
- 1.3
- See Also:
- Constant Field Values
-
JAR
public static final String JAR
Constant (value "jar") used to identify the JAR archive format.
- Since:
- 1.1
- See Also:
- Constant Field Values
-
TAR
public static final String TAR
Constant used to identify the TAR archive format.
- Since:
- 1.1
- See Also:
- Constant Field Values
-
ZIP
public static final String ZIP
Constant (value "zip") used to identify the ZIP archive format.
- Since:
- 1.1
- See Also:
- Constant Field Values
-
SEVEN_Z
public static final String SEVEN_Z
Constant (value "7z") used to identify the 7z archive format.
- Since:
- 1.8
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
TikaArchiveStreamFactory
public TikaArchiveStreamFactory()
Constructs an instance using the platform default encoding.
-
TikaArchiveStreamFactory
public TikaArchiveStreamFactory(String encoding)
Constructs an instance using the specified encoding.
- Parameters:
encoding - the encoding to be used.
- Since:
- 1.10
-
-
Method Detail
-
detect
public static String detect(InputStream in) throws org.apache.commons.compress.archivers.ArchiveException
Tries to determine the type of archiver.
- Parameters:
in - input stream
- Returns:
- type of archiver if found
- Throws:
org.apache.commons.compress.archivers.ArchiveException - if an archiver cannot be detected in the stream
- Since:
- 1.14
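Detection of this kind generally works by comparing the leading bytes of the stream against well-known format signatures. The following is a minimal sketch of that idea, not the implementation used by this class: the class name SignatureSketch and the method guessFormat are hypothetical, only three signatures (ZIP, AR, 7z) are checked, and the sketch returns null for an unknown header where detect throws an ArchiveException.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of signature-based detection; not part of Tika.
class SignatureSketch {
    // ZIP local file header signature: "PK" followed by 0x03 0x04.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};
    // Unix ar global header.
    private static final byte[] AR_MAGIC = "!<arch>\n".getBytes(StandardCharsets.US_ASCII);
    // 7z signature header.
    private static final byte[] SEVEN_Z_MAGIC = {0x37, 0x7A, (byte) 0xBC, (byte) 0xAF, 0x27, 0x1C};

    /** Returns true if header begins with all bytes of magic. */
    static boolean startsWith(byte[] header, byte[] magic) {
        if (header.length < magic.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(header, magic.length), magic);
    }

    /** Returns an archiver name for the header, or null if no signature matches. */
    static String guessFormat(byte[] header) {
        if (startsWith(header, ZIP_MAGIC)) {
            return "zip";
        }
        if (startsWith(header, AR_MAGIC)) {
            return "ar";
        }
        if (startsWith(header, SEVEN_Z_MAGIC)) {
            return "7z";
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] zipHeader = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00};
        System.out.println(guessFormat(zipHeader));
        System.out.println(guessFormat(new byte[] {1, 2, 3}));
    }
}
```

Because JAR and the APK variants are ZIP containers, a signature check alone cannot tell them apart; a real detector needs to look further into the archive.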
-
findAvailableArchiveInputStreamProviders
public static SortedMap<String,org.apache.commons.compress.archivers.ArchiveStreamProvider> findAvailableArchiveInputStreamProviders()
Constructs a new sorted map from input stream provider names to provider objects.
The map returned by this method will have one entry for each provider for which support is available in the current Java virtual machine. If two or more supported providers have the same name, the resulting map will contain just one of them; which one it will contain is not specified.
The invocation of this method, and the subsequent use of the resulting map, may cause time-consuming disk or network I/O operations to occur. This method is provided for applications that need to enumerate all of the available providers, for example to allow user provider selection.
This method may return different results at different times if new providers are dynamically made available to the current Java virtual machine.
- Returns:
- An immutable map from names to provider objects
- Since:
- 1.13
-
findAvailableArchiveOutputStreamProviders
public static SortedMap<String,org.apache.commons.compress.archivers.ArchiveStreamProvider> findAvailableArchiveOutputStreamProviders()
Constructs a new sorted map from output stream provider names to provider objects.
The map returned by this method will have one entry for each provider for which support is available in the current Java virtual machine. If two or more supported providers have the same name, the resulting map will contain just one of them; which one it will contain is not specified.
The invocation of this method, and the subsequent use of the resulting map, may cause time-consuming disk or network I/O operations to occur. This method is provided for applications that need to enumerate all of the available providers, for example to allow user provider selection.
This method may return different results at different times if new providers are dynamically made available to the current Java virtual machine.
- Returns:
- An immutable map from names to provider objects
- Since:
- 1.13
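The name-to-provider maps described for the two findAvailable methods can be pictured with a small standalone sketch. It is an illustration only and makes several assumptions: NamedProvider is a hypothetical stand-in for ArchiveStreamProvider, providers are passed in as a list rather than discovered at run time, and on a name clash the last provider wins (the real methods leave that choice unspecified).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration of a sorted, immutable name-to-provider map.
class ProviderMapSketch {
    /** Hypothetical stand-in for a provider that advertises the archive names it supports. */
    interface NamedProvider {
        Set<String> names();
    }

    /** Builds a sorted map keyed by archive name and wraps it so callers cannot modify it. */
    static SortedMap<String, NamedProvider> index(List<NamedProvider> providers) {
        SortedMap<String, NamedProvider> byName = new TreeMap<>();
        for (NamedProvider provider : providers) {
            for (String name : provider.names()) {
                // A later provider with the same name replaces the earlier one (sketch-only policy).
                byName.put(name, provider);
            }
        }
        return Collections.unmodifiableSortedMap(byName);
    }

    public static void main(String[] args) {
        NamedProvider first = () -> Collections.singleton("zip");
        NamedProvider second = () -> Collections.singleton("zip");
        SortedMap<String, NamedProvider> map = index(Arrays.asList(first, second));
        // Two providers share the name "zip", so the map holds a single entry.
        System.out.println(map.size());
    }
}
```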
-
createArchiveInputStream
public <I extends org.apache.commons.compress.archivers.ArchiveInputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> I createArchiveInputStream(InputStream in) throws org.apache.commons.compress.archivers.ArchiveException
Creates an archive input stream from an input stream, autodetecting the archive type from the first few bytes of the stream. The InputStream must support marks, like BufferedInputStream.
- Type Parameters:
I - The ArchiveInputStream type.
- Parameters:
in - the input stream
- Returns:
- the archive input stream
- Throws:
org.apache.commons.compress.archivers.ArchiveException - if the archiver name is not known
org.apache.commons.compress.archivers.StreamingNotSupportedException - if the format cannot be read from a stream
IllegalArgumentException - if the stream is null or does not support mark
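The mark requirement exists because autodetection has to read the leading bytes and then rewind, so that the archive reader still sees the stream from its start. The sketch below shows that pattern with JDK classes only; the class MarkResetSketch and its methods ensureMarkSupported and peek are hypothetical helpers, not part of this API.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical illustration of peeking at a stream header with mark/reset.
class MarkResetSketch {
    /** Wraps the stream in a BufferedInputStream only if it lacks mark support. */
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /** Reads up to n leading bytes, then rewinds so the next read starts at the same position. */
    static byte[] peek(InputStream in, int n) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("Mark is not supported.");
        }
        in.mark(n);
        try {
            byte[] buffer = new byte[n];
            int total = 0;
            while (total < n) {
                int read = in.read(buffer, total, n - total);
                if (read < 0) {
                    break; // end of stream before n bytes were available
                }
                total += read;
            }
            in.reset();
            return Arrays.copyOf(buffer, total);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        final InputStream source = new ByteArrayInputStream(new byte[] {0x50, 0x4B, 0x03, 0x04, 0x14});
        // A bare InputStream subclass does not support mark unless it overrides markSupported().
        InputStream raw = new InputStream() {
            @Override
            public int read() throws IOException {
                return source.read();
            }
        };
        InputStream in = ensureMarkSupported(raw);
        byte[] header = peek(in, 4);
        // After the peek, the stream is back at its first byte.
        System.out.println(header.length + " bytes peeked, next byte = " + in.read());
    }
}
```

A caller holding a stream of unknown origin can apply the same wrapping before passing it to createArchiveInputStream(InputStream).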
-
createArchiveInputStream
public <I extends org.apache.commons.compress.archivers.ArchiveInputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> I createArchiveInputStream(String archiverName, InputStream in) throws org.apache.commons.compress.archivers.ArchiveException
Creates an archive input stream from an archiver name and an input stream.
- Type Parameters:
I - The ArchiveInputStream type.
- Parameters:
archiverName - the archive name, i.e. "ar", "arj", "zip", "tar", "jar", "cpio", "dump" or "7z"
in - the input stream
- Returns:
- the archive input stream
- Throws:
org.apache.commons.compress.archivers.ArchiveException - if the archiver name is not known
org.apache.commons.compress.archivers.StreamingNotSupportedException - if the format cannot be read from a stream
IllegalArgumentException - if the archiver name or stream is null
-
createArchiveInputStream
public <I extends org.apache.commons.compress.archivers.ArchiveInputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> I createArchiveInputStream(String archiverName, InputStream in, String actualEncoding) throws org.apache.commons.compress.archivers.ArchiveException
- Specified by:
createArchiveInputStream in interface org.apache.commons.compress.archivers.ArchiveStreamProvider
- Throws:
org.apache.commons.compress.archivers.ArchiveException
-
createArchiveOutputStream
public <O extends org.apache.commons.compress.archivers.ArchiveOutputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> O createArchiveOutputStream(String archiverName, OutputStream out) throws org.apache.commons.compress.archivers.ArchiveException
Creates an archive output stream from an archiver name and an output stream.
- Type Parameters:
O - The ArchiveOutputStream type.
- Parameters:
archiverName - the archive name, i.e. "ar", "zip", "tar", "jar" or "cpio"
out - the output stream
- Returns:
- the archive output stream
- Throws:
org.apache.commons.compress.archivers.ArchiveException - if the archiver name is not known
org.apache.commons.compress.archivers.StreamingNotSupportedException - if the format cannot be written to a stream
IllegalArgumentException - if the archiver name or stream is null
-
createArchiveOutputStream
public <O extends org.apache.commons.compress.archivers.ArchiveOutputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> O createArchiveOutputStream(String archiverName, OutputStream out, String actualEncoding) throws org.apache.commons.compress.archivers.ArchiveException
- Specified by:
createArchiveOutputStream in interface org.apache.commons.compress.archivers.ArchiveStreamProvider
- Throws:
org.apache.commons.compress.archivers.ArchiveException
-
getArchiveInputStreamProviders
public SortedMap<String,org.apache.commons.compress.archivers.ArchiveStreamProvider> getArchiveInputStreamProviders()
-
getArchiveOutputStreamProviders
public SortedMap<String,org.apache.commons.compress.archivers.ArchiveStreamProvider> getArchiveOutputStreamProviders()
-
getEntryEncoding
public String getEntryEncoding()
Gets the encoding to use for arj, jar, ZIP, dump, cpio and tar files, or null for the archiver default.
- Returns:
- entry encoding, or null for the archiver default
- Since:
- 1.5
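A caller that needs a concrete charset has to handle the null return itself. The following sketch shows one way to do that; the class EntryEncodingSketch, the method resolve, and the archiverDefault parameter are hypothetical, and which default a given archiver actually applies is not stated here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for interpreting a nullable entry encoding name.
class EntryEncodingSketch {
    /**
     * Returns the charset named by entryEncoding, or archiverDefault when the name is null.
     * Charset.forName throws an unchecked exception for illegal or unsupported names.
     */
    static Charset resolve(String entryEncoding, Charset archiverDefault) {
        if (entryEncoding == null) {
            return archiverDefault;
        }
        return Charset.forName(entryEncoding);
    }

    public static void main(String[] args) {
        // UTF-8 is used as the fallback purely for illustration.
        System.out.println(resolve(null, StandardCharsets.UTF_8));
        System.out.println(resolve("ISO-8859-1", StandardCharsets.UTF_8));
    }
}
```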
-
getInputStreamArchiveNames
public Set<String> getInputStreamArchiveNames()
- Specified by:
getInputStreamArchiveNames in interface org.apache.commons.compress.archivers.ArchiveStreamProvider
-
getOutputStreamArchiveNames
public Set<String> getOutputStreamArchiveNames()
- Specified by:
getOutputStreamArchiveNames in interface org.apache.commons.compress.archivers.ArchiveStreamProvider
-
setEntryEncoding
@Deprecated public void setEntryEncoding(String entryEncoding)
Deprecated. As of 1.10, use the TikaArchiveStreamFactory(String) constructor to specify the encoding.
Sets the encoding to use for arj, jar, ZIP, dump, cpio and tar files. Use null for the archiver default.
- Parameters:
entryEncoding - the entry encoding, null uses the archiver default.
- Since:
- 1.5
-
-