Class TikaArchiveStreamFactory
- java.lang.Object
  - org.apache.tika.detect.zip.TikaArchiveStreamFactory

- All Implemented Interfaces:
- org.apache.commons.compress.archivers.ArchiveStreamProvider
 
public class TikaArchiveStreamFactory extends Object implements org.apache.commons.compress.archivers.ArchiveStreamProvider

Factory to create Archive[In|Out]putStreams from names or the first bytes of the InputStream. In order to add other implementations, you should extend ArchiveStreamFactory and override the appropriate methods (and call their implementation from super, of course).

Compressing a ZIP file:

    final OutputStream out = Files.newOutputStream(output.toPath());
    ArchiveOutputStream os = new ArchiveStreamFactory().createArchiveOutputStream(ArchiveStreamFactory.ZIP, out);
    os.putArchiveEntry(new ZipArchiveEntry("testdata/test1.xml"));
    IOUtils.copy(Files.newInputStream(file1.toPath()), os);
    os.closeArchiveEntry();
    os.putArchiveEntry(new ZipArchiveEntry("testdata/test2.xml"));
    IOUtils.copy(Files.newInputStream(file2.toPath()), os);
    os.closeArchiveEntry();
    os.close();

Decompressing a ZIP file:

    final InputStream is = Files.newInputStream(input.toPath());
    ArchiveInputStream in = new ArchiveStreamFactory().createArchiveInputStream(ArchiveStreamFactory.ZIP, is);
    ZipArchiveEntry entry = (ZipArchiveEntry) in.getNextEntry();
    OutputStream out = Files.newOutputStream(dir.toPath().resolve(entry.getName()));
    IOUtils.copy(in, out);
    out.close();
    in.close();
Field Summary
Fields (Modifier and Type, Field: Description)
- static String APK: Constant (value "apk") used to identify the APK archive format.
- static String APKM: Constant (value "apkm") used to identify the APKM archive format.
- static String APKS: Constant (value "apks") used to identify the APKS archive format.
- static String AR: Constant (value "ar") used to identify the AR archive format.
- static String ARJ: Constant (value "arj") used to identify the ARJ archive format.
- static String CPIO: Constant (value "cpio") used to identify the CPIO archive format.
- static TikaArchiveStreamFactory DEFAULT: The singleton instance using the platform default encoding.
- static String DUMP: Constant (value "dump") used to identify the Unix DUMP archive format.
- static String JAR: Constant (value "jar") used to identify the JAR archive format.
- static String SEVEN_Z: Constant (value "7z") used to identify the 7z archive format.
- static String TAR: Constant used to identify the TAR archive format.
- static String XAPK: Constant (value "xapk") used to identify the XAPK archive format.
- static String ZIP: Constant (value "zip") used to identify the ZIP archive format.
 - 
Constructor Summary
Constructors (Constructor: Description)
- TikaArchiveStreamFactory(): Constructs an instance using the platform default encoding.
- TikaArchiveStreamFactory(String encoding): Constructs an instance using the specified encoding.
 - 
Method Summary
Methods (Modifier and Type, Method: Description)
- <I extends org.apache.commons.compress.archivers.ArchiveInputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> I createArchiveInputStream(InputStream in): Creates an archive input stream from an input stream, autodetecting the archive type from the first few bytes of the stream.
- <I extends org.apache.commons.compress.archivers.ArchiveInputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> I createArchiveInputStream(String archiverName, InputStream in): Creates an archive input stream from an archiver name and an input stream.
- <I extends org.apache.commons.compress.archivers.ArchiveInputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> I createArchiveInputStream(String archiverName, InputStream in, String actualEncoding)
- <O extends org.apache.commons.compress.archivers.ArchiveOutputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> O createArchiveOutputStream(String archiverName, OutputStream out): Creates an archive output stream from an archiver name and an output stream.
- <O extends org.apache.commons.compress.archivers.ArchiveOutputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> O createArchiveOutputStream(String archiverName, OutputStream out, String actualEncoding)
- static String detect(InputStream in): Try to determine the type of Archiver
- static SortedMap<String,org.apache.commons.compress.archivers.ArchiveStreamProvider> findAvailableArchiveInputStreamProviders(): Constructs a new sorted map from input stream provider names to provider objects.
- static SortedMap<String,org.apache.commons.compress.archivers.ArchiveStreamProvider> findAvailableArchiveOutputStreamProviders(): Constructs a new sorted map from output stream provider names to provider objects.
- SortedMap<String,org.apache.commons.compress.archivers.ArchiveStreamProvider> getArchiveInputStreamProviders()
- SortedMap<String,org.apache.commons.compress.archivers.ArchiveStreamProvider> getArchiveOutputStreamProviders()
- String getEntryEncoding(): Gets the encoding to use for arj, jar, ZIP, dump, cpio and tar files, or null for the archiver default.
- Set<String> getInputStreamArchiveNames()
- Set<String> getOutputStreamArchiveNames()
- void setEntryEncoding(String entryEncoding): Deprecated. 1.10 use #ArchiveStreamFactory(String) to specify the encoding
 
Field Detail

DEFAULT
public static final TikaArchiveStreamFactory DEFAULT
The singleton instance using the platform default encoding.
- Since:
- 1.21
 
 - 
APK
public static final String APK
Constant (value "apk") used to identify the APK archive format. APK file extensions are .apk, .xapk, .apks, .apkm
- Since:
- 1.22
- See Also:
- Constant Field Values
 
 - 
XAPK
public static final String XAPK
Constant (value "xapk") used to identify the XAPK archive format. APK file extensions are .apk, .xapk, .apks, .apkm
- Since:
- 1.22
- See Also:
- Constant Field Values
 
 - 
APKS
public static final String APKS
Constant (value "apks") used to identify the APKS archive format. APK file extensions are .apk, .xapk, .apks, .apkm
- Since:
- 1.22
- See Also:
- Constant Field Values
 
 - 
APKM
public static final String APKM
Constant (value "apkm") used to identify the APKM archive format. APK file extensions are .apk, .xapk, .apks, .apkm
- Since:
- 1.22
- See Also:
- Constant Field Values
 
 - 
AR
public static final String AR
Constant (value "ar") used to identify the AR archive format.
- Since:
- 1.1
- See Also:
- Constant Field Values
 
 - 
ARJ
public static final String ARJ
Constant (value "arj") used to identify the ARJ archive format. Not supported as an output stream type.
- Since:
- 1.6
- See Also:
- Constant Field Values
 
 - 
CPIO
public static final String CPIO
Constant (value "cpio") used to identify the CPIO archive format.
- Since:
- 1.1
- See Also:
- Constant Field Values
 
 - 
DUMP
public static final String DUMP
Constant (value "dump") used to identify the Unix DUMP archive format. Not supported as an output stream type.
- Since:
- 1.3
- See Also:
- Constant Field Values
 
 - 
JAR
public static final String JAR
Constant (value "jar") used to identify the JAR archive format.
- Since:
- 1.1
- See Also:
- Constant Field Values
 
 - 
TAR
public static final String TAR
Constant used to identify the TAR archive format.
- Since:
- 1.1
- See Also:
- Constant Field Values
 
 - 
ZIP
public static final String ZIP
Constant (value "zip") used to identify the ZIP archive format.
- Since:
- 1.1
- See Also:
- Constant Field Values
 
 - 
SEVEN_Z
public static final String SEVEN_Z
Constant (value "7z") used to identify the 7z archive format.
- Since:
- 1.8
- See Also:
- Constant Field Values
 
 
- 
 - 
Constructor Detail

TikaArchiveStreamFactory
public TikaArchiveStreamFactory()
Constructs an instance using the platform default encoding.
 - 
TikaArchiveStreamFactory
public TikaArchiveStreamFactory(String encoding)
Constructs an instance using the specified encoding.
- Parameters:
- encoding - the encoding to be used.
- Since:
- 1.10
 
 
- 
 - 
Method Detail

detect
public static String detect(InputStream in) throws org.apache.commons.compress.archivers.ArchiveException
Tries to determine the type of archiver.
- Parameters:
- in - input stream
- Returns:
- type of archiver if found
- Throws:
- org.apache.commons.compress.archivers.ArchiveException - if an archiver cannot be detected in the stream
- Since:
- 1.14
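
The class description says archive types can be recognized from the first bytes of the InputStream. The following is a minimal sketch of that general idea using only the JDK; it is not the logic detect actually uses. SignatureSniffer and sniff are hypothetical names, only the ZIP and ar signatures are checked, and readNBytes(int) assumes Java 11 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Tika or Commons Compress.
final class SignatureSniffer {
    // ZIP local file header signature: "PK" followed by 0x03 0x04.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};
    // Global header of a Unix ar archive.
    private static final byte[] AR_MAGIC = "!<arch>\n".getBytes(StandardCharsets.US_ASCII);
    // Number of leading bytes to inspect; enough for both signatures above.
    private static final int PEEK = 8;

    /** Returns "zip", "ar", or null if neither signature matches. */
    static String sniff(InputStream in) throws IOException {
        if (in == null || !in.markSupported()) {
            throw new IllegalArgumentException("stream must be non-null and support mark");
        }
        in.mark(PEEK);                      // remember the current position
        byte[] head = in.readNBytes(PEEK);  // may be shorter than PEEK at end of stream
        in.reset();                         // rewind so the caller sees the stream from its start
        if (startsWith(head, ZIP_MAGIC)) {
            return "zip";
        }
        if (startsWith(head, AR_MAGIC)) {
            return "ar";
        }
        return null;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

Because the sketch rewinds the stream after peeking, the same stream can then be handed to whichever reader matches the returned name.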
 
 - 
findAvailableArchiveInputStreamProviders
public static SortedMap<String,org.apache.commons.compress.archivers.ArchiveStreamProvider> findAvailableArchiveInputStreamProviders()
Constructs a new sorted map from input stream provider names to provider objects.
The map returned by this method will have one entry for each provider for which support is available in the current Java virtual machine. If two or more supported providers have the same name, the resulting map will contain just one of them; which one it will contain is not specified.
The invocation of this method, and the subsequent use of the resulting map, may cause time-consuming disk or network I/O operations to occur. This method is provided for applications that need to enumerate all of the available providers, for example to allow user provider selection.
This method may return different results at different times if new providers are dynamically made available to the current Java virtual machine.
- Returns:
- An immutable map from names to provider objects
- Since:
- 1.13
 
 - 
findAvailableArchiveOutputStreamProviders
public static SortedMap<String,org.apache.commons.compress.archivers.ArchiveStreamProvider> findAvailableArchiveOutputStreamProviders()
Constructs a new sorted map from output stream provider names to provider objects.
The map returned by this method will have one entry for each provider for which support is available in the current Java virtual machine. If two or more supported providers have the same name, the resulting map will contain just one of them; which one it will contain is not specified.
The invocation of this method, and the subsequent use of the resulting map, may cause time-consuming disk or network I/O operations to occur. This method is provided for applications that need to enumerate all of the available providers, for example to allow user provider selection.
This method may return different results at different times if new providers are dynamically made available to the current Java virtual machine.
- Returns:
- An immutable map from names to provider objects
- Since:
- 1.13
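
The contract shared by the two findAvailable methods (sorted by name, one entry per name, immutable result) can be illustrated with a small JDK-only sketch. ProviderIndex and NamedProvider are hypothetical stand-ins, not Tika or Commons Compress types, and keeping the last provider seen for a duplicate name is an arbitrary choice here, since the documentation leaves the winner unspecified.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration of a name-to-provider index; not the library's code.
final class ProviderIndex {

    /** Hypothetical stand-in for a provider: it only exposes the names it supports. */
    interface NamedProvider {
        Set<String> names();
    }

    /** Builds an immutable map, sorted by name, with one provider per name. */
    static SortedMap<String, NamedProvider> index(List<NamedProvider> providers) {
        TreeMap<String, NamedProvider> map = new TreeMap<>();
        for (NamedProvider provider : providers) {
            for (String name : provider.names()) {
                // A later provider replaces an earlier one registered under the same name.
                map.put(name, provider);
            }
        }
        return Collections.unmodifiableSortedMap(map);
    }
}
```

Callers that enumerate the result, for example to populate a selection list, get the names in sorted order and cannot modify the map.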
 
 - 
createArchiveInputStream
public <I extends org.apache.commons.compress.archivers.ArchiveInputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> I createArchiveInputStream(InputStream in) throws org.apache.commons.compress.archivers.ArchiveException
Creates an archive input stream from an input stream, autodetecting the archive type from the first few bytes of the stream. The InputStream must support marks, like BufferedInputStream.
- Type Parameters:
- I - The ArchiveInputStream type.
- Parameters:
- in - the input stream
- Returns:
- the archive input stream
- Throws:
- org.apache.commons.compress.archivers.ArchiveException - if the archiver name is not known
- org.apache.commons.compress.archivers.StreamingNotSupportedException - if the format cannot be read from a stream
- IllegalArgumentException - if the stream is null or does not support mark
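
Since the autodetecting overload rejects streams that do not support mark, callers may want to wrap such streams before passing them in. A minimal sketch, assuming that wrapping in BufferedInputStream is acceptable for the caller; MarkSupport and ensureMarkSupported are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of Tika or Commons Compress.
final class MarkSupport {

    /** Returns the stream itself if it supports mark, otherwise a buffered wrapper. */
    static InputStream ensureMarkSupported(InputStream in) {
        if (in == null) {
            throw new IllegalArgumentException("stream must not be null");
        }
        // BufferedInputStream always supports mark/reset.
        return in.markSupported() ? in : new BufferedInputStream(in);
    }
}
```

The returned stream, not the original, should be used for all later reads, because the wrapper may already have buffered bytes from the underlying stream.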
 
 - 
createArchiveInputStream
public <I extends org.apache.commons.compress.archivers.ArchiveInputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> I createArchiveInputStream(String archiverName, InputStream in) throws org.apache.commons.compress.archivers.ArchiveException
Creates an archive input stream from an archiver name and an input stream.
- Type Parameters:
- I - The ArchiveInputStream type.
- Parameters:
- archiverName - the archive name, i.e. "ar", "arj", "zip", "tar", "jar", "cpio", "dump" or "7z"
- in - the input stream
- Returns:
- the archive input stream
- Throws:
- org.apache.commons.compress.archivers.ArchiveException - if the archiver name is not known
- org.apache.commons.compress.archivers.StreamingNotSupportedException - if the format cannot be read from a stream
- IllegalArgumentException - if the archiver name or stream is null
 
 - 
createArchiveInputStream
public <I extends org.apache.commons.compress.archivers.ArchiveInputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> I createArchiveInputStream(String archiverName, InputStream in, String actualEncoding) throws org.apache.commons.compress.archivers.ArchiveException
- Specified by:
- createArchiveInputStream in interface org.apache.commons.compress.archivers.ArchiveStreamProvider
- Throws:
- org.apache.commons.compress.archivers.ArchiveException
 
 - 
createArchiveOutputStream
public <O extends org.apache.commons.compress.archivers.ArchiveOutputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> O createArchiveOutputStream(String archiverName, OutputStream out) throws org.apache.commons.compress.archivers.ArchiveException
Creates an archive output stream from an archiver name and an output stream.
- Type Parameters:
- O - The ArchiveOutputStream type.
- Parameters:
- archiverName - the archive name, i.e. "ar", "zip", "tar", "jar" or "cpio"
- out - the output stream
- Returns:
- the archive output stream
- Throws:
- org.apache.commons.compress.archivers.ArchiveException - if the archiver name is not known
- org.apache.commons.compress.archivers.StreamingNotSupportedException - if the format cannot be written to a stream
- IllegalArgumentException - if the archiver name or stream is null
 
 - 
createArchiveOutputStream
public <O extends org.apache.commons.compress.archivers.ArchiveOutputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> O createArchiveOutputStream(String archiverName, OutputStream out, String actualEncoding) throws org.apache.commons.compress.archivers.ArchiveException
- Specified by:
- createArchiveOutputStream in interface org.apache.commons.compress.archivers.ArchiveStreamProvider
- Throws:
- org.apache.commons.compress.archivers.ArchiveException
 
 - 
getArchiveInputStreamProviders
public SortedMap<String,org.apache.commons.compress.archivers.ArchiveStreamProvider> getArchiveInputStreamProviders()
 - 
getArchiveOutputStreamProviders
public SortedMap<String,org.apache.commons.compress.archivers.ArchiveStreamProvider> getArchiveOutputStreamProviders()
 - 
getEntryEncoding
public String getEntryEncoding()
Gets the encoding to use for arj, jar, ZIP, dump, cpio and tar files, or null for the archiver default.
- Returns:
- entry encoding, or null for the archiver default
- Since:
- 1.5
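
Because getEntryEncoding may return null to mean "use the archiver default", code that needs a concrete Charset has to handle that case itself. A minimal sketch of one way to do so; EntryEncodings and resolve are hypothetical names, and the fallback charset is supplied by the caller because the actual default of each archiver is not stated here.

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration; not part of Tika or Commons Compress.
final class EntryEncodings {

    /**
     * Resolves a possibly-null encoding name to a Charset.
     * A null name selects the caller-supplied fallback (an assumption of this sketch).
     */
    static Charset resolve(String entryEncoding, Charset fallback) {
        return entryEncoding == null ? fallback : Charset.forName(entryEncoding);
    }
}
```

Charset.forName throws an unchecked exception for names the JVM does not recognize, so invalid configuration surfaces at resolution time rather than while reading entries.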
 
 - 
getInputStreamArchiveNames
public Set<String> getInputStreamArchiveNames()
- Specified by:
- getInputStreamArchiveNames in interface org.apache.commons.compress.archivers.ArchiveStreamProvider
 
 - 
getOutputStreamArchiveNames
public Set<String> getOutputStreamArchiveNames()
- Specified by:
- getOutputStreamArchiveNames in interface org.apache.commons.compress.archivers.ArchiveStreamProvider
 
 - 
setEntryEncoding
@Deprecated public void setEntryEncoding(String entryEncoding)
Deprecated. 1.10 use #ArchiveStreamFactory(String) to specify the encoding
Sets the encoding to use for arj, jar, ZIP, dump, cpio and tar files. Use null for the archiver default.
- Parameters:
- entryEncoding - the entry encoding, null uses the archiver default.
- Since:
- 1.5
 
 