Class TikaArchiveStreamFactory

  • All Implemented Interfaces:
    org.apache.commons.compress.archivers.ArchiveStreamProvider

    public class TikaArchiveStreamFactory
    extends Object
    implements org.apache.commons.compress.archivers.ArchiveStreamProvider
    Factory to create Archive[In|Out]putStreams from names or the first bytes of the InputStream. To add other implementations, extend ArchiveStreamFactory and override the appropriate methods, calling the superclass implementation from each override.

    Compressing a ZIP-File:

     // try-with-resources closes the archive stream first, then the underlying file stream.
     try (OutputStream out = Files.newOutputStream(output.toPath());
          ArchiveOutputStream os = new ArchiveStreamFactory().createArchiveOutputStream(ArchiveStreamFactory.ZIP, out)) {
         os.putArchiveEntry(new ZipArchiveEntry("testdata/test1.xml"));
         try (InputStream in = Files.newInputStream(file1.toPath())) {
             IOUtils.copy(in, os);
         }
         os.closeArchiveEntry();

         os.putArchiveEntry(new ZipArchiveEntry("testdata/test2.xml"));
         try (InputStream in = Files.newInputStream(file2.toPath())) {
             IOUtils.copy(in, os);
         }
         os.closeArchiveEntry();
     }
     

    Decompressing a ZIP-File:

     try (InputStream is = Files.newInputStream(input.toPath());
          ArchiveInputStream in = new ArchiveStreamFactory().createArchiveInputStream(ArchiveStreamFactory.ZIP, is)) {
         // Extracts only the first entry; call getNextEntry() until it returns null to read them all.
         ZipArchiveEntry entry = (ZipArchiveEntry) in.getNextEntry();
         try (OutputStream out = Files.newOutputStream(dir.toPath().resolve(entry.getName()))) {
             IOUtils.copy(in, out);
         }
     }
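
    The decompression example resolves entry.getName() directly against the target directory. Entry names come from the archive, so a name such as "../evil.txt" could point outside that directory. The following is a minimal sketch of one way to guard against this, using only java.nio.file; the class name EntryPaths and the method safeResolve are hypothetical helpers for illustration and are not part of this class or of Commons Compress.

```java
import java.io.IOException;
import java.nio.file.Path;

// Hypothetical helper, not part of TikaArchiveStreamFactory or Commons Compress.
final class EntryPaths {

    private EntryPaths() {
    }

    /**
     * Resolves an archive entry name against a target directory and rejects
     * names that would end up outside that directory after normalization.
     */
    static Path safeResolve(Path targetDir, String entryName) throws IOException {
        Path base = targetDir.toAbsolutePath().normalize();
        // resolve() returns entryName itself when it is absolute, so the
        // startsWith check below also rejects absolute entry names.
        Path resolved = base.resolve(entryName).normalize();
        if (!resolved.startsWith(base)) {
            throw new IOException("Entry escapes target directory: " + entryName);
        }
        return resolved;
    }
}
```

    Under these assumptions, the output path in the example could be obtained with EntryPaths.safeResolve(dir.toPath(), entry.getName()) before the output stream is opened.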
     
    • Field Detail

      • DEFAULT

        public static final TikaArchiveStreamFactory DEFAULT
        The singleton instance using the platform default encoding.
        Since:
        1.21
      • APK

        public static final String APK
        Constant (value "apk") used to identify the APK archive format.

        APK file extensions are .apk, .xapk, .apks, .apkm

        Since:
        1.22
        See Also:
        Constant Field Values
      • XAPK

        public static final String XAPK
        Constant (value "xapk") used to identify the XAPK archive format.

        APK file extensions are .apk, .xapk, .apks, .apkm

        Since:
        1.22
        See Also:
        Constant Field Values
      • APKS

        public static final String APKS
        Constant (value "apks") used to identify the APKS archive format.

        APK file extensions are .apk, .xapk, .apks, .apkm

        Since:
        1.22
        See Also:
        Constant Field Values
      • APKM

        public static final String APKM
        Constant (value "apkm") used to identify the APKM archive format.

        APK file extensions are .apk, .xapk, .apks, .apkm

        Since:
        1.22
        See Also:
        Constant Field Values
      • AR

        public static final String AR
        Constant (value "ar") used to identify the AR archive format.
        Since:
        1.1
        See Also:
        Constant Field Values
      • ARJ

        public static final String ARJ
        Constant (value "arj") used to identify the ARJ archive format. Not supported as an output stream type.
        Since:
        1.6
        See Also:
        Constant Field Values
      • CPIO

        public static final String CPIO
        Constant (value "cpio") used to identify the CPIO archive format.
        Since:
        1.1
        See Also:
        Constant Field Values
      • DUMP

        public static final String DUMP
        Constant (value "dump") used to identify the Unix DUMP archive format. Not supported as an output stream type.
        Since:
        1.3
        See Also:
        Constant Field Values
      • JAR

        public static final String JAR
        Constant (value "jar") used to identify the JAR archive format.
        Since:
        1.1
        See Also:
        Constant Field Values
      • ZIP

        public static final String ZIP
        Constant (value "zip") used to identify the ZIP archive format.
        Since:
        1.1
        See Also:
        Constant Field Values
      • SEVEN_Z

        public static final String SEVEN_Z
        Constant (value "7z") used to identify the 7z archive format.
        Since:
        1.8
        See Also:
        Constant Field Values
    • Constructor Detail

      • TikaArchiveStreamFactory

        public TikaArchiveStreamFactory()
        Constructs an instance using the platform default encoding.
      • TikaArchiveStreamFactory

        public TikaArchiveStreamFactory(String encoding)
        Constructs an instance using the specified encoding.
        Parameters:
        encoding - the encoding to be used.
        Since:
        1.10
    • Method Detail

      • detect

        public static String detect(InputStream in)
                             throws org.apache.commons.compress.archivers.ArchiveException
        Tries to determine the type of archiver.
        Parameters:
        in - input stream
        Returns:
        type of archiver if found
        Throws:
        org.apache.commons.compress.archivers.ArchiveException - if an archiver cannot be detected in the stream
        Since:
        1.14
      • findAvailableArchiveInputStreamProviders

        public static SortedMap<String,org.apache.commons.compress.archivers.ArchiveStreamProvider> findAvailableArchiveInputStreamProviders()
        Constructs a new sorted map from input stream provider names to provider objects.

        The map returned by this method will have one entry for each provider for which support is available in the current Java virtual machine. If two or more supported providers have the same name then the resulting map will contain just one of them; which one it will contain is not specified.

        The invocation of this method, and the subsequent use of the resulting map, may cause time-consuming disk or network I/O operations to occur. This method is provided for applications that need to enumerate all of the available providers, for example to allow user provider selection.

        This method may return different results at different times if new providers are dynamically made available to the current Java virtual machine.

        Returns:
        An immutable map from names to provider objects
        Since:
        1.13
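
        The contract described above (a sorted, immutable map with one entry per provider name, where a duplicate name keeps only one provider) can be illustrated with a small sketch. This is not the library's implementation; the NamedProvider interface and the ProviderIndex class are hypothetical stand-ins, and the choice that the later provider wins is an assumption of this sketch, since the documentation leaves that unspecified.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration only; not the Commons Compress or Tika implementation.
final class ProviderIndex {

    /** Hypothetical stand-in for a provider that reports the names it supports. */
    interface NamedProvider {
        Set<String> names();
    }

    private ProviderIndex() {
    }

    /** Builds an unmodifiable map, sorted by name, from names to providers. */
    static <P extends NamedProvider> SortedMap<String, P> byName(List<P> providers) {
        TreeMap<String, P> map = new TreeMap<>();
        for (P provider : providers) {
            for (String name : provider.names()) {
                // A later provider replaces an earlier one with the same name
                // (an assumption of this sketch).
                map.put(name, provider);
            }
        }
        return Collections.unmodifiableSortedMap(map);
    }
}
```

        Callers that only need to enumerate names can iterate over keySet() of the returned map; attempts to modify it throw UnsupportedOperationException.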
      • findAvailableArchiveOutputStreamProviders

        public static SortedMap<String,org.apache.commons.compress.archivers.ArchiveStreamProvider> findAvailableArchiveOutputStreamProviders()
        Constructs a new sorted map from output stream provider names to provider objects.

        The map returned by this method will have one entry for each provider for which support is available in the current Java virtual machine. If two or more supported providers have the same name then the resulting map will contain just one of them; which one it will contain is not specified.

        The invocation of this method, and the subsequent use of the resulting map, may cause time-consuming disk or network I/O operations to occur. This method is provided for applications that need to enumerate all of the available providers, for example to allow user provider selection.

        This method may return different results at different times if new providers are dynamically made available to the current Java virtual machine.

        Returns:
        An immutable map from names to provider objects
        Since:
        1.13
      • createArchiveInputStream

        public <I extends org.apache.commons.compress.archivers.ArchiveInputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> I createArchiveInputStream(InputStream in)
                throws org.apache.commons.compress.archivers.ArchiveException
        Creates an archive input stream from an input stream, autodetecting the archive type from the first few bytes of the stream. The InputStream must support marks, like BufferedInputStream.
        Type Parameters:
        I - The ArchiveInputStream type.
        Parameters:
        in - the input stream
        Returns:
        the archive input stream
        Throws:
        org.apache.commons.compress.archivers.ArchiveException - if the archiver name is not known
        org.apache.commons.compress.archivers.StreamingNotSupportedException - if the format cannot be read from a stream
        IllegalArgumentException - if the stream is null or does not support mark
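
        The mark requirement exists because autodetection has to read the first bytes and then rewind, so that the archive stream still sees the data from the start. The sketch below shows that peek-and-reset pattern for a single format, using the well-known ZIP local file header signature "PK\003\004". It is an illustration under stated assumptions, not this class's detection logic: SignatureSniffer and looksLikeZip are hypothetical names, only one signature is checked, and readNBytes requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical illustration of signature sniffing; not the factory's implementation.
final class SignatureSniffer {

    // ZIP local file header signature: 'P', 'K', 0x03, 0x04.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    private SignatureSniffer() {
    }

    /**
     * Peeks at the first bytes of the stream and rewinds it, so the caller
     * can still read the stream from the beginning afterwards.
     */
    static boolean looksLikeZip(InputStream in) throws IOException {
        if (in == null) {
            throw new IllegalArgumentException("Stream must not be null.");
        }
        if (!in.markSupported()) {
            throw new IllegalArgumentException("Mark is not supported.");
        }
        byte[] head = new byte[ZIP_MAGIC.length];
        in.mark(head.length);
        int read;
        try {
            read = in.readNBytes(head, 0, head.length);
        } finally {
            in.reset(); // rewind to the marked position
        }
        return read == head.length && Arrays.equals(head, ZIP_MAGIC);
    }
}
```

        A stream that does not support marks, such as one returned by Files.newInputStream, can be wrapped in a BufferedInputStream before it is passed to a method of this kind.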
      • createArchiveInputStream

        public <I extends org.apache.commons.compress.archivers.ArchiveInputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> I createArchiveInputStream(String archiverName, InputStream in)
                throws org.apache.commons.compress.archivers.ArchiveException
        Creates an archive input stream from an archiver name and an input stream.
        Type Parameters:
        I - The ArchiveInputStream type.
        Parameters:
        archiverName - the archiver name, i.e. "ar", "arj", "zip", "tar", "jar", "cpio", "dump" or "7z"
        in - the input stream
        Returns:
        the archive input stream
        Throws:
        org.apache.commons.compress.archivers.ArchiveException - if the archiver name is not known
        org.apache.commons.compress.archivers.StreamingNotSupportedException - if the format cannot be read from a stream
        IllegalArgumentException - if the archiver name or stream is null
      • createArchiveInputStream

        public <I extends org.apache.commons.compress.archivers.ArchiveInputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> I createArchiveInputStream(String archiverName, InputStream in, String actualEncoding)
                throws org.apache.commons.compress.archivers.ArchiveException
        Specified by:
        createArchiveInputStream in interface org.apache.commons.compress.archivers.ArchiveStreamProvider
        Throws:
        org.apache.commons.compress.archivers.ArchiveException
      • createArchiveOutputStream

        public <O extends org.apache.commons.compress.archivers.ArchiveOutputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> O createArchiveOutputStream(String archiverName, OutputStream out)
                throws org.apache.commons.compress.archivers.ArchiveException
        Creates an archive output stream from an archiver name and an output stream.
        Type Parameters:
        O - The ArchiveOutputStream type.
        Parameters:
        archiverName - the archiver name, i.e. "ar", "zip", "tar", "jar" or "cpio"
        out - the output stream
        Returns:
        the archive output stream
        Throws:
        org.apache.commons.compress.archivers.ArchiveException - if the archiver name is not known
        org.apache.commons.compress.archivers.StreamingNotSupportedException - if the format cannot be written to a stream
        IllegalArgumentException - if the archiver name or stream is null
      • createArchiveOutputStream

        public <O extends org.apache.commons.compress.archivers.ArchiveOutputStream<? extends org.apache.commons.compress.archivers.ArchiveEntry>> O createArchiveOutputStream(String archiverName, OutputStream out, String actualEncoding)
                throws org.apache.commons.compress.archivers.ArchiveException
        Specified by:
        createArchiveOutputStream in interface org.apache.commons.compress.archivers.ArchiveStreamProvider
        Throws:
        org.apache.commons.compress.archivers.ArchiveException
      • getArchiveInputStreamProviders

        public SortedMap<String,org.apache.commons.compress.archivers.ArchiveStreamProvider> getArchiveInputStreamProviders()
      • getArchiveOutputStreamProviders

        public SortedMap<String,org.apache.commons.compress.archivers.ArchiveStreamProvider> getArchiveOutputStreamProviders()
      • getEntryEncoding

        public String getEntryEncoding()
        Gets the encoding to use for arj, jar, ZIP, dump, cpio and tar files, or null for the archiver default.
        Returns:
        entry encoding, or null for the archiver default
        Since:
        1.5
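
        The entry encoding matters because entry names are stored in the archive as bytes, and a reader has to decode them with the charset the writer used. The sketch below demonstrates this with the JDK's own java.util.zip classes rather than with this factory, purely as an illustration; EntryEncodingDemo and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration using java.util.zip; it does not call this factory.
final class EntryEncodingDemo {

    private EntryEncodingDemo() {
    }

    /** Writes a one-entry ZIP archive whose entry name is encoded with the given charset. */
    static byte[] zipWithEntry(String entryName, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes, charset)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write('x');
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Reads the name of the first entry, decoding it with the given charset. */
    static String firstEntryName(byte[] zip, Charset charset) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip), charset)) {
            ZipEntry entry = zis.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }
}
```

        Writing and reading with the same charset returns the original name. Reading with a different charset can produce a garbled name or fail, which is the situation an explicit entry encoding is meant to avoid.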
      • getInputStreamArchiveNames

        public Set<String> getInputStreamArchiveNames()
        Specified by:
        getInputStreamArchiveNames in interface org.apache.commons.compress.archivers.ArchiveStreamProvider
      • getOutputStreamArchiveNames

        public Set<String> getOutputStreamArchiveNames()
        Specified by:
        getOutputStreamArchiveNames in interface org.apache.commons.compress.archivers.ArchiveStreamProvider
      • setEntryEncoding

        @Deprecated
        public void setEntryEncoding(String entryEncoding)
        Deprecated.
        As of 1.10, use the TikaArchiveStreamFactory(String) constructor to specify the encoding.
        Sets the encoding to use for arj, jar, ZIP, dump, cpio and tar files. Use null for the archiver default.
        Parameters:
        entryEncoding - the entry encoding, null uses the archiver default.
        Since:
        1.5