Class ZipSalvager

java.lang.Object
org.apache.tika.zip.utils.ZipSalvager

public class ZipSalvager extends Object
  • Constructor Details

    • ZipSalvager

      public ZipSalvager()
  • Method Details

    • tryToOpenZipFile

      public static org.apache.commons.compress.archivers.zip.ZipFile tryToOpenZipFile(TikaInputStream tis, Metadata metadata, Charset charset)
      Tries to open a ZipFile from the TikaInputStream. If direct opening fails, attempts to salvage the ZIP and open the salvaged version.

      On success:

      • Sets Zip.DETECTOR_ZIPFILE_OPENED to true in metadata
      • Stores the ZipFile in tis.openContainer (if not already set)
      • Returns the opened ZipFile
      On failure:

      • Returns null
      Parameters:
      tis - the TikaInputStream (must be file-backed)
      metadata - the metadata to update with hints
      charset - optional charset for entry names (may be null)
      Returns:
      the opened ZipFile, or null if opening and salvaging both failed
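The open-then-salvage contract described above can be illustrated with a small sketch. It uses only java.util.zip rather than Tika or Commons Compress, and the class OpenOrSalvageSketch, the Salvager interface, and the tryToOpen method are hypothetical names introduced for illustration; it does not reproduce the metadata or openContainer handling.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipFile;

final class OpenOrSalvageSketch {

    /** Hypothetical hook standing in for a salvage routine such as salvageCopy. */
    interface Salvager {
        void salvage(Path brokenZip, Path salvagedZip) throws IOException;
    }

    /**
     * Tries a direct open first, then a salvaged copy.
     * Returns the open ZipFile, or null if both attempts fail.
     */
    static ZipFile tryToOpen(Path zip, Salvager salvager) {
        try {
            return new ZipFile(zip.toFile());
        } catch (IOException directFailure) {
            // Direct open failed; fall through to the salvage attempt.
        }
        try {
            Path salvaged = Files.createTempFile("salvaged-", ".zip");
            // Best-effort cleanup only; a real caller would manage this file explicitly.
            salvaged.toFile().deleteOnExit();
            salvager.salvage(zip, salvaged);
            return new ZipFile(salvaged.toFile());
        } catch (IOException salvageFailure) {
            // Neither the original nor the salvaged copy could be opened.
            return null;
        }
    }
}
```

Returning null instead of throwing mirrors the documented return value, so a caller has to check for null before using the result and is responsible for closing any ZipFile it receives.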
    • tryToOpenZipFile

      public static org.apache.commons.compress.archivers.zip.ZipFile tryToOpenZipFile(TikaInputStream tis, Metadata metadata)
      Tries to open a ZipFile from the TikaInputStream using the default charset.
      See Also:
      tryToOpenZipFile(TikaInputStream, Metadata, Charset)
    • salvageCopy

      public static void salvageCopy(TikaInputStream tis, Path salvagedZip, boolean allowStoredEntries) throws IOException
      Streams the broken zip and rebuilds a new zip from it. The contents of the final stream may be truncated, but the result should be a valid zip file.

      This method does not attempt any sophisticated repair of the underlying broken zip.

      This method does NOT close the TikaInputStream; the caller owns it. The caller should call tis.enableRewind() before calling this method if a retry on DATA_DESCRIPTOR is needed.

      Parameters:
      tis - the TikaInputStream to read from (not closed by this method)
      salvagedZip - the output path for the salvaged ZIP
      allowStoredEntries - whether to allow stored entries with data descriptors
      Throws:
      IOException - if salvaging fails
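The general idea of streaming a damaged archive and rebuilding a valid one can be sketched with java.util.zip alone. This is not ZipSalvager's implementation and it ignores the allowStoredEntries option; the class StreamingSalvageSketch and its salvage method are hypothetical names, and the sketch assumes entry names in the input are unique.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class StreamingSalvageSketch {

    /**
     * Copies every entry that can still be read sequentially from brokenZip
     * into a fresh archive at salvagedZip. Returns the number of entries started.
     * Read errors end the copy; write errors are propagated to the caller.
     */
    static int salvage(Path brokenZip, Path salvagedZip) throws IOException {
        int written = 0;
        byte[] buffer = new byte[8192];
        try (OutputStream out = Files.newOutputStream(salvagedZip);
             ZipOutputStream zos = new ZipOutputStream(out);
             InputStream in = Files.newInputStream(brokenZip);
             ZipInputStream zis = new ZipInputStream(in)) {
            boolean damaged = false;
            while (!damaged) {
                ZipEntry entry;
                try {
                    entry = zis.getNextEntry();
                } catch (IOException e) {
                    break; // unreadable header: keep what has been copied so far
                }
                if (entry == null) {
                    break; // no further local entries
                }
                // Copy only the name; sizes and CRC are recomputed on write.
                zos.putNextEntry(new ZipEntry(entry.getName()));
                written++;
                while (true) {
                    int n;
                    try {
                        n = zis.read(buffer);
                    } catch (IOException e) {
                        damaged = true; // truncated or corrupt data: keep the partial entry
                        break;
                    }
                    if (n == -1) {
                        break;
                    }
                    zos.write(buffer, 0, n);
                }
                zos.closeEntry();
            }
        } // closing zos writes a new central directory, so the output is a valid zip
        return written;
    }
}
```

Reading sequentially from local file headers avoids any dependence on the central directory, which is often the part that is missing from a truncated file. The last entry copied may contain only part of its original data.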
    • salvageCopy

      public static void salvageCopy(Path brokenZip, Path salvagedZip) throws IOException
      Streams a broken zip from a Path and rebuilds a valid zip file.

      This is a convenience method that creates a TikaInputStream internally.

      Parameters:
      brokenZip - the path to the broken ZIP file
      salvagedZip - the path for the salvaged ZIP output
      Throws:
      IOException - if salvaging fails