Package org.apache.tika.zip.utils
Class ZipSalvager
java.lang.Object
org.apache.tika.zip.utils.ZipSalvager
Constructor Summary
Constructors
ZipSalvager()
Method Summary
Modifier and Type | Method | Description
static void | salvageCopy(Path brokenZip, Path salvagedZip) | Streams a broken zip from a Path and rebuilds a valid zip file.
static void | salvageCopy(TikaInputStream tis, Path salvagedZip, boolean allowStoredEntries) | Streams the broken zip and rebuilds a new zip that is at least a valid zip file.
static org.apache.commons.compress.archivers.zip.ZipFile | tryToOpenZipFile(TikaInputStream tis, Metadata metadata) | Tries to open a ZipFile from the TikaInputStream using the default charset.
static org.apache.commons.compress.archivers.zip.ZipFile | tryToOpenZipFile(TikaInputStream tis, Metadata metadata, Charset charset) | Tries to open a ZipFile from the TikaInputStream.
Constructor Details
ZipSalvager
public ZipSalvager()
Method Details
tryToOpenZipFile
public static org.apache.commons.compress.archivers.zip.ZipFile tryToOpenZipFile(TikaInputStream tis, Metadata metadata, Charset charset)
Tries to open a ZipFile from the TikaInputStream. If direct opening fails, attempts to salvage the ZIP and open the salvaged version.
On success:
- Sets Zip.DETECTOR_ZIPFILE_OPENED to true in metadata
- Stores the ZipFile in tis.openContainer (if not already set)
- Returns the opened ZipFile
On failure:
- Sets Zip.DETECTOR_ZIPFILE_OPENED to false in metadata
- Returns null
Parameters:
tis - the TikaInputStream (must be file-backed)
metadata - the metadata to update with hints
charset - optional charset for entry names (may be null)
Returns:
the opened ZipFile, or null if opening and salvaging both failed
tryToOpenZipFile
public static org.apache.commons.compress.archivers.zip.ZipFile tryToOpenZipFile(TikaInputStream tis, Metadata metadata)
Tries to open a ZipFile from the TikaInputStream using the default charset.
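The open-then-salvage contract documented for tryToOpenZipFile can be illustrated with a small stand-alone sketch. It is not Tika's implementation: it assumes java.util.zip in place of Commons Compress and TikaInputStream, a plain Map in place of Metadata, and the names OpenOrSalvageSketch, tryToOpen, and OPENED_KEY are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class OpenOrSalvageSketch {

    // Hypothetical key; a stand-in for Zip.DETECTOR_ZIPFILE_OPENED.
    static final String OPENED_KEY = "zipFileOpened";

    /** Opens the zip directly, or a salvaged copy of it; returns null if both attempts fail. */
    static ZipFile tryToOpen(Path zip, Map<String, Boolean> hints) {
        ZipFile opened = openQuietly(zip);
        if (opened == null) {
            try {
                Path salvaged = Files.createTempFile("salvaged", ".zip");
                salvaged.toFile().deleteOnExit();
                copyReadableEntries(zip, salvaged);
                opened = openQuietly(salvaged);
            } catch (IOException e) {
                opened = null; // salvaging failed as well
            }
        }
        // Record the outcome either way, as the documented contract does.
        hints.put(OPENED_KEY, opened != null);
        return opened;
    }

    private static ZipFile openQuietly(Path path) {
        try {
            return new ZipFile(path.toFile());
        } catch (IOException e) {
            return null;
        }
    }

    /** Streams through the damaged file and copies every entry that can still be read. */
    private static void copyReadableEntries(Path broken, Path salvaged) throws IOException {
        try (InputStream raw = Files.newInputStream(broken);
             ZipInputStream in = new ZipInputStream(raw);
             OutputStream os = Files.newOutputStream(salvaged);
             ZipOutputStream out = new ZipOutputStream(os)) {
            try {
                ZipEntry entry;
                while ((entry = in.getNextEntry()) != null) {
                    out.putNextEntry(new ZipEntry(entry.getName()));
                    in.transferTo(out); // copies up to the end of the current entry
                    out.closeEntry();
                }
            } catch (IOException damaged) {
                // Stop at the first unreadable point and keep what was copied so far.
            }
        }
    }
}
```

As with the documented method, the caller receives either an open ZipFile that it must close, or null together with a false hint.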
salvageCopy
public static void salvageCopy(TikaInputStream tis, Path salvagedZip, boolean allowStoredEntries) throws IOException
Streams the broken zip and rebuilds a new zip that is at least a valid zip file. The contents of the final stream may be truncated, but the result should be a valid zip file.
This does nothing fancy to fix the underlying broken zip.
This method does NOT close the TikaInputStream; the caller owns it. The caller should call tis.enableRewind() before calling this method if retry on DATA_DESCRIPTOR is needed.
Parameters:
tis - the TikaInputStream to read from (not closed by this method)
salvagedZip - the output path for the salvaged ZIP
allowStoredEntries - whether to allow stored entries with data descriptors
Throws:
IOException - if salvaging fails
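The streaming rebuild described above can be sketched with the JDK alone. This is an illustration of the general idea, not the code behind salvageCopy: it assumes java.util.zip rather than Commons Compress, has no equivalent of allowStoredEntries or the DATA_DESCRIPTOR retry, and the names StreamingSalvageSketch and salvage are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class StreamingSalvageSketch {

    /**
     * Reads entries sequentially from a possibly damaged zip stream and writes them
     * to a fresh zip. Returns the number of entries started in the output; the last
     * one may hold truncated data. The input is left open for the caller; the
     * output stream is closed so that the central directory gets written.
     */
    static int salvage(InputStream broken, OutputStream salvaged) throws IOException {
        int started = 0;
        ZipInputStream in = new ZipInputStream(broken); // deliberately not closed here
        try (ZipOutputStream out = new ZipOutputStream(salvaged)) {
            byte[] buffer = new byte[8192];
            try {
                ZipEntry entry;
                while ((entry = in.getNextEntry()) != null) {
                    out.putNextEntry(new ZipEntry(entry.getName()));
                    started++;
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                    out.closeEntry();
                }
            } catch (IOException damaged) {
                // Truncated or corrupt input: stop copying and keep what was written.
                // Closing 'out' below finishes any open entry and writes the directory.
            }
        }
        return started;
    }
}
```

Because the copy never consults the original central directory, it still works when the end of the file is missing; the trade-off is that data after the first unreadable point is lost.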
salvageCopy
public static void salvageCopy(Path brokenZip, Path salvagedZip) throws IOException
Streams a broken zip from a Path and rebuilds a valid zip file.
This is a convenience method that creates a TikaInputStream internally.
Parameters:
brokenZip - the path to the broken ZIP file
salvagedZip - the path for the salvaged ZIP output
Throws:
IOException - if salvaging fails