Class FrictionlessUnpackHandler
java.lang.Object
org.apache.tika.pipes.core.extractor.AbstractUnpackHandler
org.apache.tika.pipes.core.extractor.FrictionlessUnpackHandler
- All Implemented Interfaces:
Closeable, AutoCloseable, UnpackHandler
An UnpackHandler that collects embedded files for Frictionless Data Package output.
Files are stored in a temporary directory under an "unpacked/" subdirectory.
SHA256 hashes are computed during the add() operation using DigestInputStream.
After parsing completes, buildDataPackage() creates the manifest.
Output structure:
temp-dir/
└── unpacked/
    ├── 00000001.pdf
    ├── 00000002.png
    └── ...
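The storage and hashing step described above can be illustrated with a minimal, self-contained sketch. It is not the handler's actual implementation: the class name HashWhileStoring, the store helper, and the zero-padded naming rule (inferred only from the file names in the output structure) are assumptions made for illustration. The point it shows is that wrapping the stream in a DigestInputStream lets the SHA256 hash be computed in the same pass that writes the file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HashWhileStoring {

    /**
     * Hypothetical helper: copies the stream to unpackedDir/<8-digit id><extension>
     * and returns the lowercase hex SHA-256 of the bytes that were written.
     */
    static String store(Path unpackedDir, int id, String extension, InputStream in)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        // Assumed naming rule: zero-padded id plus extension, e.g. 00000001.pdf
        Path target = unpackedDir.resolve(String.format("%08d%s", id, extension));
        // Every byte read through the DigestInputStream also updates the digest,
        // so the file is written and hashed in a single pass.
        try (DigestInputStream dis = new DigestInputStream(in, sha256)) {
            Files.copy(dis, target);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : sha256.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        Path tempDir = Files.createTempDirectory("unpack-sketch");
        Path unpacked = Files.createDirectory(tempDir.resolve("unpacked"));
        InputStream embedded = new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8));
        String hash = store(unpacked, 1, ".txt", embedded);
        System.out.println(unpacked.resolve("00000001.txt") + " " + hash);
    }
}
```

Hashing during the copy avoids reading each embedded file a second time once parsing has finished.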
-
Nested Class Summary
Nested Classes
static final record - Information about an embedded file including its SHA256 hash.
-
Constructor Summary
Constructors
FrictionlessUnpackHandler(EmitKey containerEmitKey, UnpackConfig unpackConfig) - Creates a new FrictionlessUnpackHandler.
-
Method Summary
Methods
void add(int id, Metadata metadata, InputStream inputStream)
buildDataPackage(String containerName) - Builds the DataPackage manifest from collected files.
void close()
getContainerEmitKey() - Returns the container emit key.
getEmbeddedFiles() - Returns information about all embedded files.
getOriginalDocumentName() - Returns the name of the original document if stored.
getOriginalDocumentPath() - Returns the path to the original document if stored.
getTempDirectory() - Returns the temporary directory where files are stored.
getUnpackConfig() - Returns the UnpackConfig used by this handler.
getUnpackedDirectory() - Returns the unpacked subdirectory where embedded files are stored.
boolean hasEmbeddedFiles() - Returns true if there are any embedded files.
boolean hasOriginalDocument() - Returns true if the original document was stored.
void storeOriginalDocument(InputStream inputStream, String fileName) - Stores the original container document for optional inclusion.
Methods inherited from class org.apache.tika.pipes.core.extractor.AbstractUnpackHandler
getEmitKey, getIds
-
Constructor Details
-
FrictionlessUnpackHandler
public FrictionlessUnpackHandler(EmitKey containerEmitKey, UnpackConfig unpackConfig) throws IOException
Creates a new FrictionlessUnpackHandler.
- Parameters:
containerEmitKey - the emit key for the container document
unpackConfig - the unpack configuration
- Throws:
IOException - if temp directory creation fails
-
-
Method Details
-
add
- Specified by:
add in interface UnpackHandler
- Overrides:
add in class AbstractUnpackHandler
- Throws:
IOException
-
storeOriginalDocument
Stores the original container document for optional inclusion.
- Parameters:
inputStream - the original document input stream
fileName - the file name for the original document
- Throws:
IOException - if storing fails
-
buildDataPackage
Builds the DataPackage manifest from collected files.
- Parameters:
containerName - the name of the container document
- Returns:
the built DataPackage
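As a rough illustration of what a manifest built from the collected files might contain, the sketch below assembles a Frictionless-style descriptor by hand. It does not use the DataPackage class documented here, whose structure is not shown on this page. The DataPackageSketch class, the FileEntry record, and its field names are hypothetical. The "sha256:" prefix on the hash value follows the Data Package convention of prefixing non-MD5 hashes with the algorithm name.

```java
import java.util.List;

class DataPackageSketch {

    /** Hypothetical stand-in for the handler's embedded-file record; field names are assumptions. */
    record FileEntry(String name, String path, String sha256, long bytes) { }

    /** Escapes backslashes and double quotes only; enough for this sketch, not a full JSON escaper. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Renders a descriptor with one resource per collected file. */
    static String toJson(String containerName, List<FileEntry> files) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\n  \"name\": ").append(quote(containerName));
        sb.append(",\n  \"resources\": [");
        for (int i = 0; i < files.size(); i++) {
            FileEntry f = files.get(i);
            sb.append(i == 0 ? "\n" : ",\n");
            sb.append("    {\"name\": ").append(quote(f.name()))
              .append(", \"path\": ").append(quote(f.path()))
              .append(", \"hash\": ").append(quote("sha256:" + f.sha256()))
              .append(", \"bytes\": ").append(f.bytes())
              .append("}");
        }
        sb.append(files.isEmpty() ? "]" : "\n  ]");
        sb.append("\n}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<FileEntry> files = List.of(
                new FileEntry("00000001", "unpacked/00000001.pdf", "abc123", 42L));
        System.out.print(toJson("report.docx", files));
    }
}
```

The sketch does not validate the container name against the Data Package naming rules, and a real implementation would more likely serialize through a JSON library than concatenate strings.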
-
getTempDirectory
Returns the temporary directory where files are stored.
-
getUnpackedDirectory
Returns the unpacked subdirectory where embedded files are stored.
-
getEmbeddedFiles
Returns information about all embedded files.
-
hasEmbeddedFiles
public boolean hasEmbeddedFiles()
Returns true if there are any embedded files.
-
getOriginalDocumentPath
Returns the path to the original document if stored.
-
getOriginalDocumentName
Returns the name of the original document if stored.
-
hasOriginalDocument
public boolean hasOriginalDocument()
Returns true if the original document was stored.
-
getUnpackConfig
Returns the UnpackConfig used by this handler.
-
getContainerEmitKey
Returns the container emit key.
-
close
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
-