Class PipesParsingHelper
The helper manages a dedicated temp directory for input files. A file-system-fetcher is configured with basePath pointing to this directory, so child processes can only access files inside it: fetch keys carry relative filenames rather than absolute paths.
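The confinement described above can be illustrated with a minimal sketch, assuming the fetcher resolves each fetch key against a normalized basePath. The class BasePathResolver and its resolve method are hypothetical names for illustration, not the file-system-fetcher's actual code. They only show why relative keys plus a basePath keep reads inside the temp directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of basePath confinement; not the actual file-system-fetcher code.
class BasePathResolver {
    private final Path basePath;

    BasePathResolver(Path basePath) {
        // Normalize once so later prefix checks compare normalized paths.
        this.basePath = basePath.toAbsolutePath().normalize();
    }

    // Resolves a relative fetch key under basePath; rejects absolute keys and ".." escapes.
    Path resolve(String fetchKey) {
        Path key = Paths.get(fetchKey);
        if (key.isAbsolute()) {
            throw new IllegalArgumentException("absolute paths are not allowed: " + fetchKey);
        }
        Path resolved = basePath.resolve(key).normalize();
        // Path.startsWith compares whole name elements, so "/tmp/in2" does not match "/tmp/in".
        if (!resolved.startsWith(basePath)) {
            throw new IllegalArgumentException("fetch key escapes basePath: " + fetchKey);
        }
        return resolved;
    }

    public static void main(String[] args) {
        BasePathResolver resolver = new BasePathResolver(Paths.get("/tmp/tika-input"));
        System.out.println(resolver.resolve("input-123.tmp"));
        for (String bad : new String[] {"/etc/passwd", "../etc/passwd"}) {
            try {
                resolver.resolve(bad);
            } catch (IllegalArgumentException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        }
    }
}
```

Normalizing before the startsWith check matters: without it, a key such as "a/../../etc/passwd" would textually start under basePath while actually pointing outside it.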
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static final record | PipesParsingHelper.UnpackResult | Result of UNPACK parsing containing the zip file path and metadata.
Field Summary
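Fields
Field | Description
DEFAULT_FETCHER_ID | The fetcher ID used for reading temp files.
UNPACK_EMITTER_ID | Name of the file-system emitter used for UNPACK mode.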
Constructor Summary
Constructors
Constructor | Description
PipesParsingHelper(PipesParser pipesParser, PipesConfig pipesConfig, Path inputTempDirectory, Path unpackEmitterBasePath) | Creates a PipesParsingHelper.
Method Summary
Modifier and Type | Method | Description
Path | getInputTempDirectory() | Gets the input temp directory path.
PipesConfig | getPipesConfig() | Gets the PipesConfig instance.
PipesParser | getPipesParser() | Gets the PipesParser instance.
static jakarta.ws.rs.core.Response.Status | mapStatusToHttpResponse(PipesResult.RESULT_STATUS status) | Maps PipesResult status to HTTP response status.
List<Metadata> | parse(TikaInputStream tis, Metadata metadata, ParseContext parseContext, ParseMode parseMode) | Parses content using pipes-based parsing with process isolation.
PipesParsingHelper.UnpackResult | parseUnpack(TikaInputStream tis, Metadata metadata, ParseContext parseContext, boolean saveAll) | Parses content using UNPACK mode and returns a path to the zip file containing extracted embedded documents.
Field Details
DEFAULT_FETCHER_ID
The fetcher ID used for reading temp files. This fetcher is configured with basePath = inputTempDirectory.
UNPACK_EMITTER_ID
Name of the file-system emitter used for UNPACK mode. This emitter must be configured in tika-config.json with a basePath pointing to a writable temp directory.
Constructor Details
PipesParsingHelper
public PipesParsingHelper(PipesParser pipesParser, PipesConfig pipesConfig, Path inputTempDirectory, Path unpackEmitterBasePath)
Creates a PipesParsingHelper.
Parameters:
pipesParser - the PipesParser instance
pipesConfig - the PipesConfig instance
inputTempDirectory - the temp directory for input files. The file-system-fetcher is configured with basePath = this directory.
unpackEmitterBasePath - the basePath where the unpack-emitter writes files. This is where the server finds the zip files created by UNPACK mode. May be null if UNPACK mode won't be used.
Method Details
getInputTempDirectory
Gets the input temp directory path.
Returns:
the input temp directory
parse
public List<Metadata> parse(TikaInputStream tis, Metadata metadata, ParseContext parseContext, ParseMode parseMode) throws IOException
Parses content using pipes-based parsing with process isolation.
This method spools the input to the dedicated temp directory and uses a relative filename in the FetchKey. The file-system-fetcher is configured with basePath pointing to this directory, so the child process can only access files there.
The caller is responsible for closing the TikaInputStream.
Parameters:
tis - the TikaInputStream containing the content to parse
metadata - metadata to pass to the parser (may include filename, content-type, etc.)
parseContext - parse context with handler configuration
parseMode - the parse mode (RMETA or CONCATENATE)
Returns:
list of metadata objects from parsing
Throws:
IOException - if temp file operations fail
TikaServerParseException - if parsing fails
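A minimal sketch of the spooling step, assuming plain java.nio file operations: the stream is copied into a new file inside the input temp directory, and only the name relative to that directory is handed on as the fetch key. Spooler and spoolToTempDir are hypothetical names; the real method also builds the FetchKey and runs the pipes parse, both omitted here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of the spooling step; names are illustrative, not Tika API.
class Spooler {
    // Copies the stream into a new file inside inputTempDirectory and returns the file
    // name relative to that directory. The caller still owns (and closes) the stream.
    static String spoolToTempDir(InputStream in, Path inputTempDirectory) throws IOException {
        Path tmp = Files.createTempFile(inputTempDirectory, "tika-input-", ".tmp");
        // createTempFile already created the file, so REPLACE_EXISTING is required.
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return inputTempDirectory.relativize(tmp).toString();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("pipes-input");
        try (InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8))) {
            String fetchKey = spoolToTempDir(in, dir);
            System.out.println("fetch key: " + fetchKey);
            Files.delete(dir.resolve(fetchKey)); // clean up the spooled file
        }
        Files.delete(dir);
    }
}
```

The sketch leaves deletion of the spooled file to its caller; which component removes it in PipesParsingHelper is not stated here.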
mapStatusToHttpResponse
public static jakarta.ws.rs.core.Response.Status mapStatusToHttpResponse(PipesResult.RESULT_STATUS status)
Maps PipesResult status to HTTP response status.
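The general pattern behind a status-to-HTTP mapping is a switch over the status enum with a fallback branch. In this sketch, PipesStatus is a hypothetical stand-in for PipesResult.RESULT_STATUS, and both its constants and the chosen HTTP codes are illustrative assumptions, not the mapping that mapStatusToHttpResponse actually returns. It uses int codes rather than jakarta.ws.rs.core.Response.Status so it compiles without JAX-RS on the classpath.

```java
// Hypothetical stand-in for PipesResult.RESULT_STATUS; the real constants are not listed here.
enum PipesStatus { SUCCESS, PARSE_ERROR, TIMEOUT, OUT_OF_MEMORY }

// Hypothetical sketch: the HTTP codes below are illustrative choices, not Tika's mapping.
class StatusMapper {
    static int toHttpStatus(PipesStatus status) {
        switch (status) {
            case SUCCESS:
                return 200; // OK
            case PARSE_ERROR:
                return 422; // Unprocessable Entity: input received but could not be parsed
            case TIMEOUT:
            case OUT_OF_MEMORY:
                return 503; // Service Unavailable: the child process failed
            default:
                return 500; // Internal Server Error for anything unexpected
        }
    }

    public static void main(String[] args) {
        for (PipesStatus s : PipesStatus.values()) {
            System.out.println(s + " -> " + toHttpStatus(s));
        }
    }
}
```

The default branch ensures that a status constant added later falls through to an error code instead of being reported as success.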
getPipesParser
Gets the PipesParser instance.
getPipesConfig
Gets the PipesConfig instance.
parseUnpack
public PipesParsingHelper.UnpackResult parseUnpack(TikaInputStream tis, Metadata metadata, ParseContext parseContext, boolean saveAll) throws IOException
Parses content using UNPACK mode and returns a path to the zip file containing extracted embedded documents.
This method:
1. Spools the input to the dedicated temp directory.
2. Configures UnpackConfig with zipEmbeddedFiles=true.
3. Has the pipes child process extract embedded files and create a zip.
4. Emits the zip to the configured file-system emitter.
5. Returns the path to the zip file for streaming.
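Step 3 happens inside the pipes child process. A rough sketch of the packaging part using java.util.zip, assuming the extracted documents are already available as name-to-bytes pairs: EmbeddedZipper and writeZip are hypothetical names, and this sketch ignores any UnpackConfig, UnpackSelector, or EmbeddedLimits settings.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch of step 3: packaging extracted embedded documents into one zip.
class EmbeddedZipper {
    // Writes each (entry name -> bytes) pair as a separate zip entry at zipPath.
    static void writeZip(Map<String, byte[]> embedded, Path zipPath) throws IOException {
        try (OutputStream out = Files.newOutputStream(zipPath);
             ZipOutputStream zos = new ZipOutputStream(out)) {
            for (Map.Entry<String, byte[]> entry : embedded.entrySet()) {
                zos.putNextEntry(new ZipEntry(entry.getKey()));
                zos.write(entry.getValue());
                zos.closeEntry();
            }
        } // closing zos finishes the archive (writes the central directory)
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> embedded = new LinkedHashMap<>();
        embedded.put("embedded-1.txt", "first".getBytes(StandardCharsets.UTF_8));
        embedded.put("embedded-2.txt", "second".getBytes(StandardCharsets.UTF_8));
        Path zip = Files.createTempFile("unpack-", ".zip");
        writeZip(embedded, zip);
        System.out.println("wrote " + Files.size(zip) + " bytes to " + zip);
        Files.delete(zip);
    }
}
```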
The caller is responsible for deleting the zip file after streaming.
Parameters:
tis - the TikaInputStream containing the content to parse
metadata - metadata to pass to the parser
parseContext - parse context (may contain UnpackConfig, UnpackSelector, EmbeddedLimits)
saveAll - if true, includes container text and metadata in the zip
Returns:
UnpackResult containing path to zip file and metadata list
Throws:
IOException - if parsing or file operations fail
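Because the caller owns the zip file, one straightforward pattern is to delete it in a finally block once the bytes have been copied to the response. This is a minimal sketch assuming the response body is exposed as an OutputStream (in a JAX-RS resource this would typically be the stream handed to a StreamingOutput); ZipStreamer and streamAndDelete are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the caller contract for parseUnpack: stream the zip, then delete it.
class ZipStreamer {
    // Copies the zip to the response body and deletes it even if the copy fails.
    static long streamAndDelete(Path zipPath, OutputStream responseBody) throws IOException {
        try {
            return Files.copy(zipPath, responseBody);
        } finally {
            Files.deleteIfExists(zipPath);
        }
    }

    public static void main(String[] args) throws IOException {
        Path zip = Files.createTempFile("unpack-", ".zip");
        Files.write(zip, "zip bytes".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        long copied = streamAndDelete(zip, body);
        System.out.println("streamed " + copied + " bytes; file still exists: " + Files.exists(zip));
    }
}
```

Deleting in finally means a client that disconnects mid-download does not leave the zip behind in the emitter's basePath.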