Package org.apache.tika.config.loader
Class TikaJsonConfig
java.lang.Object
org.apache.tika.config.loader.TikaJsonConfig
Parsed representation of a Tika JSON configuration file.
Provides access to component configurations by type (parsers, detectors, etc.).
This class serves as the single source of truth for JSON parsing across core Tika (parsers, detectors) and tika-pipes (fetchers, emitters) components. It performs no validation itself; each consumer validates only its own keys.
Unified Configuration Usage:
// Parse config once
TikaJsonConfig jsonConfig = TikaJsonConfig.load(Paths.get("config.json"));
// Load core Tika components (same classloader)
TikaLoader tikaLoader = TikaLoader.load(jsonConfig);
Parser parser = tikaLoader.loadParsers();
Detector detector = tikaLoader.loadDetectors();
// Load pipes/plugin components (different classloader)
TikaPluginManager pluginManager = TikaPluginManager.load(jsonConfig);
pluginManager.loadPlugins();
pluginManager.startPlugins();
// Extract config for plugins (crosses classloader boundary as string)
JsonNode fetchersNode = jsonConfig.getRootNode().get("fetchers");
if (fetchersNode != null) {
    String fetcherConfigJson = fetchersNode.toString();
    // Pass string to plugin - safe across classloader boundary
}
JSON structure:
{
  // Core Tika components (validated by TikaLoader)
  "parsers": [
    { "pdf-parser": { "_mime-include": ["application/pdf"], "ocrStrategy": "AUTO" } },
    "html-parser", // String shorthand for no-config components
    { "default-parser": { "exclude": ["ocr-parser"] } }
  ],
  "detectors": [
    "poifs-container-detector", // String shorthand
    { "default-detector": { "spoolTypes": ["application/zip", "application/pdf"] } }
  ],
  // Pipes components (validated by validateKeys())
  "plugin-roots": ["/path/to/plugins"],
  "fetchers": [...],
  "emitters": [...]
}
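The string shorthand in the structure above can be pictured as a normalization step. The following is a minimal JDK-only sketch, not Tika's implementation: the class `ShorthandSketch` and its `normalize` method are hypothetical, and plain `Map`/`List` values stand in for parsed JSON nodes. It turns a mixed array of names and single-key objects into ordered (name, config) entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Tika.
class ShorthandSketch {

    // Converts each array element into a (name, config) entry, preserving order.
    @SuppressWarnings("unchecked")
    static List<Map.Entry<String, Map<String, Object>>> normalize(List<?> entries) {
        List<Map.Entry<String, Map<String, Object>>> result = new ArrayList<>();
        for (Object entry : entries) {
            if (entry instanceof String) {
                // "html-parser" is treated like { "html-parser": {} }
                Map<String, Object> empty = Map.of();
                result.add(Map.entry((String) entry, empty));
            } else if (entry instanceof Map && ((Map<?, ?>) entry).size() == 1) {
                // { "pdf-parser": { ... } } carries exactly one name and its config
                Map.Entry<?, ?> only = ((Map<?, ?>) entry).entrySet().iterator().next();
                String name = (String) only.getKey();
                Map<String, Object> config = (Map<String, Object>) only.getValue();
                result.add(Map.entry(name, config));
            } else {
                throw new IllegalArgumentException(
                        "Expected a name or a single-key object: " + entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> parsers = List.of(
                Map.of("pdf-parser", Map.of("ocrStrategy", "AUTO")),
                "html-parser",
                Map.of("default-parser", Map.of("exclude", List.of("ocr-parser"))));
        for (Map.Entry<String, Map<String, Object>> e : normalize(parsers)) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

A list of entries, rather than a map, keeps the declared order and mirrors the shape that getArrayComponents is documented to return.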
All components use array format for explicit ordering. Components without configuration can use the string shorthand "component-name" instead of { "component-name": {} }. Parsers support MIME type filtering via the "_mime-include" and "_mime-exclude" fields. The special "default-parser" entry enables SPI fallback for unlisted parsers.
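To make the MIME filtering fields concrete, here is a minimal JDK-only sketch. The semantics are an assumption, not taken from Tika: an empty include set is read as "no restriction", and an exclude match always wins. The class `MimeFilterSketch` and its `accepts` method are hypothetical names.

```java
import java.util.Set;

// Hypothetical illustration of include/exclude filtering; Tika's actual rules may differ.
class MimeFilterSketch {

    // Assumed semantics: exclude takes precedence; an empty include set accepts everything else.
    static boolean accepts(Set<String> include, Set<String> exclude, String mimeType) {
        if (exclude.contains(mimeType)) {
            return false;
        }
        return include.isEmpty() || include.contains(mimeType);
    }

    public static void main(String[] args) {
        Set<String> include = Set.of("application/pdf");
        Set<String> exclude = Set.of();
        System.out.println(accepts(include, exclude, "application/pdf"));
        System.out.println(accepts(include, exclude, "text/html"));
    }
}
```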
-
Method Summary
Modifier and Type | Method | Description
<T> T | deserialize(String key, Class<T> clazz) | Deserializes a configuration value for the given key.
List<Map.Entry<String,com.fasterxml.jackson.databind.JsonNode>> | getArrayComponents(String componentType) | Gets component configurations for a specific type (array format - used for detectors, etc.).
Map<String,com.fasterxml.jackson.databind.JsonNode> | getComponents(String componentType) | Gets component configurations for a specific type (object format - used for parsers).
com.fasterxml.jackson.databind.JsonNode | getRootNode() | Gets the raw root JSON node.
boolean | hasArrayComponents(String componentType) | Checks if a component type has any configured components (array format).
boolean | hasComponents(String componentType) | Checks if a component type has any configured components (object format).
boolean | hasComponentSection(String componentType) | Checks if a component type section exists in the config (even if empty).
boolean | hasKey(String key) | Checks if a configuration key exists.
static TikaJsonConfig | load(InputStream inputStream) | Loads configuration from an input stream.
static TikaJsonConfig | load(Path configPath) | Loads configuration from a file.
static TikaJsonConfig | loadDefault() | Creates an empty configuration (no config file).
String | toString() |
-
Method Details
-
load
Loads configuration from a file.
Parameters:
configPath - the path to the JSON configuration file
Returns:
the parsed configuration
Throws:
TikaConfigException - if loading or parsing fails
-
load
Loads configuration from an input stream.
Parameters:
inputStream - the input stream containing JSON configuration
Returns:
the parsed configuration
Throws:
TikaConfigException - if loading or parsing fails
-
loadDefault
Creates an empty configuration (no config file). All components will be loaded from SPI.
Returns:
an empty configuration
-
getComponents
Gets component configurations for a specific type (object format - used for parsers).
Parameters:
componentType - the component type (e.g., "parsers")
Returns:
map of component name to configuration JSON, or empty map if type not found
-
getArrayComponents
public List<Map.Entry<String,com.fasterxml.jackson.databind.JsonNode>> getArrayComponents(String componentType)
Gets component configurations for a specific type (array format - used for detectors, etc.).
Parameters:
componentType - the component type (e.g., "detectors")
Returns:
ordered list of (name, config) entries, or empty list if type not found
-
hasComponents
Checks if a component type has any configured components (object format).
Parameters:
componentType - the component type
Returns:
true if the type has configurations
-
hasArrayComponents
Checks if a component type has any configured components (array format).
Parameters:
componentType - the component type
Returns:
true if the type has configurations
-
hasComponentSection
Checks if a component type section exists in the config (even if empty).
Parameters:
componentType - the component type
Returns:
true if the section exists
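The difference between "the section exists" and "the section has components" can be sketched with plain collections. This is a JDK-only illustration under stated assumptions, not Tika's code: the class `SectionCheckSketch` and its two methods are hypothetical, and the config is modeled as a map from section name to a list of component names.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of a parsed config: section name -> component names.
class SectionCheckSketch {

    // True when the section key is present, even if its array is empty.
    static boolean hasSection(Map<String, List<String>> config, String type) {
        return config.containsKey(type);
    }

    // True only when the section is present and lists at least one component.
    static boolean hasComponents(Map<String, List<String>> config, String type) {
        List<String> components = config.get(type);
        return components != null && !components.isEmpty();
    }

    public static void main(String[] args) {
        Map<String, List<String>> config = Map.of(
                "parsers", List.of("html-parser"),
                "detectors", List.<String>of()); // present but empty
        System.out.println(hasSection(config, "detectors"));
        System.out.println(hasComponents(config, "detectors"));
        System.out.println(hasSection(config, "fetchers"));
    }
}
```

With checks like these, a caller could tell an explicitly empty section apart from one that is absent altogether.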
-
getRootNode
public com.fasterxml.jackson.databind.JsonNode getRootNode()
Gets the raw root JSON node.
Returns:
the root node
-
deserialize
Deserializes a configuration value for the given key.
Type Parameters:
T - the type to deserialize to
Parameters:
key - the configuration key
clazz - the target class
Returns:
the deserialized value, or null if key doesn't exist
Throws:
IOException - if deserialization fails
-
hasKey
Checks if a configuration key exists.
Parameters:
key - the configuration key
Returns:
true if the key exists and is not null
-
toString
-