Class TikaJsonConfig

java.lang.Object
org.apache.tika.config.loader.TikaJsonConfig

public class TikaJsonConfig extends Object
Parsed representation of a Tika JSON configuration file. Provides access to component configurations by type (parsers, detectors, etc.).

This class serves as the single source of truth for JSON parsing across core Tika (parsers, detectors) and tika-pipes (fetchers, emitters) components. It performs no validation itself; each consumer validates only its own keys.

Unified Configuration Usage:

 // Parse config once
 TikaJsonConfig jsonConfig = TikaJsonConfig.load(Paths.get("config.json"));

 // Load core Tika components (same classloader)
 TikaLoader tikaLoader = TikaLoader.load(jsonConfig);
 Parser parser = tikaLoader.loadParsers();
 Detector detector = tikaLoader.loadDetectors();

 // Load pipes/plugin components (different classloader)
 TikaPluginManager pluginManager = TikaPluginManager.load(jsonConfig);
 pluginManager.loadPlugins();
 pluginManager.startPlugins();

 // Extract config for plugins (crosses classloader boundary as string)
 JsonNode fetchersNode = jsonConfig.getRootNode().get("fetchers");
 if (fetchersNode != null) {
     String fetcherConfigJson = fetchersNode.toString();
     // Pass string to plugin - safe across classloader boundary
 }
 

JSON structure:

 {
   // Core Tika components (validated by TikaLoader)
   "parsers": [
     { "pdf-parser": { "_mime-include": ["application/pdf"], "ocrStrategy": "AUTO" } },
     "html-parser",                    // String shorthand for no-config components
     { "default-parser": { "exclude": ["ocr-parser"] } }
   ],
   "detectors": [
     "poifs-container-detector",       // String shorthand
     { "default-detector": { "spoolTypes": ["application/zip", "application/pdf"] } }
   ],

   // Pipes components (validated by validateKeys())
   "plugin-roots": ["/path/to/plugins"],
   "fetchers": [...],
   "emitters": [...]
 }
 

All components use array format for explicit ordering. Components without configuration can use string shorthand: "component-name" instead of { "component-name": {} }. Parsers support MIME filtering via the "_mime-include" and "_mime-exclude" fields. The special "default-parser" entry enables SPI fallback for unlisted parsers.
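The string shorthand and the ordering guarantee can be illustrated with a small sketch that uses only the JDK. It is not Tika's implementation: the class ShorthandSketch, the method normalize, and the use of plain List/Map values in place of Jackson nodes are all hypothetical, and the one-name-per-object rule is an assumption taken from the example structure above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: this class is not part of Tika.
final class ShorthandSketch {

    // Expands each array element into a (name, config) entry, keeping array order.
    static List<Map.Entry<String, Map<String, Object>>> normalize(List<?> section) {
        List<Map.Entry<String, Map<String, Object>>> out = new ArrayList<>();
        for (Object element : section) {
            if (element instanceof String) {
                // "component-name" is shorthand for { "component-name": {} }
                Map<String, Object> empty = new LinkedHashMap<>();
                out.add(Map.entry((String) element, empty));
            } else if (element instanceof Map) {
                Map<?, ?> wrapper = (Map<?, ?>) element;
                // Assumption: exactly one component name per object.
                if (wrapper.size() != 1) {
                    throw new IllegalArgumentException("expected exactly one component name");
                }
                Map.Entry<?, ?> only = wrapper.entrySet().iterator().next();
                if (!(only.getValue() instanceof Map)) {
                    throw new IllegalArgumentException("component config must be an object");
                }
                // Copy the config fields so the result does not alias the input.
                Map<String, Object> config = new LinkedHashMap<>();
                for (Map.Entry<?, ?> field : ((Map<?, ?>) only.getValue()).entrySet()) {
                    config.put(String.valueOf(field.getKey()), field.getValue());
                }
                out.add(Map.entry(String.valueOf(only.getKey()), config));
            } else {
                throw new IllegalArgumentException("unsupported element: " + element);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> parsers = List.of(
                Map.of("pdf-parser", Map.of("ocrStrategy", "AUTO")),
                "html-parser",
                Map.of("default-parser", Map.of("exclude", List.of("ocr-parser"))));
        // Entries come back in the same order as the array.
        for (Map.Entry<String, Map<String, Object>> entry : normalize(parsers)) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Returning a list of entries rather than a map keeps the explicit ordering that the array format exists to provide, which matches the shape of the value returned by getArrayComponents.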

  • Method Details

    • load

      public static TikaJsonConfig load(Path configPath) throws TikaConfigException
      Loads configuration from a file.
      Parameters:
      configPath - the path to the JSON configuration file
      Returns:
      the parsed configuration
      Throws:
      TikaConfigException - if loading or parsing fails
    • load

      public static TikaJsonConfig load(InputStream inputStream) throws TikaConfigException
      Loads configuration from an input stream.
      Parameters:
      inputStream - the input stream containing JSON configuration
      Returns:
      the parsed configuration
      Throws:
      TikaConfigException - if loading or parsing fails
    • loadDefault

      public static TikaJsonConfig loadDefault()
      Creates an empty configuration (no config file). All components will be loaded from SPI.
      Returns:
      an empty configuration
    • getComponents

      public Map<String,com.fasterxml.jackson.databind.JsonNode> getComponents(String componentType)
      Gets component configurations for a specific type (object format - used for parsers).
      Parameters:
      componentType - the component type (e.g., "parsers")
      Returns:
      map of component name to configuration JSON, or empty map if type not found
    • getArrayComponents

      public List<Map.Entry<String,com.fasterxml.jackson.databind.JsonNode>> getArrayComponents(String componentType)
      Gets component configurations for a specific type (array format - used for detectors, etc.).
      Parameters:
      componentType - the component type (e.g., "detectors")
      Returns:
      ordered list of (name, config) entries, or empty list if type not found
    • hasComponents

      public boolean hasComponents(String componentType)
      Checks if a component type has any configured components (object format).
      Parameters:
      componentType - the component type
      Returns:
      true if the type has configurations
    • hasArrayComponents

      public boolean hasArrayComponents(String componentType)
      Checks if a component type has any configured components (array format).
      Parameters:
      componentType - the component type
      Returns:
      true if the type has configurations
    • hasComponentSection

      public boolean hasComponentSection(String componentType)
      Checks if a component type section exists in the config (even if empty).
      Parameters:
      componentType - the component type
      Returns:
      true if the section exists
    • getRootNode

      public com.fasterxml.jackson.databind.JsonNode getRootNode()
      Gets the raw root JSON node.
      Returns:
      the root node
    • deserialize

      public <T> T deserialize(String key, Class<T> clazz) throws IOException
      Deserializes a configuration value for the given key.
      Type Parameters:
      T - the type to deserialize to
      Parameters:
      key - the configuration key
      clazz - the target class
      Returns:
      the deserialized value, or null if the key does not exist
      Throws:
      IOException - if deserialization fails
    • hasKey

      public boolean hasKey(String key)
      Checks if a configuration key exists.
      Parameters:
      key - the configuration key
      Returns:
      true if the key exists and is not null
    • toString

      public String toString()
      Overrides:
      toString in class Object
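The presence checks documented above differ in one respect: hasComponentSection reports whether a section exists at all, even if empty, while hasComponents and hasArrayComponents report whether anything is configured in it. The following JDK-only sketch models that difference. It is not Tika code: SectionCheckSketch, hasSection, hasAnyComponents, and the Map-of-lists stand-in for the parsed config are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a parsed config root; not part of Tika.
final class SectionCheckSketch {

    // Modeled on the hasComponentSection description:
    // true when the section key is present, even if the section is empty.
    static boolean hasSection(Map<String, List<String>> root, String type) {
        return root.containsKey(type);
    }

    // Modeled on the hasArrayComponents description:
    // true only when at least one component is listed.
    static boolean hasAnyComponents(Map<String, List<String>> root, String type) {
        List<String> section = root.get(type);
        return section != null && !section.isEmpty();
    }

    public static void main(String[] args) {
        // Corresponds to { "detectors": [], "parsers": ["html-parser"] }
        Map<String, List<String>> root = Map.of(
                "detectors", List.<String>of(),
                "parsers", List.of("html-parser"));
        for (String type : List.of("parsers", "detectors", "fetchers")) {
            System.out.println(type
                    + ": section=" + hasSection(root, type)
                    + ", components=" + hasAnyComponents(root, type));
        }
    }
}
```

Under these assumptions, an empty "detectors": [] section is reported as present but without components, while an absent "fetchers" key yields false for both checks.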