Class ParseContext

java.lang.Object
org.apache.tika.parser.ParseContext
All Implemented Interfaces:
Serializable

public class ParseContext extends Object implements Serializable
Parse context. Used to pass context information to Tika parsers.
Since:
Apache Tika 0.5
  • Constructor Details

    • ParseContext

      public ParseContext()
  • Method Details

    • set

      public <T> void set(Class<T> key, T value)
      Adds the given value to the context as an implementation of the given interface.
      Parameters:
      key - the interface implemented by the given value
      value - the value to be added, or null to remove
    • get

      public <T> T get(Class<T> key)
      Returns the object in this context that implements the given interface.
      Parameters:
      key - the interface implemented by the requested object
      Returns:
      the object that implements the given interface, or null if not found
    • get

      public <T> T get(Class<T> key, T defaultValue)
      Returns the object in this context that implements the given interface, or the given default value if such an object is not found.
      Parameters:
      key - the interface implemented by the requested object
      defaultValue - value to return if the requested object is not found
      Returns:
      the object that implements the given interface, or the given default value if not found
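      The typed lookup behind set and get can be illustrated with a small, self-contained model. This is a sketch, not Tika's implementation. The class TypedContext is hypothetical, and it keys entries by the key class's fully-qualified name, an idea borrowed from the description of getContextMap.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of ParseContext's typed storage (not Tika's source). */
class TypedContext {
    // Keyed by the key class's fully-qualified name, as getContextMap describes.
    private final Map<String, Object> context = new HashMap<>();

    /** Stores value under key; a null value removes the entry. */
    <T> void set(Class<T> key, T value) {
        if (value != null) {
            context.put(key.getName(), value);
        } else {
            context.remove(key.getName());
        }
    }

    /** Returns the stored value, or null if nothing was set for key. */
    <T> T get(Class<T> key) {
        return key.cast(context.get(key.getName()));  // Class.cast(null) returns null
    }

    /** Returns the stored value, or defaultValue if nothing was set for key. */
    <T> T get(Class<T> key, T defaultValue) {
        T value = get(key);
        return value != null ? value : defaultValue;
    }
}

class TypedContextDemo {
    public static void main(String[] args) {
        TypedContext ctx = new TypedContext();
        // Register a String as the implementation of the CharSequence interface.
        ctx.set(CharSequence.class, "stored value");
        System.out.println(ctx.get(CharSequence.class));              // stored value
        // In this model, lookup uses the exact key class, not the value's runtime type.
        System.out.println(ctx.get(String.class));                    // null
        // A null value removes the entry, so the default is returned afterwards.
        ctx.set(CharSequence.class, null);
        System.out.println(ctx.get(CharSequence.class, "fallback"));  // fallback
    }
}
```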
    • setJsonConfig

      public void setJsonConfig(String name, JsonConfig config)
      Sets a JSON configuration by component name.

      This stores the JSON config for later resolution. The JSON will be deserialized when requested via the component registry in tika-serialization.

      Example:

       parseContext.setJsonConfig("pdf-parser", () -> "{\"ocrStrategy\": \"AUTO\"}");
       parseContext.setJsonConfig("handler-config", () -> "{\"type\": \"XML\"}");
       
      Parameters:
      name - the component name (e.g., "pdf-parser", "handler-config")
      config - the JSON configuration
      Since:
      Apache Tika 4.0
    • setJsonConfig

      public void setJsonConfig(String name, String json)
      Sets a JSON configuration by component name using a raw JSON string.

      Convenience method that wraps the string in a JsonConfig.

      Parameters:
      name - the component name (e.g., "pdf-parser", "handler-config")
      json - the JSON configuration string
      Since:
      Apache Tika 4.0
    • getJsonConfig

      public JsonConfig getJsonConfig(String name)
      Gets a JSON configuration by component name.
      Parameters:
      name - the component name
      Returns:
      the JsonConfig, or null if not found
      Since:
      Apache Tika 4.0
    • getJsonConfigs

      public Map<String,JsonConfig> getJsonConfigs()
      Returns all JSON configurations for serialization.
      Returns:
      unmodifiable map of component name to JsonConfig
      Since:
      Apache Tika 4.0
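      The JSON-config storage methods (setJsonConfig, getJsonConfig, hasJsonConfig, getJsonConfigs) can be modeled as a plain name-to-config map. JsonConfigSketch and its json() method are hypothetical stand-ins for JsonConfig. The lambda in the setJsonConfig example suggests JsonConfig is a functional interface that supplies JSON text, but its real method name is not documented on this page. Nothing is parsed at this stage. Per the description above, deserialization happens later in tika-serialization.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for JsonConfig: something that supplies raw JSON text. */
@FunctionalInterface
interface JsonConfigSketch {
    String json();   // assumed method name; not taken from Tika's API
}

/** Hypothetical model of ParseContext's JSON-config storage (not Tika's source). */
class JsonConfigStore {
    private final Map<String, JsonConfigSketch> jsonConfigs = new HashMap<>();

    /** Stores the config under the component name; nothing is parsed yet. */
    void setJsonConfig(String name, JsonConfigSketch config) {
        jsonConfigs.put(name, config);
    }

    /** Convenience overload: wraps a raw JSON string in a config object. */
    void setJsonConfig(String name, String json) {
        setJsonConfig(name, () -> json);
    }

    /** Returns the config, or null if none was stored under this name. */
    JsonConfigSketch getJsonConfig(String name) {
        return jsonConfigs.get(name);
    }

    boolean hasJsonConfig(String name) {
        return jsonConfigs.containsKey(name);
    }

    /** Read-only map of component name to config, e.g. for a serializer to walk. */
    Map<String, JsonConfigSketch> getJsonConfigs() {
        return Collections.unmodifiableMap(jsonConfigs);
    }
}

class JsonConfigStoreDemo {
    public static void main(String[] args) {
        JsonConfigStore store = new JsonConfigStore();
        store.setJsonConfig("pdf-parser", () -> "{\"ocrStrategy\": \"AUTO\"}");  // lambda overload
        store.setJsonConfig("handler-config", "{\"type\": \"XML\"}");            // String overload
        System.out.println(store.hasJsonConfig("pdf-parser"));                   // true
        System.out.println(store.getJsonConfig("handler-config").json());        // {"type": "XML"}
        System.out.println(store.getJsonConfig("missing"));                      // null
    }
}
```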
    • getResolvedConfig

      public <T> T getResolvedConfig(String name)
      Gets a resolved configuration object from the cache.

      The cache is populated by tika-serialization after it deserializes a JSON config (see setResolvedConfig). Keeping the resolved object here avoids repeated deserialization.

      Parameters:
      name - the component name
      Returns:
      the resolved object, or null if not cached
      Since:
      Apache Tika 4.0
    • setResolvedConfig

      public void setResolvedConfig(String name, Object config)
      Caches a resolved configuration object.

      Called by tika-serialization after deserializing a JSON config.

      Parameters:
      name - the component name
      config - the resolved configuration object
      Since:
      Apache Tika 4.0
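      Together, getResolvedConfig and setResolvedConfig describe a deserialize-once, then reuse cache. The sketch below models that flow. ResolvingContext, ConfigResolver, and the deserializer function are all hypothetical, because the component-registry API in tika-serialization is not shown on this page.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical model: raw JSON configs plus a resolved-object cache (not Tika's source). */
class ResolvingContext {
    private final Map<String, String> jsonConfigs = new HashMap<>();
    private final Map<String, Object> resolvedConfigs = new HashMap<>();

    void setJsonConfig(String name, String json) { jsonConfigs.put(name, json); }

    String getJsonConfig(String name) { return jsonConfigs.get(name); }

    @SuppressWarnings("unchecked")
    <T> T getResolvedConfig(String name) {
        return (T) resolvedConfigs.get(name);   // unchecked: the caller chooses T
    }

    void setResolvedConfig(String name, Object config) { resolvedConfigs.put(name, config); }
}

/** Hypothetical resolver, playing the role the docs assign to tika-serialization. */
class ConfigResolver {
    static <T> T resolve(ResolvingContext ctx, String name, Function<String, T> deserializer) {
        T cached = ctx.getResolvedConfig(name);
        if (cached != null) {
            return cached;                        // cache hit: no deserialization
        }
        String json = ctx.getJsonConfig(name);
        if (json == null) {
            return null;                          // nothing configured under this name
        }
        T resolved = deserializer.apply(json);    // deserialize once...
        ctx.setResolvedConfig(name, resolved);    // ...and cache it for later lookups
        return resolved;
    }
}

class ConfigResolverDemo {
    public static void main(String[] args) {
        ResolvingContext ctx = new ResolvingContext();
        ctx.setJsonConfig("pdf-parser", "{\"ocrStrategy\": \"AUTO\"}");
        int[] calls = {0};
        // Stand-in deserializer; a real one would bind the JSON to a config class.
        Function<String, String> deserializer = json -> { calls[0]++; return json.trim(); };
        ConfigResolver.resolve(ctx, "pdf-parser", deserializer);
        ConfigResolver.resolve(ctx, "pdf-parser", deserializer);
        System.out.println(calls[0]);   // 1
    }
}
```

      Because the cached object is returned without re-reading the JSON, this model would not notice a later change to the stored JSON. Whether Tika invalidates the cache in that case is not stated on this page.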
    • hasJsonConfig

      public boolean hasJsonConfig(String name)
      Checks if a JSON configuration exists for the given component name.
      Parameters:
      name - the component name
      Returns:
      true if a JSON config exists
      Since:
      Apache Tika 4.0
    • isEmpty

      public boolean isEmpty()
    • copyFrom

      public void copyFrom(ParseContext source)
      Copies all entries from the source ParseContext into this one. Existing entries in this context are overwritten by source entries.

      This copies both the typed objects in the context map and the JSON configs.

      Parameters:
      source - the ParseContext to copy from
      Since:
      Apache Tika 4.0
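      The overwrite rule of copyFrom can be shown with a small model. On a key collision the source entry wins, and keys present only in the target are kept. CopyableContext is hypothetical. It copies only the two kinds of entries the description names (typed objects and JSON configs). This page does not say whether Tika also copies resolved-config cache entries.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of copyFrom semantics (not Tika's source). */
class CopyableContext {
    final Map<String, Object> context = new HashMap<>();      // typed entries, keyed by class name
    final Map<String, String> jsonConfigs = new HashMap<>();  // raw JSON, keyed by component name

    <T> void set(Class<T> key, T value) {
        if (value == null) {
            context.remove(key.getName());
        } else {
            context.put(key.getName(), value);
        }
    }

    <T> T get(Class<T> key) {
        return key.cast(context.get(key.getName()));
    }

    /** Copies every entry from source; on a key collision the source value wins. */
    void copyFrom(CopyableContext source) {
        context.putAll(source.context);
        jsonConfigs.putAll(source.jsonConfigs);
    }
}

class CopyFromDemo {
    public static void main(String[] args) {
        CopyableContext target = new CopyableContext();
        target.set(Integer.class, 1);
        target.set(Boolean.class, true);
        target.jsonConfigs.put("pdf-parser", "{\"ocrStrategy\": \"NO_OCR\"}");

        CopyableContext source = new CopyableContext();
        source.set(Integer.class, 2);
        source.jsonConfigs.put("pdf-parser", "{\"ocrStrategy\": \"AUTO\"}");

        target.copyFrom(source);
        System.out.println(target.get(Integer.class));             // 2 (overwritten)
        System.out.println(target.get(Boolean.class));             // true (kept)
        System.out.println(target.jsonConfigs.get("pdf-parser"));  // {"ocrStrategy": "AUTO"}
    }
}
```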
    • newMetadata

      public Metadata newMetadata()
      Creates a new Metadata object with any configured limits applied.

      If a MetadataWriteLimiterFactory is configured in this ParseContext, the returned Metadata will have a write limiter that enforces those limits. Otherwise, returns a plain Metadata object.

      Parsers should use this method instead of new Metadata() when creating metadata for embedded documents, to ensure limits are applied at creation time rather than later during parsing.

      Example usage:

       Metadata embeddedMetadata = context.newMetadata();
       embeddedMetadata.set(TikaCoreProperties.RESOURCE_NAME_KEY, name);
       // limits are already applied, so no data bypasses the limiter
       
      Returns:
      a new Metadata object, with limits applied if configured
      Since:
      Apache Tika 4.0
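      The point of newMetadata is that the limiter is attached when the Metadata object is created, before any value is written. The following sketch models that with hypothetical types. A character budget stands in for whatever limits a real MetadataWriteLimiterFactory enforces, which this page does not specify. The truncate-then-drop policy is also an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical limiter: caps the total characters one metadata object may store. */
class CharBudgetLimiter {
    private int remaining;

    CharBudgetLimiter(int maxChars) { this.remaining = maxChars; }

    /** Returns the prefix of value that still fits and charges it to the budget. */
    String admit(String value) {
        String kept = value.length() <= remaining ? value : value.substring(0, remaining);
        remaining -= kept.length();
        return kept;
    }
}

/** Hypothetical stand-in for a limiter factory: one fresh limiter per metadata object. */
class LimiterFactorySketch {
    private final int maxChars;

    LimiterFactorySketch(int maxChars) { this.maxChars = maxChars; }

    CharBudgetLimiter newLimiter() { return new CharBudgetLimiter(maxChars); }
}

/** Hypothetical metadata; a null limiter means plain, unlimited storage. */
class SketchMetadata {
    private final Map<String, List<String>> values = new HashMap<>();
    private final CharBudgetLimiter limiter;

    SketchMetadata(CharBudgetLimiter limiter) { this.limiter = limiter; }

    void add(String name, String value) {
        String kept = (limiter == null) ? value : limiter.admit(value);
        if (kept.isEmpty() && !value.isEmpty()) {
            return;   // budget exhausted: drop the value entirely
        }
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(kept);
    }

    List<String> get(String name) { return values.getOrDefault(name, List.of()); }
}

/** Mirrors the newMetadata() contract: limits are attached at creation time. */
class MetadataContextSketch {
    LimiterFactorySketch limiterFactory;   // null when no limits are configured

    SketchMetadata newMetadata() {
        return new SketchMetadata(limiterFactory == null ? null : limiterFactory.newLimiter());
    }
}

class NewMetadataDemo {
    public static void main(String[] args) {
        MetadataContextSketch ctx = new MetadataContextSketch();
        SketchMetadata plain = ctx.newMetadata();            // no factory: plain metadata
        plain.add("resourceName", "embedded-document.pdf");
        System.out.println(plain.get("resourceName"));       // [embedded-document.pdf]

        ctx.limiterFactory = new LimiterFactorySketch(8);
        SketchMetadata limited = ctx.newMetadata();          // limiter attached before any write
        limited.add("resourceName", "embedded-document.pdf");
        limited.add("title", "ignored");
        System.out.println(limited.get("resourceName"));     // [embedded]
        System.out.println(limited.get("title"));            // []
    }
}
```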
    • getContextMap

      public Map<String,Object> getContextMap()
      Returns the internal context map for serialization purposes. The returned map is unmodifiable.

      This method is intended for use by serialization frameworks only. Keys are fully-qualified class names, values are the objects stored in the context.

      Returns:
      an unmodifiable view of the context map
      Since:
      Apache Tika 4.0
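      The read-only contract of getContextMap can be sketched under one assumption: the view is built with Collections.unmodifiableMap. The page says "unmodifiable view" but not how the view is produced. ViewableContext is hypothetical. Under that assumption, writes through the view fail, while entries added to the context later stay visible through it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical model of getContextMap() as described (not Tika's source). */
class ViewableContext {
    private final Map<String, Object> context = new HashMap<>();

    <T> void set(Class<T> key, T value) { context.put(key.getName(), value); }

    /** Read-only view: callers can inspect, but not mutate, the backing map. */
    Map<String, Object> getContextMap() { return Collections.unmodifiableMap(context); }
}

class ContextMapDemo {
    public static void main(String[] args) {
        ViewableContext ctx = new ViewableContext();
        ctx.set(Locale.class, Locale.GERMANY);
        Map<String, Object> view = ctx.getContextMap();
        System.out.println(view.keySet());   // [java.util.Locale]
        try {
            view.put("java.lang.String", "rejected");
        } catch (UnsupportedOperationException e) {
            System.out.println("view is read-only");
        }
        ctx.set(CharSequence.class, "added later");
        System.out.println(view.containsKey("java.lang.CharSequence"));   // true: a view, not a copy
    }
}
```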
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • toString

      public String toString()
      Overrides:
      toString in class Object