Annotation Interface TikaComponent


@Retention(CLASS)
@Target(TYPE)
public @interface TikaComponent
Annotation for Tika components (parsers, detectors, etc.) that enables:
  • Automatic SPI file generation (META-INF/services/...)
  • Name-based component registry for JSON configuration

The annotation processor generates:

  • Standard Java SPI files for ServiceLoader
  • Component index files (META-INF/tika/{type}.idx) for name-based lookup

The annotation is processed at compile time. With CLASS retention it is not visible to reflection at runtime; the contextKey is instead recorded in the .idx file for runtime resolution.

Example usage:

 @TikaComponent
 public class PDFParser extends AbstractParser {
     // auto-generates name "pdf-parser", included in SPI
 }

 @TikaComponent(name = "tesseract-ocr")
 public class TesseractOCRParser extends AbstractParser {
     // explicit name override, included in SPI
 }

 @TikaComponent(spi = false)
 public class DWGReadParser extends AbstractParser {
     // available by name, but NOT auto-loaded by default-parser
 }

 @TikaComponent(contextKey = MetadataFilter.class)
 public class MyFilter implements MetadataFilter, AnotherInterface {
     // explicit ParseContext key when class implements multiple known interfaces
 }

 @TikaComponent(defaultFor = ContentHandlerFactory.class)
 public class BasicContentHandlerFactory implements ContentHandlerFactory {
     // marks this as the default implementation for ContentHandlerFactory
 }
 
Since:
3.1.0
  • Optional Element Summary

    Optional Elements
    Modifier and Type   Optional Element   Description
    Class<?>            contextKey         The class to use as the key when adding this component to ParseContext.
    Class<?>            defaultFor         Marks this component as the default implementation for the specified interface.
    String              name               The component name used in JSON configuration.
    boolean             spi                Whether this component should be included in SPI files for automatic discovery via ServiceLoader.
  • Element Details

    • name

      String name
      The component name used in JSON configuration. If empty, the name is automatically generated from the class name using kebab-case conversion (e.g., PDFParser becomes "pdf-parser").
      Returns:
      the component name, or empty string for auto-generation
      Default:
      ""
    • spi

      boolean spi
      Whether this component should be included in SPI files for automatic discovery via ServiceLoader. When false, the component is only available via explicit configuration (not loaded by "default-parser").

      Use spi = false for opt-in components that users must explicitly enable in their configuration.

      Returns:
      true to include in SPI (default), false to require explicit config
      Default:
      true
    • contextKey

      Class<?> contextKey
      The class to use as the key when adding this component to ParseContext.

      By default (void.class), the key is auto-detected:

      • If the component implements a known interface (e.g., MetadataFilter), that interface is used as the key
      • Otherwise, the component's own class is used as the key

      Use this attribute to explicitly specify the key when:

      • The component implements multiple known interfaces (ambiguous)
      • You need a specific interface/class that isn't auto-detected
      Returns:
      the class to use as ParseContext key, or void.class for auto-detection
      Default:
      void.class
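
The auto-detection rules for contextKey can be sketched roughly as below. The class ContextKeyResolver, its resolve method, and the knownInterfaces parameter are hypothetical names introduced for illustration. Failing on an ambiguous match is an assumption; the documentation only says that the ambiguous case calls for an explicit key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the key selection rules; not Tika's implementation.
final class ContextKeyResolver {
    private ContextKeyResolver() {
    }

    static Class<?> resolve(Class<?> component, Class<?> explicitKey,
                            List<Class<?>> knownInterfaces) {
        // An explicit contextKey (anything other than void.class) always wins.
        if (explicitKey != void.class) {
            return explicitKey;
        }
        // Collect every known interface the component implements.
        List<Class<?>> matches = new ArrayList<>();
        for (Class<?> known : knownInterfaces) {
            if (known.isAssignableFrom(component)) {
                matches.add(known);
            }
        }
        if (matches.isEmpty()) {
            // No known interface: fall back to the component's own class.
            return component;
        }
        if (matches.size() == 1) {
            return matches.get(0);
        }
        // Assumption: ambiguity is reported rather than resolved silently.
        throw new IllegalStateException(
                component.getName() + " implements several known interfaces: " + matches);
    }
}
```

In this sketch a component implementing exactly one known interface is keyed by that interface, and setting contextKey skips detection entirely.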
    • defaultFor

      Class<?> defaultFor
      Marks this component as the default implementation for the specified interface.

      When set, this component is used whenever a ParseContext is loaded with defaults (via loadParseContextWithDefaults()) and no explicit configuration is provided for the interface.

      The specified class should be an interface that this component implements. For example:

       @TikaComponent(defaultFor = ContentHandlerFactory.class)
       public class BasicContentHandlerFactory implements ContentHandlerFactory {
           // This will be instantiated by default when no ContentHandlerFactory is configured
       }
       
      Returns:
      the interface this component is the default for, or void.class if not a default
      Default:
      void.class
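
The fallback behavior described for defaultFor can be pictured with the sketch below. It is not how loadParseContextWithDefaults() is implemented. The class DefaultsSketch, the two maps, and the use of a no-argument constructor are all assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical sketch of "explicit configuration first, declared default second".
final class DefaultsSketch {
    private DefaultsSketch() {
    }

    static <T> T resolve(Class<T> iface,
                         Map<Class<?>, Object> explicit,
                         Map<Class<?>, Class<?>> defaults) {
        // An explicitly configured instance takes precedence.
        Object configured = explicit.get(iface);
        if (configured != null) {
            return iface.cast(configured);
        }
        // Otherwise look for a class declared as the default for this interface.
        Class<?> impl = defaults.get(iface);
        if (impl == null) {
            return null; // nothing configured and no default declared
        }
        try {
            // Assumption: defaults are created through a public no-arg constructor.
            return iface.cast(impl.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + impl.getName(), e);
        }
    }
}
```

Here the defaults map stands in for whatever the component index records about defaultFor; an entry is consulted only when the explicit map has nothing for the requested interface.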