Package org.apache.tika.config
Annotation Interface TikaComponent
Annotation for Tika components (parsers, detectors, etc.) that enables:
- Automatic SPI file generation (META-INF/services/...)
- Name-based component registry for JSON configuration
The annotation processor generates:
- Standard Java SPI files for ServiceLoader
- Component index files (META-INF/tika/{type}.idx) for name-based lookup
This annotation is processed at compile time by the annotation processor. The contextKey is recorded in the .idx file for runtime resolution.
Example usage:
@TikaComponent
public class PDFParser extends AbstractParser {
    // auto-generates name "pdf-parser", included in SPI
}

@TikaComponent(name = "tesseract-ocr")
public class TesseractOCRParser extends AbstractParser {
    // explicit name override, included in SPI
}

@TikaComponent(spi = false)
public class DWGReadParser extends AbstractParser {
    // available by name, but NOT auto-loaded by default-parser
}

@TikaComponent(contextKey = MetadataFilter.class)
public class MyFilter implements MetadataFilter, AnotherInterface {
    // explicit ParseContext key when class implements multiple known interfaces
}

@TikaComponent(defaultFor = ContentHandlerFactory.class)
public class BasicContentHandlerFactory implements ContentHandlerFactory {
    // marks this as the default implementation for ContentHandlerFactory
}
- Since:
- 3.1.0
Optional Element Summary
Optional Elements

| Modifier and Type | Optional Element | Description |
|---|---|---|
| Class<?> | contextKey | The class to use as the key when adding this component to ParseContext. |
| Class<?> | defaultFor | Marks this component as the default implementation for the specified interface. |
| String | name | The component name used in JSON configuration. |
| boolean | spi | Whether this component should be included in SPI files for automatic discovery via ServiceLoader. |
Element Details
name
String name
The component name used in JSON configuration. If empty, the name is generated automatically from the class name using kebab-case conversion (e.g., PDFParser becomes "pdf-parser").
- Returns:
- the component name, or empty string for auto-generation
- Default:
- ""
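The kebab-case conversion described above can be sketched as follows. This is a minimal illustration, not Tika's actual implementation: the class name `ComponentNames` and the method `toKebabCase` are hypothetical, and the only mapping confirmed by this page is PDFParser to "pdf-parser". How the real processor treats other acronyms or digits is an assumption here.

```java
// Hypothetical helper; Tika's own name generation may differ in edge cases.
final class ComponentNames {
    private ComponentNames() {
    }

    // Converts a simple class name such as "PDFParser" to kebab-case.
    static String toKebabCase(String simpleName) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < simpleName.length(); i++) {
            char c = simpleName.charAt(i);
            if (Character.isUpperCase(c) && i > 0) {
                char prev = simpleName.charAt(i - 1);
                // Word boundary after a lowercase letter or digit: "myF" -> "my-f".
                boolean afterLowerOrDigit =
                        Character.isLowerCase(prev) || Character.isDigit(prev);
                // End of an acronym: the last capital before a lowercase letter
                // starts a new word, so "PDFPa" -> "pdf-pa".
                boolean acronymEnd = Character.isUpperCase(prev)
                        && i + 1 < simpleName.length()
                        && Character.isLowerCase(simpleName.charAt(i + 1));
                if (afterLowerOrDigit || acronymEnd) {
                    out.append('-');
                }
            }
            out.append(Character.toLowerCase(c));
        }
        return out.toString();
    }
}
```

An explicit `name` value bypasses this kind of derivation entirely, which is useful when the generated name would be awkward or would collide with another component.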
spi
boolean spi
Whether this component should be included in SPI files for automatic discovery via ServiceLoader. When false, the component is only available via explicit configuration (not loaded by "default-parser").
Use spi = false for opt-in components that users must explicitly enable in their configuration.
- Returns:
- true to include in SPI (default), false to require explicit config
- Default:
- true
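To illustrate why leaving a class out of the SPI files makes it opt-in, the sketch below uses plain `java.util.ServiceLoader` with stand-in types. `SampleParser`, `OptInSampleParser`, and `SpiDiscoveryDemo` are hypothetical names, not Tika classes, and the example assumes it runs on the classpath with no `META-INF/services` entry for `SampleParser`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical service interface standing in for a Tika component type.
interface SampleParser {
    String name();
}

// Not listed in any META-INF/services file, analogous to spi = false.
final class OptInSampleParser implements SampleParser {
    @Override
    public String name() {
        return "opt-in-sample-parser";
    }
}

final class SpiDiscoveryDemo {
    private SpiDiscoveryDemo() {
    }

    // Collects the names of all providers that ServiceLoader can discover.
    static List<String> discoveredNames() {
        List<String> names = new ArrayList<>();
        for (SampleParser parser : ServiceLoader.load(SampleParser.class)) {
            names.add(parser.name());
        }
        return names;
    }

    // Opt-in components have to be created explicitly, e.g. from configuration.
    static SampleParser explicitlyConfigured() {
        return new OptInSampleParser();
    }
}
```

Because no provider-configuration file names `OptInSampleParser`, automatic discovery does not return it, while explicit construction still works.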
contextKey
Class<?> contextKey
The class to use as the key when adding this component to ParseContext.
By default (void.class), the key is auto-detected:
- If the component implements a known interface (e.g., MetadataFilter), that interface is used as the key
- Otherwise, the component's own class is used as the key
Use this attribute to explicitly specify the key when:
- The component implements multiple known interfaces (ambiguous)
- You need a specific interface/class that isn't auto-detected
- Returns:
- the class to use as ParseContext key, or void.class for auto-detection
- Default:
- void.class
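The auto-detection rules above can be sketched as a small resolver. This is an illustration under stated assumptions, not the processor's real code: `ContextKeyResolver` and the sample types are hypothetical, the list of "known interfaces" is passed in by the caller, and throwing on an ambiguous match is an assumed behavior, since this page only says the ambiguous case calls for an explicit `contextKey`.

```java
import java.util.List;

// Hypothetical stand-ins for known interfaces and components.
interface SampleFilter {
}

interface SampleOther {
}

class OnlyFilterComponent implements SampleFilter {
}

class BothComponent implements SampleFilter, SampleOther {
}

class PlainComponent {
}

final class ContextKeyResolver {
    private ContextKeyResolver() {
    }

    // explicitKey == void.class means "auto-detect", mirroring the annotation default.
    static Class<?> resolve(Class<?> component, Class<?> explicitKey,
                            List<Class<?>> knownInterfaces) {
        if (explicitKey != void.class) {
            return explicitKey;
        }
        Class<?> match = null;
        for (Class<?> known : knownInterfaces) {
            if (known.isAssignableFrom(component)) {
                if (match != null) {
                    // Assumed handling: reject ambiguity and ask for an explicit key.
                    throw new IllegalStateException(component.getName()
                            + " implements several known interfaces; set contextKey");
                }
                match = known;
            }
        }
        // No known interface implemented: fall back to the component's own class.
        return match != null ? match : component;
    }
}
```

With this shape, a component implementing two known interfaces resolves cleanly only when the explicit key is supplied, which matches the `MyFilter` example in the usage section.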
defaultFor
Class<?> defaultFor
Marks this component as the default implementation for the specified interface.
When set, this component will be used as the default when loading a ParseContext with defaults (via loadParseContextWithDefaults()) and no explicit configuration is provided for the interface.
The specified class should be an interface that this component implements. For example:
@TikaComponent(defaultFor = ContentHandlerFactory.class)
public class BasicContentHandlerFactory implements ContentHandlerFactory {
    // This will be instantiated by default when no ContentHandlerFactory is configured
}
- Returns:
- the interface this component is the default for, or void.class if not a default
- Default:
- void.class
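The fallback behavior described for defaultFor (use the default only when nothing is configured for the interface) can be sketched with a small keyed container. Everything here is hypothetical: `DefaultingContext`, `SampleHandlerFactory`, and the two implementations are stand-ins, and the sketch makes no claim about how loadParseContextWithDefaults() is actually implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical interface and implementations used only for illustration.
interface SampleHandlerFactory {
    String describe();
}

final class BasicSampleHandlerFactory implements SampleHandlerFactory {
    @Override
    public String describe() {
        return "basic";
    }
}

final class CustomSampleHandlerFactory implements SampleHandlerFactory {
    @Override
    public String describe() {
        return "custom";
    }
}

final class DefaultingContext {
    private final Map<Class<?>, Object> configured = new HashMap<>();
    private final Map<Class<?>, Supplier<?>> defaults = new HashMap<>();

    // Registers the implementation to create when nothing is configured for key.
    <T> void registerDefault(Class<T> key, Supplier<? extends T> factory) {
        defaults.put(key, factory);
    }

    // Stores an explicitly configured instance, which always wins over the default.
    <T> void set(Class<T> key, T value) {
        configured.put(key, value);
    }

    // Returns the configured instance, else a fresh default, else null.
    <T> T get(Class<T> key) {
        Object value = configured.get(key);
        if (value == null) {
            Supplier<?> factory = defaults.get(key);
            if (factory != null) {
                value = factory.get();
            }
        }
        return key.cast(value);
    }
}
```

In this sketch, registering `BasicSampleHandlerFactory::new` as the default for `SampleHandlerFactory` yields a basic factory until a caller sets a different instance for the same key.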