Interface ContentHandlerFactory

All Superinterfaces:
Serializable
All Known Subinterfaces:
StreamingContentHandlerFactory
All Known Implementing Classes:
BasicContentHandlerFactory, PickBestTextEncodingParser.CharsetContentHandlerFactory

public interface ContentHandlerFactory extends Serializable
Factory interface for creating ContentHandler instances.

This is the base interface used by tika-pipes, RecursiveParserWrapper, and other components that need to create content handlers for in-memory content extraction.

For streaming output to an OutputStream, see StreamingContentHandlerFactory.
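Because the interface extends Serializable, an implementation should survive Java serialization. A minimal sketch of one way to arrange that, assuming the factory keeps only configuration in its fields and builds the handler on demand. The local ContentHandlerFactory declaration is a stand-in that mirrors the signatures documented on this page so the example compiles with the JDK alone. LimitedTextFactory, maxChars, and the "UNKNOWN" default are hypothetical names and values, not part of Tika.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.DefaultHandler;

// Local stand-in mirroring the documented signatures; real code would
// import the Tika interface instead of declaring this.
interface ContentHandlerFactory extends Serializable {
    ContentHandler createHandler();

    // The actual default value is not stated on this page;
    // "UNKNOWN" is a placeholder assumption.
    default String handlerTypeName() {
        return "UNKNOWN";
    }
}

// Hypothetical factory: only the int configuration is serialized.
class LimitedTextFactory implements ContentHandlerFactory {
    private static final long serialVersionUID = 1L;
    private final int maxChars;

    LimitedTextFactory(int maxChars) {
        this.maxChars = maxChars;
    }

    int maxChars() {
        return maxChars;
    }

    @Override
    public ContentHandler createHandler() {
        StringBuilder text = new StringBuilder();
        // Collects character data up to maxChars and ignores the rest.
        return new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                int room = Math.max(0, maxChars - text.length());
                text.append(ch, start, Math.min(length, room));
            }

            @Override
            public String toString() {
                return text.toString();
            }
        };
    }
}

public class FactoryRoundTrip {
    // Serializes a value to a byte array and reads it back.
    static <T extends Serializable> T roundTrip(T value)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        LimitedTextFactory copy = roundTrip(new LimitedTextFactory(5));
        ContentHandler handler = copy.createHandler();
        char[] chars = "hello world".toCharArray();
        handler.characters(chars, 0, chars.length);
        System.out.println(handler); // prints "hello"
    }
}
```

The handler itself is never serialized in this sketch. Only the factory crosses the serialization boundary, and the deserialized copy produces handlers with the same limit.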

  • Method Summary

    Modifier and Type    Method               Description
    ContentHandler       createHandler()      Creates a new ContentHandler for extracting content.
    default String       handlerTypeName()    Returns the name of the handler type produced by this factory (e.g. TEXT, MARKDOWN, HTML, XML).
  • Method Details

    • createHandler

      ContentHandler createHandler()
      Creates a new ContentHandler for extracting content.
      Returns:
      a new ContentHandler instance
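A minimal sketch of how a caller might use createHandler(), assuming one fresh handler is requested per document so that extracted text does not leak between documents. The local ContentHandlerFactory declaration is a stand-in mirroring the documented signatures. TextCollectingFactory, PerDocumentHandlers, and extract are hypothetical names, and the JDK SAX parser stands in for a Tika parser.

```java
import java.io.Serializable;
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Local stand-in mirroring the documented signatures; real code would
// import the Tika interface instead of declaring this.
interface ContentHandlerFactory extends Serializable {
    ContentHandler createHandler();

    // Placeholder default; the actual default value is an assumption.
    default String handlerTypeName() {
        return "UNKNOWN";
    }
}

// Hypothetical factory whose handlers accumulate character data in memory.
class TextCollectingFactory implements ContentHandlerFactory {
    private static final long serialVersionUID = 1L;

    @Override
    public ContentHandler createHandler() {
        // Each call gets its own buffer, so handlers never share state.
        StringBuilder text = new StringBuilder();
        return new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public String toString() {
                return text.toString();
            }
        };
    }
}

public class PerDocumentHandlers {
    // Parses one XML document into a handler obtained from the factory.
    static String extract(ContentHandlerFactory factory, String xml) throws Exception {
        ContentHandler handler = factory.createHandler();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        ContentHandlerFactory factory = new TextCollectingFactory();
        System.out.println(extract(factory, "<doc>first</doc>"));  // prints "first"
        System.out.println(extract(factory, "<doc>second</doc>")); // prints "second"
    }
}
```

The second call prints only "second" because the factory hands out a new handler each time rather than reusing one.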
    • handlerTypeName

      default String handlerTypeName()
      Returns the name of the handler type produced by this factory (e.g. TEXT, MARKDOWN, HTML, XML).

      This value is written to TikaCoreProperties.TIKA_CONTENT_HANDLER_TYPE so that downstream components (such as the inference pipeline) can determine what format tika:content is in without guessing.

      Returns:
      handler type name, never null
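A minimal sketch of overriding handlerTypeName() and of a downstream check that reads the recorded value. It assumes a plain Map as a stand-in for Tika's metadata object and a placeholder key string; the real key is whatever TikaCoreProperties.TIKA_CONTENT_HANDLER_TYPE defines. The local ContentHandlerFactory declaration mirrors the documented signatures. MarkdownFactory, HandlerTypeTagging, tag, and isMarkdown are hypothetical names.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.DefaultHandler;

// Local stand-in mirroring the documented signatures; real code would
// import the Tika interface instead of declaring this.
interface ContentHandlerFactory extends Serializable {
    ContentHandler createHandler();

    // Placeholder default; the actual default value is an assumption.
    default String handlerTypeName() {
        return "UNKNOWN";
    }
}

// Hypothetical factory that declares its output format explicitly.
class MarkdownFactory implements ContentHandlerFactory {
    private static final long serialVersionUID = 1L;

    @Override
    public ContentHandler createHandler() {
        // A no-op handler keeps the sketch short; a real factory would
        // return a handler that actually emits Markdown.
        return new DefaultHandler();
    }

    @Override
    public String handlerTypeName() {
        return "MARKDOWN";
    }
}

public class HandlerTypeTagging {
    // Placeholder key, NOT the real Tika property name.
    static final String HANDLER_TYPE_KEY = "example:content_handler_type";

    // Records the factory's handler type next to the extracted content.
    static Map<String, String> tag(ContentHandlerFactory factory) {
        Map<String, String> metadata = new HashMap<>();
        String type = Objects.requireNonNull(
                factory.handlerTypeName(), "handlerTypeName() must not return null");
        metadata.put(HANDLER_TYPE_KEY, type);
        return metadata;
    }

    // Downstream code can branch on the recorded type instead of guessing.
    static boolean isMarkdown(Map<String, String> metadata) {
        return "MARKDOWN".equals(metadata.get(HANDLER_TYPE_KEY));
    }

    public static void main(String[] args) {
        Map<String, String> metadata = tag(new MarkdownFactory());
        System.out.println(isMarkdown(metadata)); // prints "true"
    }
}
```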