Package org.apache.tika.sax
Class BasicContentHandlerFactory
java.lang.Object
org.apache.tika.sax.BasicContentHandlerFactory
- All Implemented Interfaces:
Serializable, ContentHandlerFactory, StreamingContentHandlerFactory, WriteLimiter
public class BasicContentHandlerFactory
extends Object
implements StreamingContentHandlerFactory, WriteLimiter
Basic factory for creating common types of ContentHandlers.
Implements StreamingContentHandlerFactory to support both in-memory
content extraction and streaming output to an OutputStream.
-
Nested Class Summary
Nested Classes
- static enum BasicContentHandlerFactory.HANDLER_TYPE: Common handler types for content.
-
Constructor Summary
Constructors
- BasicContentHandlerFactory(): No-arg constructor for bean-style configuration (e.g., Jackson deserialization).
- BasicContentHandlerFactory(BasicContentHandlerFactory.HANDLER_TYPE type, int writeLimit): Creates a BasicContentHandlerFactory with throwOnWriteLimitReached set to true.
- BasicContentHandlerFactory(BasicContentHandlerFactory.HANDLER_TYPE type, int writeLimit, boolean throwOnWriteLimitReached, ParseContext parseContext)
-
Method Summary
Methods
- ContentHandler createHandler(): Creates a new ContentHandler for extracting content.
- ContentHandler createHandler(OutputStream os, Charset charset): Creates a new ContentHandler that writes output directly to the given OutputStream.
- boolean equals(Object)
- BasicContentHandlerFactory.HANDLER_TYPE getType()
- int getWriteLimit()
- String handlerTypeName(): Returns the name of the handler type produced by this factory (e.g. TEXT, MARKDOWN, HTML, XML).
- int hashCode()
- boolean isThrowOnWriteLimitReached()
- static BasicContentHandlerFactory newInstance(BasicContentHandlerFactory.HANDLER_TYPE type, ParseContext context): Creates a new BasicContentHandlerFactory configured from OutputLimits in the ParseContext.
- static BasicContentHandlerFactory.HANDLER_TYPE parseHandlerType(String handlerTypeName, BasicContentHandlerFactory.HANDLER_TYPE defaultType): Tries to parse string into handler type.
- void setParseContext(ParseContext parseContext): Sets the parse context for storing warnings when throwOnWriteLimitReached is false.
- void setThrowOnWriteLimitReached(boolean throwOnWriteLimitReached): Sets whether to throw an exception when write limit is reached.
- void setType(BasicContentHandlerFactory.HANDLER_TYPE type): Sets the handler type.
- void setWriteLimit(int writeLimit): Sets the write limit.
-
Constructor Details
-
BasicContentHandlerFactory
public BasicContentHandlerFactory()
No-arg constructor for bean-style configuration (e.g., Jackson deserialization). Creates a factory with TEXT handler type, unlimited write, and throwOnWriteLimitReached=true.
-
BasicContentHandlerFactory
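public BasicContentHandlerFactory(BasicContentHandlerFactory.HANDLER_TYPE type, int writeLimit)
Creates a BasicContentHandlerFactory with throwOnWriteLimitReached set to true.
- Parameters:
type - basic type of handler
writeLimit - maximum number of characters to store; if < 0, the handler will store all characters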
-
BasicContentHandlerFactory
public BasicContentHandlerFactory(BasicContentHandlerFactory.HANDLER_TYPE type, int writeLimit, boolean throwOnWriteLimitReached, ParseContext parseContext)
- Parameters:
type - basic type of handler
writeLimit - maximum number of characters to store
throwOnWriteLimitReached - whether or not to throw a WriteLimitReachedException when the write limit has been reached
parseContext - stores the write-limit-reached warning if throwOnWriteLimitReached is set to false
-
-
Method Details
-
newInstance
public static BasicContentHandlerFactory newInstance(BasicContentHandlerFactory.HANDLER_TYPE type, ParseContext context)
Creates a new BasicContentHandlerFactory configured from OutputLimits in the ParseContext.
If OutputLimits is present in the context, the factory will be configured with those limits (writeLimit, throwOnWriteLimit). Otherwise, default values are used.
- Parameters:
type - the handler type
context - the ParseContext (required if throwOnWriteLimit is false)
- Returns:
- a configured BasicContentHandlerFactory
-
parseHandlerType
public static BasicContentHandlerFactory.HANDLER_TYPE parseHandlerType(String handlerTypeName, BasicContentHandlerFactory.HANDLER_TYPE defaultType)
Tries to parse a string into a handler type. Returns the default if the string is null or parsing fails. Options: xml, html, text, body, ignore (no content), markdown/md
- Parameters:
handlerTypeName - string to parse
defaultType - type to return if parsing fails
- Returns:
- handler type
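The fallback behaviour described above can be illustrated with a minimal, self-contained sketch. It does not call Tika: HandlerKind is a hypothetical stand-in for BasicContentHandlerFactory.HANDLER_TYPE, and the trimming and case-insensitive matching are assumptions of this sketch rather than documented behaviour of parseHandlerType.

```java
import java.util.Locale;

public class HandlerTypeParsingSketch {

    // Hypothetical stand-in for BasicContentHandlerFactory.HANDLER_TYPE;
    // the constant names are assumptions based on the documented options.
    public enum HandlerKind { XML, HTML, TEXT, BODY, IGNORE, MARKDOWN }

    // Returns defaultKind when the name is null or is not a known option.
    public static HandlerKind parse(String name, HandlerKind defaultKind) {
        if (name == null) {
            return defaultKind;
        }
        // Assumption of this sketch: ignore surrounding whitespace and case.
        switch (name.trim().toLowerCase(Locale.ROOT)) {
            case "xml":
                return HandlerKind.XML;
            case "html":
                return HandlerKind.HTML;
            case "text":
                return HandlerKind.TEXT;
            case "body":
                return HandlerKind.BODY;
            case "ignore":
                return HandlerKind.IGNORE;
            case "markdown":
            case "md":
                return HandlerKind.MARKDOWN;
            default:
                return defaultKind;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("md", HandlerKind.TEXT));   // MARKDOWN
        System.out.println(parse(null, HandlerKind.TEXT));   // TEXT (default)
        System.out.println(parse("pdf", HandlerKind.BODY));  // BODY (default)
    }
}
```

Returning a caller-supplied default instead of throwing keeps configuration parsing lenient, which matches the "returns default if string is null or parse fails" contract stated above.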
-
createHandler
public ContentHandler createHandler()
Description copied from interface: ContentHandlerFactory
Creates a new ContentHandler for extracting content.
- Specified by:
createHandler in interface ContentHandlerFactory
- Returns:
- a new ContentHandler instance
-
createHandler
public ContentHandler createHandler(OutputStream os, Charset charset)
Description copied from interface: StreamingContentHandlerFactory
Creates a new ContentHandler that writes output directly to the given OutputStream.
- Specified by:
createHandler in interface StreamingContentHandlerFactory
- Parameters:
os - the output stream to write to
charset - the character encoding to use
- Returns:
- a new ContentHandler instance that writes to the stream
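To make the streaming variant concrete, here is a minimal sketch of a SAX handler that forwards character events to an OutputStream in a given Charset. It uses only the JDK (org.xml.sax); StreamingTextHandler is a hypothetical class written for illustration and is not the handler that BasicContentHandlerFactory actually returns.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StreamingHandlerSketch {

    // Hypothetical handler: writes text events to the stream instead of
    // buffering them in memory.
    public static class StreamingTextHandler extends DefaultHandler {
        private final Writer writer;

        public StreamingTextHandler(OutputStream os, Charset charset) {
            this.writer = new OutputStreamWriter(os, charset);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            try {
                writer.write(ch, start, length);
            } catch (IOException e) {
                throw new SAXException("could not write to output stream", e);
            }
        }

        @Override
        public void endDocument() throws SAXException {
            try {
                // Push any encoder-buffered bytes to the underlying stream.
                writer.flush();
            } catch (IOException e) {
                throw new SAXException("could not flush output stream", e);
            }
        }
    }

    public static void main(String[] args) throws SAXException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        StreamingTextHandler handler = new StreamingTextHandler(out, StandardCharsets.UTF_8);
        char[] chunk = "streamed text".toCharArray();
        handler.startDocument();
        handler.characters(chunk, 0, chunk.length);
        handler.endDocument();
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8)); // streamed text
    }
}
```

In this sketch the caller keeps ownership of the stream: the handler flushes at endDocument but never closes it.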
-
getType
- Returns:
- handler type used by this factory
-
handlerTypeName
Description copied from interface: ContentHandlerFactory
Returns the name of the handler type produced by this factory (e.g. TEXT, MARKDOWN, HTML, XML).
This value is written to TikaCoreProperties.TIKA_CONTENT_HANDLER_TYPE so that downstream components (such as the inference pipeline) can determine what format tika:content is in without guessing.
- Specified by:
handlerTypeName in interface ContentHandlerFactory
- Returns:
- handler type name, never null
-
setType
Sets the handler type.
- Parameters:
type - the handler type
-
getWriteLimit
public int getWriteLimit()
- Specified by:
getWriteLimit in interface WriteLimiter
-
setWriteLimit
public void setWriteLimit(int writeLimit)
Sets the write limit.
- Parameters:
writeLimit - maximum number of characters to extract; -1 for unlimited
-
isThrowOnWriteLimitReached
public boolean isThrowOnWriteLimitReached()
- Specified by:
isThrowOnWriteLimitReached in interface WriteLimiter
-
setThrowOnWriteLimitReached
public void setThrowOnWriteLimitReached(boolean throwOnWriteLimitReached)
Sets whether to throw an exception when the write limit is reached.
- Parameters:
throwOnWriteLimitReached - true to throw, false to silently stop
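The difference between throwing and silently stopping can be sketched with a small JDK-only handler. LimitedTextHandler is a hypothetical class, not Tika's implementation: it throws a plain SAXException where Tika documents a WriteLimitReachedException, and it records the condition in a boolean flag where Tika stores a warning in the ParseContext. Truncating the chunk that crosses the limit is also an assumption of this sketch.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WriteLimitSketch {

    // Hypothetical handler: collects text up to writeLimit characters.
    public static class LimitedTextHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private final int writeLimit;        // negative means unlimited
        private final boolean throwOnLimit;
        private boolean limitReached = false;

        public LimitedTextHandler(int writeLimit, boolean throwOnLimit) {
            this.writeLimit = writeLimit;
            this.throwOnLimit = throwOnLimit;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (limitReached) {
                return; // limit already hit: ignore further input
            }
            if (writeLimit < 0 || text.length() + length <= writeLimit) {
                text.append(ch, start, length);
                return;
            }
            // Keep only the part that still fits, then record the condition.
            int remaining = writeLimit - text.length();
            text.append(ch, start, remaining);
            limitReached = true;
            if (throwOnLimit) {
                throw new SAXException("write limit of " + writeLimit + " characters reached");
            }
        }

        public String text() {
            return text.toString();
        }

        public boolean isLimitReached() {
            return limitReached;
        }
    }

    public static void main(String[] args) {
        char[] data = "hello world".toCharArray();

        LimitedTextHandler quiet = new LimitedTextHandler(5, false);
        try {
            quiet.characters(data, 0, data.length);
        } catch (SAXException e) {
            throw new IllegalStateException("not expected when throwOnLimit is false", e);
        }
        System.out.println(quiet.text() + " " + quiet.isLimitReached()); // hello true

        LimitedTextHandler strict = new LimitedTextHandler(5, true);
        try {
            strict.characters(data, 0, data.length);
        } catch (SAXException e) {
            System.out.println("stopped: " + e.getMessage());
        }
    }
}
```

With throwing disabled, the caller has to check a side channel (here the flag) to learn that the output was truncated, which is why a ParseContext is needed in that mode.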
-
setParseContext
public void setParseContext(ParseContext parseContext)
Sets the parse context for storing warnings when throwOnWriteLimitReached is false.
- Parameters:
parseContext - the parse context
-
equals
-
hashCode
public int hashCode()
-