Modifier and Type | Method and Description |
---|---|
Parser |
Tika.getParser()
Returns the parser instance used by this facade.
|
Constructor and Description |
---|
Tika(Detector detector,
Parser parser)
Creates a Tika facade using the given detector and parser instances, but the default Translator.
|
Tika(Detector detector,
Parser parser,
Translator translator)
Creates a Tika facade using the given detector, parser, and translator instances.
|
Modifier and Type | Method and Description |
---|---|
Parser |
DigestingAutoDetectParserFactory.getParser(TikaConfig config) |
abstract Parser |
ParserFactory.getParser(TikaConfig config) |
Parser |
AutoDetectParserFactory.getParser(TikaConfig config) |
Modifier and Type | Method and Description |
---|---|
protected void |
FileResourceConsumer.parse(String resourceId,
Parser parser,
InputStream is,
ContentHandler handler,
Metadata m,
ParseContext parseContext)
Utility method to handle logging equivalently among all
implementing classes.
|
Constructor and Description |
---|
BasicTikaFSConsumer(ArrayBlockingQueue<FileResource> queue,
Parser parser,
ContentHandlerFactory contentHandlerFactory,
OutputStreamFactory fsOSFactory) |
RecursiveParserWrapperFSConsumer(ArrayBlockingQueue<FileResource> queue,
Parser parser,
ContentHandlerFactory contentHandlerFactory,
OutputStreamFactory fsOSFactory,
MetadataFilter metadataFilter) |
StreamOutRPWFSConsumer(ArrayBlockingQueue<FileResource> queue,
Parser parser,
ContentHandlerFactory contentHandlerFactory,
OutputStreamFactory fsOSFactory,
MetadataFilter metadataFilter) |
Modifier and Type | Method and Description |
---|---|
Parser |
TikaConfig.getParser()
Returns the configured parser instance.
|
Parser |
TikaConfig.getParser(MediaType mimeType)
Deprecated.
Use the
TikaConfig.getParser() method instead |
Modifier and Type | Class and Description |
---|---|
class |
DirListParser
Parses the output of /bin/ls and counts the number of files and the number of
executables using Tika.
|
class |
EncryptedPrescriptionParser |
class |
LanguageDetectingParser |
class |
PrescriptionParser |
Modifier and Type | Method and Description |
---|---|
static Parser |
EmbeddedDocumentUtil.tryToFindExistingLeafParser(Class clazz,
ParseContext context)
Tries to find an existing parser within the ParseContext.
|
Constructor and Description |
---|
ParserContainerExtractor(Parser parser,
Detector detector) |
Modifier and Type | Class and Description |
---|---|
class |
ForkParser |
Constructor and Description |
---|
ForkParser(ClassLoader loader,
Parser parser) |
Constructor and Description |
---|
TikaGUI(Parser parser) |
Modifier and Type | Class and Description |
---|---|
class |
AbstractEncodingDetectorParser
Abstract base class for parsers that use the AutoDetectReader and need
to use the
EncodingDetector configured by TikaConfig |
class |
AbstractExternalProcessParser
Abstract base class for parsers that call external processes.
|
class |
AbstractParser
Abstract base class for new parsers.
|
class |
AutoDetectParser |
class |
CompositeParser
Composite parser that delegates parsing tasks to a component parser
based on the declared content type of the incoming document.
|
class |
CryptoParser
Decrypts the incoming document stream and delegates further parsing to
another parser instance.
|
class |
DefaultParser
A composite parser based on all the Parser implementations
available through the service provider mechanism. |
class |
DelegatingParser
Base class for parser implementations that want to delegate parts of the
task of parsing an input document to another parser.
|
class |
DigestingParser |
class |
EmptyParser
Dummy parser that always produces an empty XHTML document without even
attempting to parse the given document stream.
|
class |
ErrorParser
Dummy parser that always throws a
TikaException without even
attempting to parse the given document stream. |
class |
NetworkParser |
class |
ParserDecorator
Decorator base class for the
Parser interface. |
class |
ParserPostProcessor
Parser decorator that post-processes the results from a decorated parser.
|
class |
RecursiveParserWrapper
This is a helper class that wraps a parser in a recursive handler.
|
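The CompositeParser entry above describes delegating to a component parser chosen by the declared content type, with a fallback for unmatched types. The following is a minimal, self-contained sketch of that idea only. `TinyParser` and `TinyCompositeParser` are hypothetical names invented for illustration; they are not Tika classes, and the real `Parser` interface works with streams, content handlers, metadata, and a parse context rather than plain strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a parser; NOT Tika's Parser interface.
interface TinyParser {
    String parse(String contentType, String input);
}

// Sketch of composite delegation: look up a component parser by the
// declared content type and use the fallback when nothing is registered.
class TinyCompositeParser implements TinyParser {
    private final Map<String, TinyParser> parsers = new HashMap<>();

    // Assumed default for this sketch: a fallback that produces no output.
    private TinyParser fallback = (type, input) -> "";

    void register(String contentType, TinyParser parser) {
        parsers.put(contentType, parser);
    }

    void setFallback(TinyParser fallback) {
        this.fallback = fallback;
    }

    // Returns the component registered for the type, or the fallback.
    TinyParser getParser(String contentType) {
        return parsers.getOrDefault(contentType, fallback);
    }

    @Override
    public String parse(String contentType, String input) {
        return getParser(contentType).parse(contentType, input);
    }
}
```

In this sketch, registering a component for `text/plain` and setting a fallback means any other content type is routed to the fallback, which loosely mirrors the roles of `getParser(Metadata)` and `getFallback()` listed in this package.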
Modifier and Type | Method and Description |
---|---|
abstract Parser |
ParserFactory.build() |
Parser |
AutoDetectParserFactory.build() |
protected Parser |
DelegatingParser.getDelegateParser(ParseContext context)
Returns the parser instance to which parsing tasks should be delegated.
|
Parser |
CompositeParser.getFallback()
Returns the fallback parser.
|
protected Parser |
CompositeParser.getParser(Metadata metadata)
Returns the parser that best matches the given metadata.
|
protected Parser |
CompositeParser.getParser(Metadata metadata,
ParseContext context) |
Parser |
ParserDecorator.getWrappedParser()
Gets the parser wrapped by this ParserDecorator
|
static Parser |
ParserDecorator.withFallbacks(Collection<? extends Parser> parsers,
Set<MediaType> types)
Deprecated.
Do not use until the TODOs are resolved, see TIKA-1509
|
static Parser |
ParserDecorator.withoutTypes(Parser parser,
Set<MediaType> excludeTypes)
Decorates the given parser so that it never claims to support
parsing of the given media types, but will work for all others.
|
static Parser |
ParserDecorator.withTypes(Parser parser,
Set<MediaType> types)
Decorates the given parser so that it always claims to support
parsing of the given media types.
|
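The `withTypes` and `withoutTypes` descriptions above follow the decorator pattern: the wrapped parser still does the parsing, and only the set of advertised media types changes. A minimal sketch of that pattern follows, under loud assumptions: `SketchParser` and `TypeDecorators` are hypothetical names, media types are modelled as plain strings, and none of this is Tika's actual implementation.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical minimal parser contract; NOT Tika's Parser interface.
interface SketchParser {
    Set<String> supportedTypes();
    String parse(String input);
}

// Decorators that change only the advertised types and forward parsing unchanged.
final class TypeDecorators {
    private TypeDecorators() {}

    // The result always claims exactly the given types.
    static SketchParser withTypes(SketchParser parser, Set<String> types) {
        Set<String> fixed = Collections.unmodifiableSet(new HashSet<>(types));
        return new SketchParser() {
            @Override public Set<String> supportedTypes() { return fixed; }
            @Override public String parse(String input) { return parser.parse(input); }
        };
    }

    // The result claims whatever the wrapped parser claims, minus the excluded types.
    static SketchParser withoutTypes(SketchParser parser, Set<String> excluded) {
        Set<String> blocked = new HashSet<>(excluded);
        return new SketchParser() {
            @Override public Set<String> supportedTypes() {
                Set<String> remaining = new HashSet<>(parser.supportedTypes());
                remaining.removeAll(blocked);
                return remaining;
            }
            @Override public String parse(String input) { return parser.parse(input); }
        };
    }
}
```

Keeping the type filter in a wrapper rather than in the parser itself means the same underlying parser can be registered with different type claims in different places, which is the design the ParserDecorator entries suggest.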
Modifier and Type | Method and Description |
---|---|
Map<MediaType,List<Parser>> |
CompositeParser.findDuplicateParsers(ParseContext context)
Utility method that goes through all the component parsers and finds
all media types for which more than one parser declares support.
|
List<Parser> |
CompositeParser.getAllComponentParsers()
Returns all parsers registered with the Composite Parser,
including ones which may not currently be active.
|
List<Parser> |
DefaultParser.getAllComponentParsers() |
Map<MediaType,Parser> |
CompositeParser.getParsers()
Returns the component parsers.
|
Map<MediaType,Parser> |
CompositeParser.getParsers(ParseContext context) |
Map<MediaType,Parser> |
DefaultParser.getParsers(ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
CompositeParser.setFallback(Parser fallback)
Sets the fallback parser.
|
static Parser |
ParserDecorator.withoutTypes(Parser parser,
Set<MediaType> excludeTypes)
Decorates the given parser so that it never claims to support
parsing of the given media types, but will work for all others.
|
static Parser |
ParserDecorator.withTypes(Parser parser,
Set<MediaType> types)
Decorates the given parser so that it always claims to support
parsing of the given media types.
|
Modifier and Type | Method and Description |
---|---|
void |
CompositeParser.setParsers(Map<MediaType,Parser> parsers)
Sets the component parsers.
|
static Parser |
ParserDecorator.withFallbacks(Collection<? extends Parser> parsers,
Set<MediaType> types)
Deprecated.
Do not use until the TODOs are resolved, see TIKA-1509
|
Constructor and Description |
---|
AutoDetectParser(Detector detector,
Parser... parsers) |
AutoDetectParser(Parser... parsers)
Creates an auto-detecting parser instance using the specified set of parsers.
|
CompositeParser(MediaTypeRegistry registry,
Parser... parsers) |
DigestingParser(Parser parser,
DigestingParser.Digester digester)
Creates a decorator for the given parser.
|
ParserDecorator(Parser parser)
Creates a decorator for the given parser.
|
ParserPostProcessor(Parser parser)
Creates a post-processing decorator for the given parser.
|
ParsingReader(Parser parser,
InputStream stream,
Metadata metadata,
ParseContext context)
Creates a reader for the text content of the given binary stream
with the given document metadata.
|
ParsingReader(Parser parser,
InputStream stream,
Metadata metadata,
ParseContext context,
Executor executor)
Creates a reader for the text content of the given binary stream
with the given document metadata.
|
RecursiveParserWrapper(Parser wrappedParser)
Initializes the wrapper with
RecursiveParserWrapper.catchEmbeddedExceptions set
to true by default. |
RecursiveParserWrapper(Parser wrappedParser,
boolean catchEmbeddedExceptions) |
RecursiveParserWrapper(Parser wrappedParser,
ContentHandlerFactory contentHandlerFactory)
Deprecated.
|
RecursiveParserWrapper(Parser wrappedParser,
ContentHandlerFactory contentHandlerFactory,
boolean catchEmbeddedExceptions)
Deprecated.
|
Constructor and Description |
---|
CompositeParser(MediaTypeRegistry registry,
List<Parser> parsers) |
CompositeParser(MediaTypeRegistry registry,
List<Parser> parsers,
Collection<Class<? extends Parser>> excludeParsers) |
DefaultParser(MediaTypeRegistry registry,
ServiceLoader loader,
Collection<Class<? extends Parser>> excludeParsers) |
DefaultParser(MediaTypeRegistry registry,
ServiceLoader loader,
Collection<Class<? extends Parser>> excludeParsers,
EncodingDetector encodingDetector) |
Modifier and Type | Class and Description |
---|---|
class |
AppleSingleFileParser
Parser that strips the header off of AppleSingle and AppleDouble
files.
|
class |
PListParser
Parser for Apple's plist and bplist.
|
Modifier and Type | Class and Description |
---|---|
class |
ClassParser
Parser for Java .class files.
|
Modifier and Type | Class and Description |
---|---|
class |
AudioParser |
class |
MidiParser |
Modifier and Type | Class and Description |
---|---|
class |
ChmParser |
Modifier and Type | Class and Description |
---|---|
class |
SourceCodeParser
Generic Source code parser for Java, Groovy, C++.
|
Modifier and Type | Class and Description |
---|---|
class |
Pkcs7Parser
Basic parser for PKCS7 data.
|
class |
TSDParser
Tika parser for Time Stamped Data Envelope (application/timestamped-data)
|
Modifier and Type | Class and Description |
---|---|
class |
TextAndCSVParser
Unless the
TikaCoreProperties.CONTENT_TYPE_OVERRIDE is set,
this parser tries to assess whether the file is a text file, csv or tsv. |
Modifier and Type | Class and Description |
---|---|
class |
CTAKESParser
CTAKESParser decorates a
Parser and uses
CTAKESContentHandler to extract biomedical information from
clinical text using Apache cTAKES. |
Constructor and Description |
---|
CTAKESParser(Parser parser)
Wraps the specified Parser
|
Modifier and Type | Class and Description |
---|---|
class |
DBFParser
This is a Tika wrapper around the DBFReader.
|
Modifier and Type | Class and Description |
---|---|
class |
DIFParser |
Modifier and Type | Class and Description |
---|---|
class |
DWGParser
DWG (CAD Drawing) parser.
|
Modifier and Type | Class and Description |
---|---|
class |
EnviHeaderParser |
Modifier and Type | Class and Description |
---|---|
class |
EpubContentParser
Parser for EPUB OPS
*.html files. |
class |
EpubParser
Epub parser
|
Modifier and Type | Method and Description |
---|---|
Parser |
EpubParser.getContentParser() |
Parser |
EpubParser.getMetaParser() |
Modifier and Type | Method and Description |
---|---|
void |
EpubParser.setContentParser(Parser content) |
void |
EpubParser.setMetaParser(Parser meta) |
Modifier and Type | Class and Description |
---|---|
class |
ExecutableParser
Parser for executable files.
|
Modifier and Type | Class and Description |
---|---|
class |
CompositeExternalParser
A Composite Parser that wraps up all the available External Parsers,
and provides an easy way to access them.
|
class |
ExternalParser
Parser that uses an external program (like catdoc or pdf2txt) to extract
text content and metadata from a given document.
|
Modifier and Type | Class and Description |
---|---|
class |
FeedParser
Feed parser.
|
Modifier and Type | Class and Description |
---|---|
class |
AdobeFontMetricParser
Parser for AFM Font Files
|
class |
TrueTypeParser
Parser for TrueType font files (TTF).
|
Modifier and Type | Class and Description |
---|---|
class |
GDALParser
Wraps execution of the Geospatial Data Abstraction
Library (GDAL)
gdalinfo tool used to extract geospatial
information out of hundreds of geo file formats. |
Modifier and Type | Class and Description |
---|---|
class |
GeoParser |
Modifier and Type | Class and Description |
---|---|
class |
GeographicInformationParser |
Modifier and Type | Class and Description |
---|---|
class |
GribParser |
Modifier and Type | Class and Description |
---|---|
class |
HDFParser
Since the
NetCDFParser depends on the NetCDF-Java API,
we are able to use it to parse HDF files as well. |
Modifier and Type | Class and Description |
---|---|
class |
HtmlParser
HTML parser.
|
Modifier and Type | Class and Description |
---|---|
class |
HwpV5Parser |
Modifier and Type | Class and Description |
---|---|
class |
BPGParser
Parser for the Better Portable Graphics (BPG) File Format.
|
class |
HeifParser |
class |
ICNSParser
A basic parser class for Apple ICNS icon files
|
class |
ImageParser |
class |
PSDParser
Parser for the Adobe Photoshop PSD File Format.
|
class |
TiffParser |
class |
WebPParser |
Modifier and Type | Class and Description |
---|---|
class |
IDMLParser
Adobe InDesign IDML Parser.
|
Modifier and Type | Class and Description |
---|---|
class |
IptcAnpaParser
Parser for IPTC ANPA News Wire Feeds
|
Modifier and Type | Class and Description |
---|---|
class |
ISArchiveParser |
Modifier and Type | Class and Description |
---|---|
class |
IWorkPackageParser
A parser for the IWork container files.
|
Modifier and Type | Class and Description |
---|---|
class |
IWork13PackageParser |
class |
IWork18PackageParser
For now, this parser isn't even registered.
|
Modifier and Type | Class and Description |
---|---|
class |
SQLite3Parser
This is the main class for parsing SQLite3 files.
|
Modifier and Type | Class and Description |
---|---|
class |
JournalParser |
Modifier and Type | Class and Description |
---|---|
class |
JpegParser |
Modifier and Type | Class and Description |
---|---|
class |
RFC822Parser
Uses apache-mime4j to parse emails.
|
Modifier and Type | Class and Description |
---|---|
class |
MatParser |
Modifier and Type | Class and Description |
---|---|
class |
MboxParser
Mbox (mailbox) parser.
|
class |
OutlookPSTParser
Parser for MS Outlook PST email storage files
|
Modifier and Type | Class and Description |
---|---|
class |
AbstractOfficeParser
Intermediate layer to set
OfficeParserConfig uniformly. |
class |
EMFParser
Extracts files embedded in EMF and offers a
very rough capability to extract text if there
is text stored in the EMF.
|
class |
JackcessParser
Parser that handles Microsoft Access files via
Jackcess
|
class |
MSOwnerFileParser
Parser for temporary MSOffice files.
|
class |
OfficeParser
Defines a Microsoft document content extractor.
|
class |
OldExcelParser
A POI-powered Tika Parser for very old versions of Excel, from
pre-OLE2 days, such as Excel 4.
|
class |
TNEFParser
A POI-powered Tika Parser for TNEF (Transport Neutral
Encoding Format) messages, aka winmail.dat
|
class |
WMFParser
This parser offers a very rough capability to extract text if there
is text stored in the WMF files.
|
Modifier and Type | Class and Description |
---|---|
class |
OneNoteParser
OneNote tika parser capable of parsing Microsoft OneNote files.
|
Modifier and Type | Class and Description |
---|---|
class |
OOXMLParser
Office Open XML (OOXML) parser.
|
Modifier and Type | Class and Description |
---|---|
class |
Word2006MLParser |
Modifier and Type | Class and Description |
---|---|
class |
AbstractXML2003Parser |
class |
SpreadsheetMLParser
Parses SpreadsheetML 2003 format Excel files.
|
class |
WordMLParser
Parses WordML 2003 format Word files.
|
Modifier and Type | Class and Description |
---|---|
class |
MIFParser |
Modifier and Type | Class and Description |
---|---|
class |
Mp3Parser
The
Mp3Parser is used to parse ID3 Version 1 Tag information
from an MP3 file, if available. |
Modifier and Type | Class and Description |
---|---|
class |
MP4Parser
Parser for the MP4 media container format, as well as the older
QuickTime format that MP4 is based on.
|
class |
NoakesMP4Parser
Parser for the MP4 media container format, as well as the older
QuickTime format that MP4 is based on.
|
Modifier and Type | Class and Description |
---|---|
class |
NamedEntityParser
This implementation of
Parser extracts
entity names from text content and adds them to the metadata. |
Modifier and Type | Class and Description |
---|---|
class |
NetCDFParser
|
Modifier and Type | Class and Description |
---|---|
class |
TesseractOCRParser
TesseractOCRParser powered by tesseract-ocr engine.
|
Modifier and Type | Class and Description |
---|---|
class |
FlatOpenDocumentParser |
class |
OpenDocumentContentParser
Parser for ODF
content.xml files. |
class |
OpenDocumentMetaParser
Parser for OpenDocument
meta.xml files. |
class |
OpenDocumentParser
OpenOffice parser
|
Modifier and Type | Method and Description |
---|---|
Parser |
OpenDocumentParser.getContentParser() |
Parser |
OpenDocumentParser.getMetaParser() |
Modifier and Type | Method and Description |
---|---|
void |
OpenDocumentParser.setContentParser(Parser content) |
void |
OpenDocumentParser.setMetaParser(Parser meta) |
Modifier and Type | Class and Description |
---|---|
class |
OpenOfficeParser
Deprecated.
Use the
OpenDocumentParser class instead.
This class will be removed in Apache Tika 1.0. |
Modifier and Type | Class and Description |
---|---|
class |
PDFParser
PDF parser.
|
class |
PDFPreflightParser
Deprecated.
This will be removed in 2.x. The PDFBox community voted
to retire the preflight parser in PDFBox 4.x.
|
Modifier and Type | Class and Description |
---|---|
class |
CompressorParser
Parser for various compression formats.
|
class |
PackageParser
Parser for various packaging formats.
|
class |
RarParser
Parser for Rar files.
|
Modifier and Type | Class and Description |
---|---|
class |
PooledTimeSeriesParser
Uses the Pooled Time Series algorithm and its command-line tool to
generate a numeric representation of the video suitable for
similarity searches.
|
Modifier and Type | Class and Description |
---|---|
class |
PRTParser
A basic text extracting parser for the CADKey PRT (CAD Drawing)
format.
|
Modifier and Type | Class and Description |
---|---|
class |
AgeRecogniser
Parser for extracting features from text.
|
class |
ObjectRecognitionParser
This parser recognises objects in images.
|
Modifier and Type | Class and Description |
---|---|
class |
TensorflowImageRecParser
This is an implementation of
ObjectRecogniser powered by Tensorflow
convolutional neural network (CNN). |
Modifier and Type | Class and Description |
---|---|
class |
RTFParser
RTF parser
|
Modifier and Type | Class and Description |
---|---|
class |
SAS7BDATParser
Processes the SAS7BDAT data columnar database file used by SAS and
other similar languages.
|
Modifier and Type | Class and Description |
---|---|
class |
SentimentAnalysisParser
This parser classifies documents based on the sentiment of the document.
|
Modifier and Type | Class and Description |
---|---|
class |
Latin1StringsParser
Parser to extract printable Latin1 strings from arbitrary files in pure Java,
without running any external process.
|
class |
StringsParser
Parser that uses the "strings" (or strings-alternative) command to find the
printable strings in an object file or other binary file
(application/octet-stream).
|
Modifier and Type | Class and Description |
---|---|
class |
TXTParser
Plain text parser.
|
Modifier and Type | Class and Description |
---|---|
class |
FLVParser
Parser for metadata contained in Flash Videos (.flv).
|
Modifier and Type | Class and Description |
---|---|
class |
QuattroProParser
Parser for Corel QuattroPro documents (part of Corel WordPerfect
Office Suite).
|
class |
WordPerfectParser
Parser for Corel WordPerfect documents.
|
Modifier and Type | Class and Description |
---|---|
class |
XLIFF12Parser
Parser for XLIFF 1.2 files.
|
class |
XLZParser
Parser for XLZ Archives.
|
Modifier and Type | Class and Description |
---|---|
class |
DcXMLParser
Dublin Core metadata parser
|
class |
FictionBookParser |
class |
XMLParser
XML parser.
|
class |
XMLProfiler
This parser enables profiling of XML.
|
Modifier and Type | Method and Description |
---|---|
static Parser |
TikaResource.createParser() |
Modifier and Type | Method and Description |
---|---|
static void |
TikaResource.fillMetadata(Parser parser,
Metadata metadata,
ParseContext context,
javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders) |
static void |
TikaResource.fillParseContext(ParseContext parseContext,
javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders,
Parser embeddedParser)
Fills the parse context.
|
static Detector |
TikaResource.getDetector(Parser p) |
static void |
TikaResource.parse(Parser parser,
org.slf4j.Logger logger,
String path,
InputStream inputStream,
ContentHandler handler,
Metadata metadata,
ParseContext parseContext)
Use this to call a parser and unify exception handling.
|
static void |
TikaResource.setDetector(Parser p,
Detector detector) |
Modifier and Type | Method and Description |
---|---|
static String |
ParserUtils.getParserClassname(Parser parser)
Identifies the real class name of the Parser, unwrapping
any ParserDecorator decorations on top of it. |
static void |
ParserUtils.recordParserDetails(Parser parser,
Metadata metadata)
|
static void |
ParserUtils.recordParserFailure(Parser parser,
Throwable failure,
Metadata metadata)
|
Copyright © 2007–2021 The Apache Software Foundation. All rights reserved.