Modifier and Type | Method and Description |
---|---|
protected void |
FileResourceConsumer.parse(String resourceId,
Parser parser,
InputStream is,
ContentHandler handler,
Metadata m,
ParseContext parseContext)
Utility method to handle logging equivalently among all
implementing classes.
|
Modifier and Type | Method and Description |
---|---|
static long |
TikaTaskTimeout.getTimeoutMillis(ParseContext context,
long defaultTimeoutMillis) |
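
The signature of `getTimeoutMillis(ParseContext context, long defaultTimeoutMillis)` suggests a lookup with a fallback: use a timeout stored in the context if there is one, otherwise the supplied default. The following is a minimal sketch of that pattern only. `SketchContext` and `SketchTimeout` are hypothetical stand-ins written for this example, not Tika's `ParseContext` or `TikaTaskTimeout`, and the null handling is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a type-keyed context; NOT Tika's ParseContext.
final class SketchContext {
    private final Map<String, Object> entries = new HashMap<>();

    // Stores a value under its class name; a null value removes the entry.
    <T> void set(Class<T> key, T value) {
        if (value == null) {
            entries.remove(key.getName());
        } else {
            entries.put(key.getName(), value);
        }
    }

    // Returns the stored value for the key, or null if nothing was set.
    <T> T get(Class<T> key) {
        return key.cast(entries.get(key.getName()));
    }
}

// Hypothetical stand-in for a per-task timeout setting.
final class SketchTimeout {
    private final long timeoutMillis;

    SketchTimeout(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    long getTimeoutMillis() {
        return timeoutMillis;
    }

    // Assumed behavior: prefer the timeout found in the context, else the default.
    static long getTimeoutMillis(SketchContext context, long defaultTimeoutMillis) {
        if (context == null) {
            return defaultTimeoutMillis;
        }
        SketchTimeout configured = context.get(SketchTimeout.class);
        return configured == null ? defaultTimeoutMillis : configured.getTimeoutMillis();
    }
}

class TimeoutSketchDemo {
    public static void main(String[] args) {
        SketchContext context = new SketchContext();
        // Nothing configured yet, so the default is returned.
        System.out.println(SketchTimeout.getTimeoutMillis(context, 60_000L));
        // After a timeout is stored in the context, it takes precedence.
        context.set(SketchTimeout.class, new SketchTimeout(5_000L));
        System.out.println(SketchTimeout.getTimeoutMillis(context, 60_000L));
    }
}
```

Keying the context by class lets callers pass optional settings without widening every method signature, which is presumably why so many methods on this page accept a `ParseContext`.
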
Modifier and Type | Method and Description |
---|---|
List<RecognisedObject> |
DL4JInceptionV3Net.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
List<RecognisedObject> |
DL4JVGG16Net.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
Embedder.embed(Metadata metadata,
InputStream originalStream,
OutputStream outputStream,
ParseContext context)
Embeds related document metadata from the given metadata object into the
given output stream.
|
void |
ExternalEmbedder.embed(Metadata metadata,
InputStream inputStream,
OutputStream outputStream,
ParseContext context)
Executes the configured external command and passes the given document
stream as a simple XHTML document to the given SAX content handler.
|
Set<MediaType> |
Embedder.getSupportedEmbedTypes(ParseContext context)
Returns the set of media types supported by this embedder when used with
the given parse context.
|
Set<MediaType> |
ExternalEmbedder.getSupportedEmbedTypes(ParseContext context) |
Modifier and Type | Method and Description |
---|---|
protected ContentHandler |
PrescriptionParser.getContentHandler(ContentHandler handler,
Metadata metadata,
ParseContext context) |
Set<MediaType> |
PrescriptionParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
DirListParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
EncryptedPrescriptionParser.getSupportedTypes(ParseContext context) |
void |
PickBestTextEncodingParser.parse(InputStream stream,
ContentHandlerFactory handlers,
Metadata metadata,
ParseContext context)
Deprecated.
|
void |
DirListParser.parse(InputStream is,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
LanguageDetectingParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
EncryptedPrescriptionParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
PickBestTextEncodingParser.parse(InputStream stream,
ContentHandler handler,
Metadata originalMetadata,
ParseContext context)
Deprecated.
|
protected boolean |
PickBestTextEncodingParser.parserCompleted(Parser parser,
Metadata metadata,
ContentHandler handler,
ParseContext context,
Exception exception)
Deprecated.
|
protected void |
PickBestTextEncodingParser.parserPrepare(Parser parser,
Metadata metadata,
ParseContext context)
Deprecated.
|
Modifier and Type | Method and Description |
---|---|
static EmbeddedDocumentExtractor |
EmbeddedDocumentUtil.getEmbeddedDocumentExtractor(ParseContext context)
This offers a uniform way to get an EmbeddedDocumentExtractor from a ParseContext.
|
static Parser |
EmbeddedDocumentUtil.getStatelessParser(ParseContext context)
Utility function to get the Parser that was sent in to the
ParseContext to handle embedded documents.
|
EmbeddedDocumentExtractor |
EmbeddedDocumentExtractorFactory.newInstance(Metadata metadata,
ParseContext parseContext) |
EmbeddedDocumentExtractor |
ParsingEmbeddedDocumentExtractorFactory.newInstance(Metadata metadata,
ParseContext parseContext) |
static Parser |
EmbeddedDocumentUtil.tryToFindExistingLeafParser(Class clazz,
ParseContext context)
Tries to find an existing parser within the ParseContext.
|
Constructor and Description |
---|
EmbeddedDocumentUtil(ParseContext context) |
ParsingEmbeddedDocumentExtractor(ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
ForkParser.getSupportedTypes(ParseContext context) |
void |
ForkParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
This sends the objects to the server for parsing, and the server via
the proxies acts on the handler as if it were updating it directly.
|
Modifier and Type | Method and Description |
---|---|
void |
DigestingParser.Digester.digest(InputStream is,
Metadata m,
ParseContext parseContext)
Digests an InputStream and sets the appropriate value(s) in the metadata.
|
Map<MediaType,List<Parser>> |
CompositeParser.findDuplicateParsers(ParseContext context)
Utility method that goes through all the component parsers and finds
all media types for which more than one parser declares support.
|
protected Parser |
DelegatingParser.getDelegateParser(ParseContext context)
Returns the parser instance to which parsing tasks should be delegated.
|
protected EncodingDetector |
AbstractEncodingDetectorParser.getEncodingDetector(ParseContext parseContext)
Look for an EncodingDetector in the ParseContext.
|
protected Parser |
CompositeParser.getParser(Metadata metadata,
ParseContext context) |
Map<MediaType,Parser> |
CompositeParser.getParsers(ParseContext context) |
Map<MediaType,Parser> |
DefaultParser.getParsers(ParseContext context) |
Set<MediaType> |
CompositeParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
ErrorParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
RecursiveParserWrapper.getSupportedTypes(ParseContext context) |
Set<MediaType> |
Parser.getSupportedTypes(ParseContext context)
Returns the set of media types supported by this parser when used
with the given parse context.
|
Set<MediaType> |
DelegatingParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
RegexCaptureParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
ParserDecorator.getSupportedTypes(ParseContext context)
Delegates the method call to the decorated parser.
|
Set<MediaType> |
EmptyParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
CryptoParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
NetworkParser.getSupportedTypes(ParseContext context) |
void |
ParserPostProcessor.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Forwards the call to the delegated parser and post-processes the
results as described above.
|
void |
AutoDetectParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
CompositeParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Delegates the call to the matching component parser.
|
void |
ErrorParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
RecursiveParserWrapper.parse(InputStream stream,
ContentHandler recursiveParserWrapperHandler,
Metadata metadata,
ParseContext context) |
void |
Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Parses a document stream into a sequence of XHTML SAX events.
|
void |
DelegatingParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Looks up the delegate parser from the parsing context and
delegates the parse operation to it.
|
void |
RegexCaptureParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
ParserDecorator.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Delegates the method call to the decorated parser.
|
void |
DigestingParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
EmptyParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
CryptoParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
NetworkParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
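
`CompositeParser.findDuplicateParsers` is described above as finding all media types for which more than one component parser declares support. One way to picture that is to invert a parser-to-types mapping and keep only the types claimed more than once. The sketch below does this with plain strings. The class name, the method name, and the use of strings in place of `Parser` and `MediaType` are illustrative assumptions, not Tika's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch; parser names and media types are plain strings here.
final class DuplicateSupportSketch {

    // Inverts parser -> supported types into type -> parsers,
    // then drops every type that only one parser claims.
    static Map<String, List<String>> findDuplicates(Map<String, Set<String>> typesByParser) {
        Map<String, List<String>> parsersByType = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : typesByParser.entrySet()) {
            for (String type : entry.getValue()) {
                parsersByType.computeIfAbsent(type, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        parsersByType.values().removeIf(parsers -> parsers.size() < 2);
        return parsersByType;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> typesByParser = new LinkedHashMap<>();
        typesByParser.put("TextParser", new LinkedHashSet<>(Arrays.asList("text/plain")));
        typesByParser.put("CsvParser", new LinkedHashSet<>(Arrays.asList("text/csv", "text/plain")));
        typesByParser.put("PdfParser", new LinkedHashSet<>(Arrays.asList("application/pdf")));
        // Only text/plain is claimed by more than one parser in this example.
        System.out.println(findDuplicates(typesByParser));
    }
}
```
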
Constructor and Description |
---|
ParsingReader(Parser parser,
InputStream stream,
Metadata metadata,
ParseContext context)
Creates a reader for the text content of the given binary stream
with the given document metadata.
|
ParsingReader(Parser parser,
InputStream stream,
Metadata metadata,
ParseContext context,
Executor executor)
Creates a reader for the text content of the given binary stream
with the given document metadata.
|
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
AppleSingleFileParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
PListParser.getSupportedTypes(ParseContext context) |
void |
AppleSingleFileParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
PListParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
ClassParser.getSupportedTypes(ParseContext context) |
void |
ClassParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
MidiParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
AudioParser.getSupportedTypes(ParseContext context) |
void |
MidiParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
AudioParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
List<CaptionObject> |
TensorflowRESTCaptioner.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
SourceCodeParser.getSupportedTypes(ParseContext context) |
void |
SourceCodeParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
Pkcs7Parser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
TSDParser.getSupportedTypes(ParseContext context) |
void |
Pkcs7Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
TSDParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
TextAndCSVParser.getSupportedTypes(ParseContext context) |
void |
TextAndCSVParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
CTAKESParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
DBFParser.getSupportedTypes(ParseContext context) |
void |
DBFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
DGN8Parser.getSupportedTypes(ParseContext context) |
void |
DGN8Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
protected ContentHandler |
DIFParser.getContentHandler(ContentHandler handler,
Metadata metadata,
ParseContext context) |
Set<MediaType> |
DIFParser.getSupportedTypes(ParseContext context) |
void |
DIFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
CompositeDigester.digest(InputStream is,
Metadata m,
ParseContext parseContext) |
void |
InputStreamDigester.digest(InputStream is,
Metadata metadata,
ParseContext parseContext) |
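
The `digest` methods listed above take an input stream and a metadata object, and `DigestingParser.Digester.digest` is described as setting the resulting value(s) in the metadata. A minimal sketch of that idea using only the JDK follows. The `Map` standing in for `Metadata`, the `DIGEST_KEY` name, and the choice of SHA-256 are assumptions made for illustration; the sketch also omits the `ParseContext` parameter.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a Map stands in for Tika's Metadata.
final class DigestSketch {
    // Hypothetical key name, not the key Tika writes.
    static final String DIGEST_KEY = "sketch:digest:SHA-256";

    // Reads the whole stream, hashes it, and stores the hex digest in the metadata map.
    static void digest(InputStream is, Map<String, String> metadata) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = is.read(buffer)) != -1) {
                sha256.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : sha256.digest()) {
                hex.append(String.format("%02x", b));
            }
            metadata.put(DIGEST_KEY, hex.toString());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        digest(new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8)), metadata);
        System.out.println(metadata.get(DIGEST_KEY));
    }
}
```

Note that hashing consumes the stream, so a caller that also wants to parse the same content needs a resettable or re-openable stream.
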
Modifier and Type | Method and Description |
---|---|
void |
AbstractDWGParser.configure(ParseContext parseContext) |
Set<MediaType> |
DWGReadParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
DWGParser.getSupportedTypes(ParseContext context) |
void |
DWGReadParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
DWGParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
EnviHeaderParser.getSupportedTypes(ParseContext context) |
void |
EnviHeaderParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
EpubContentParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
EpubParser.getSupportedTypes(ParseContext context) |
void |
EpubContentParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
EpubParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
ExecutableParser.getSupportedTypes(ParseContext context) |
void |
ExecutableParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
ExternalParser.getSupportedTypes(ParseContext context) |
void |
ExternalParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Executes the configured external command and passes the given document
stream as a simple XHTML document to the given SAX content handler.
|
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
ExternalParser.getSupportedTypes(ParseContext context) |
void |
ExternalParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
FeedParser.getSupportedTypes(ParseContext context) |
void |
FeedParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
AdobeFontMetricParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
TrueTypeParser.getSupportedTypes(ParseContext context) |
void |
AdobeFontMetricParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
TrueTypeParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
GDALParser.getSupportedTypes(ParseContext context) |
void |
GDALParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
GeoParser.getSupportedTypes(ParseContext parseContext) |
void |
GeoParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
GeographicInformationParser.getSupportedTypes(ParseContext parseContext) |
void |
GeographicInformationParser.parse(InputStream inputStream,
ContentHandler contentHandler,
Metadata metadata,
ParseContext parseContext) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
GribParser.getSupportedTypes(ParseContext context) |
void |
GribParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
HDFParser.getSupportedTypes(ParseContext context) |
void |
HDFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
HtmlParser.getSupportedTypes(ParseContext context) |
void |
HtmlParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
HttpParser.getSupportedTypes(ParseContext context) |
void |
HttpParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
HwpV5Parser.getSupportedTypes(ParseContext context) |
void |
HwpV5Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
IDMLParser.getSupportedTypes(ParseContext context) |
void |
IDMLParser.parse(InputStream stream,
ContentHandler baseHandler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
IptcAnpaParser.getSupportedTypes(ParseContext context) |
void |
IptcAnpaParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
ISArchiveParser.getSupportedTypes(ParseContext context) |
void |
ISArchiveParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
static void |
ISATabUtils.parseAssay(InputStream stream,
XHTMLContentHandler xhtml,
Metadata metadata,
ParseContext context) |
static void |
ISATabUtils.parseInvestigation(InputStream stream,
XHTMLContentHandler handler,
Metadata metadata,
ParseContext context) |
static void |
ISATabUtils.parseInvestigation(InputStream stream,
XHTMLContentHandler handler,
Metadata metadata,
ParseContext context,
String studyFileName) |
static void |
ISATabUtils.parseStudy(InputStream stream,
XHTMLContentHandler xhtml,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
IWorkPackageParser.getSupportedTypes(ParseContext context) |
void |
IWorkPackageParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
IWork18PackageParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
IWork13PackageParser.getSupportedTypes(ParseContext context) |
void |
IWork18PackageParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
IWork13PackageParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
protected Connection |
AbstractDBParser.getConnection(InputStream stream,
Metadata metadata,
ParseContext context)
Override this for special configuration of the connection, such as limiting
the number of rows to be held in memory.
|
protected abstract String |
AbstractDBParser.getConnectionString(InputStream stream,
Metadata metadata,
ParseContext parseContext)
Implement for db specific connection information, e.g.
|
Set<MediaType> |
AbstractDBParser.getSupportedTypes(ParseContext context) |
protected abstract List<String> |
AbstractDBParser.getTableNames(Connection connection,
Metadata metadata,
ParseContext context)
Returns the names of the tables to process
|
protected abstract JDBCTableReader |
AbstractDBParser.getTableReader(Connection connection,
String tableName,
ParseContext parseContext)
|
protected void |
JDBCTableReader.handleBlob(String tableName,
String columnName,
int rowNum,
ResultSet resultSet,
int columnIndex,
ContentHandler handler,
ParseContext context) |
protected void |
JDBCTableReader.handleClob(String tableName,
String columnName,
int rowNum,
ResultSet resultSet,
int columnIndex,
ContentHandler handler,
ParseContext context) |
boolean |
JDBCTableReader.nextRow(ContentHandler handler,
ParseContext context) |
void |
AbstractDBParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
JournalParser.getSupportedTypes(ParseContext context) |
void |
JournalParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
GrobidRESTParser.parse(String filePath,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Metadata |
TEIDOMParser.parse(String source,
ParseContext parseContext) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
RFC822Parser.getSupportedTypes(ParseContext context) |
void |
RFC822Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
MatParser.getSupportedTypes(ParseContext context) |
void |
MatParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
MboxParser.getSupportedTypes(ParseContext context) |
void |
MboxParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
AbstractOfficeParser.configure(ParseContext parseContext)
Checks to see if the user has specified an OfficeParserConfig. |
Set<MediaType> |
WMFParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
JackcessParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
EMFParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
MSOwnerFileParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
OldExcelParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
OfficeParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
TNEFParser.getSupportedTypes(ParseContext context) |
protected void |
OfficeParser.parse(org.apache.poi.poifs.filesystem.DirectoryNode root,
ParseContext context,
Metadata metadata,
XHTMLContentHandler xhtml) |
void |
WMFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
JackcessParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
EMFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
MSOwnerFileParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Extracts owner from MS temp file
|
void |
OldExcelParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Extracts properties and text from an MS Document input stream
|
void |
OfficeParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Extracts properties and text from an MS Document input stream
|
void |
TNEFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Extracts properties and text from an MS Document input stream
|
Constructor and Description |
---|
ExcelExtractor(ParseContext context,
Metadata metadata) |
HSLFExtractor(ParseContext context,
Metadata metadata) |
OutlookExtractor(org.apache.poi.poifs.filesystem.DirectoryNode root,
Metadata metadata,
ParseContext context) |
OutlookExtractor(org.apache.poi.poifs.filesystem.DirectoryNode root,
ParseContext context)
Deprecated.
Use OutlookExtractor(DirectoryNode, Metadata, ParseContext) instead. Will be removed after 2.4.0. |
OutlookExtractor(org.apache.poi.poifs.filesystem.POIFSFileSystem filesystem,
ParseContext context)
Deprecated.
Use OutlookExtractor(DirectoryNode, Metadata, ParseContext) instead. Will be removed after 2.4.0. |
WordExtractor(ParseContext context,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
ChmParser.getSupportedTypes(ParseContext context) |
void |
ChmParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
OneNoteParser.getSupportedTypes(ParseContext context) |
void |
OneNoteParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Field and Description |
---|---|
protected ParseContext |
XSSFExcelExtractorDecorator.parseContext |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
OOXMLParser.getSupportedTypes(ParseContext context) |
void |
XSSFExcelExtractorDecorator.getXHTML(ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
AbstractOOXMLExtractor.getXHTML(ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
OOXMLExtractor.getXHTML(ContentHandler handler,
Metadata metadata,
ParseContext context)
Parses the document into a sequence of XHTML SAX events sent to the
given content handler.
|
void |
XSSFBExcelExtractorDecorator.getXHTML(ContentHandler handler,
Metadata metadata,
ParseContext context) |
static void |
OOXMLExtractorFactory.parse(InputStream stream,
ContentHandler baseHandler,
Metadata metadata,
ParseContext context) |
void |
OOXMLParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
AbstractOOXMLExtractor(ParseContext context,
org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor) |
POIXMLTextExtractorDecorator(ParseContext context,
org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor) |
SXSLFPowerPointExtractorDecorator(Metadata metadata,
ParseContext context,
XSLFEventBasedPowerPointExtractor extractor) |
SXWPFWordExtractorDecorator(Metadata metadata,
ParseContext context,
XWPFEventBasedWordExtractor extractor) |
XSLFPowerPointExtractorDecorator(Metadata metadata,
ParseContext context,
org.apache.poi.xslf.extractor.XSLFExtractor extractor) |
XSSFBExcelExtractorDecorator(ParseContext context,
org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor,
Locale locale) |
XSSFExcelExtractorDecorator(ParseContext context,
org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor,
Locale locale) |
XWPFWordExtractorDecorator(Metadata metadata,
ParseContext context,
org.apache.poi.xwpf.extractor.XWPFWordExtractor extractor) |
XWPFWordExtractorDecorator(ParseContext context,
org.apache.poi.xwpf.extractor.XWPFWordExtractor extractor)
|
Constructor and Description |
---|
XPSExtractorDecorator(ParseContext context,
org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor) |
Constructor and Description |
---|
XWPFStylesShim(org.apache.poi.openxml4j.opc.PackagePart part,
ParseContext parseContext) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
Word2006MLParser.getSupportedTypes(ParseContext context) |
void |
Word2006MLParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
OutlookPSTParser.getSupportedTypes(ParseContext context) |
void |
OutlookPSTParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
RTFParser.getSupportedTypes(ParseContext context) |
void |
RTFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
protected ContentHandler |
SpreadsheetMLParser.getContentHandler(ContentHandler ch,
Metadata metadata,
ParseContext context) |
protected ContentHandler |
AbstractXML2003Parser.getContentHandler(ContentHandler ch,
Metadata md,
ParseContext context) |
protected ContentHandler |
WordMLParser.getContentHandler(ContentHandler ch,
Metadata metadata,
ParseContext context) |
Set<MediaType> |
SpreadsheetMLParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
WordMLParser.getSupportedTypes(ParseContext context) |
void |
AbstractXML2003Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
MIFParser.getSupportedTypes(ParseContext context) |
void |
MIFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
Mp3Parser.getSupportedTypes(ParseContext context) |
void |
Mp3Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
MP4Parser.getSupportedTypes(ParseContext context) |
void |
MP4Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
AbstractMultipleParser.getSupportedTypes(ParseContext context) |
void |
AbstractMultipleParser.parse(InputStream stream,
ContentHandlerFactory handlers,
Metadata metadata,
ParseContext context)
Deprecated.
The ContentHandlerFactory override is still experimental, and the method signature is subject to change before Tika 2.0. |
void |
AbstractMultipleParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Processes the given Stream through one or more parsers,
resetting things between parsers as requested by policy.
|
protected boolean |
SupplementingParser.parserCompleted(Parser parser,
Metadata metadata,
ContentHandler handler,
ParseContext context,
Exception exception) |
protected boolean |
FallbackParser.parserCompleted(Parser parser,
Metadata metadata,
ContentHandler handler,
ParseContext context,
Exception exception) |
protected abstract boolean |
AbstractMultipleParser.parserCompleted(Parser parser,
Metadata metadata,
ContentHandler handler,
ParseContext context,
Exception exception)
Notifies implementations that a Parser has finished or failed, and lets them
decide whether to continue or abort further parsing.
|
protected void |
AbstractMultipleParser.parserPrepare(Parser parser,
Metadata metadata,
ParseContext context)
Used to allow implementations to prepare or change things
before parsing occurs
|
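
The `parserCompleted` hook above is described as telling an implementation that a parser has finished or failed and letting it decide whether to continue. The sketch below shows one such policy, a fallback chain that stops at the first step that succeeds. All class and method names are hypothetical, each step is a plain `Function` rather than a Tika `Parser`, and the stop-on-first-success rule is an assumption about how a fallback-style subclass might behave.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical base class: runs steps in order and asks a hook whether to go on.
abstract class MultiStepSketch {
    private final List<Function<String, String>> steps;

    MultiStepSketch(List<Function<String, String>> steps) {
        this.steps = steps;
    }

    // Called after each step; failure is null when the step succeeded.
    // Returns true to continue with the next step, false to stop.
    protected abstract boolean stepCompleted(int index, Exception failure);

    List<String> run(String input) {
        List<String> results = new ArrayList<>();
        for (int i = 0; i < steps.size(); i++) {
            Exception failure = null;
            try {
                results.add(steps.get(i).apply(input));
            } catch (RuntimeException e) {
                failure = e;
            }
            if (!stepCompleted(i, failure)) {
                break;
            }
        }
        return results;
    }
}

// Assumed fallback policy: keep trying only while steps fail.
final class FallbackSketch extends MultiStepSketch {
    FallbackSketch(List<Function<String, String>> steps) {
        super(steps);
    }

    @Override
    protected boolean stepCompleted(int index, Exception failure) {
        return failure != null;
    }
}

class FallbackDemo {
    public static void main(String[] args) {
        List<Function<String, String>> steps = new ArrayList<>();
        steps.add(input -> { throw new IllegalStateException("first step cannot handle input"); });
        steps.add(input -> "second:" + input);
        steps.add(input -> "third:" + input);
        // The first step fails, the second succeeds, and the third is never run.
        System.out.println(new FallbackSketch(steps).run("doc"));
    }
}
```

A supplementing-style policy could be expressed with the same hook by returning `true` regardless of the outcome, so that every step gets a chance to contribute.
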
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
NamedEntityParser.getSupportedTypes(ParseContext parseContext) |
void |
NamedEntityParser.parse(InputStream inputStream,
ContentHandler contentHandler,
Metadata metadata,
ParseContext parseContext) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
NetCDFParser.getSupportedTypes(ParseContext context) |
void |
NetCDFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
TesseractOCRParser.getSupportedTypes(ParseContext context) |
void |
TesseractOCRParser.parse(Image image,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
TesseractOCRParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext parseContext) |
Modifier and Type | Method and Description |
---|---|
protected ContentHandler |
OpenDocumentMetaParser.getContentHandler(ContentHandler ch,
Metadata md,
ParseContext context) |
Set<MediaType> |
FlatOpenDocumentParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
OpenDocumentContentParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
OpenDocumentParser.getSupportedTypes(ParseContext context) |
void |
FlatOpenDocumentParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
OpenDocumentContentParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
OpenDocumentMetaParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
OpenDocumentParser.parse(InputStream stream,
ContentHandler baseHandler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
static void |
PDMetadataExtractor.extract(org.apache.pdfbox.pdmodel.common.PDMetadata pdMetadata,
Metadata metadata,
ParseContext context) |
static void |
PDMetadataExtractor.extract(org.apache.jempbox.xmp.XMPMetadata xmp,
Metadata metadata,
ParseContext context) |
protected org.apache.pdfbox.pdmodel.PDDocument |
PDFParser.getPDDocument(InputStream inputStream,
String password,
org.apache.pdfbox.io.MemoryUsageSetting memoryUsageSetting,
Metadata metadata,
ParseContext parseContext) |
protected org.apache.pdfbox.pdmodel.PDDocument |
PDFParser.getPDDocument(Path path,
String password,
org.apache.pdfbox.io.MemoryUsageSetting memoryUsageSetting,
Metadata metadata,
ParseContext parseContext) |
Set<MediaType> |
PDFParser.getSupportedTypes(ParseContext context) |
void |
PDFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
static void |
PDFMarkedContent2XHTML.process(org.apache.pdfbox.pdmodel.PDDocument pdDocument,
ContentHandler handler,
ParseContext context,
Metadata metadata,
PDFParserConfig config)
Converts the given PDF document (and related metadata) to a stream
of XHTML SAX events sent to the given content handler.
|
Modifier and Type | Field and Description |
---|---|
protected ParseContext |
ImageGraphicsEngine.parseContext |
Modifier and Type | Method and Description |
---|---|
ImageGraphicsEngine |
ImageGraphicsEngineFactory.newEngine(org.apache.pdfbox.pdmodel.PDPage page,
int pageNumber,
EmbeddedDocumentExtractor embeddedDocumentExtractor,
PDFParserConfig pdfParserConfig,
Map<org.apache.pdfbox.cos.COSStream,Integer> processedInlineImages,
AtomicInteger imageCounter,
XHTMLContentHandler xhtml,
Metadata parentMetadata,
ParseContext parseContext) |
Constructor and Description |
---|
ImageGraphicsEngine(org.apache.pdfbox.pdmodel.PDPage page,
int pageNumber,
EmbeddedDocumentExtractor embeddedDocumentExtractor,
PDFParserConfig pdfParserConfig,
Map<org.apache.pdfbox.cos.COSStream,Integer> processedInlineImages,
AtomicInteger imageCounter,
XHTMLContentHandler xhtml,
Metadata parentMetadata,
ParseContext parseContext) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
CompressorParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
PackageParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
RarParser.getSupportedTypes(ParseContext arg0) |
Set<MediaType> |
UnrarParser.getSupportedTypes(ParseContext arg0) |
void |
CompressorParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
PackageParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
RarParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
UnrarParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
PooledTimeSeriesParser.getSupportedTypes(ParseContext context)
Returns the set of media types supported by this parser when used with the
given parse context.
|
void |
PooledTimeSeriesParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Parses a document stream into a sequence of XHTML SAX events.
|
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
PRTParser.getSupportedTypes(ParseContext context) |
void |
PRTParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
ObjectRecognitionParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
AgeRecogniser.getSupportedTypes(ParseContext parseContext) |
void |
ObjectRecognitionParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
AgeRecogniser.parse(InputStream inputStream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
List<? extends RecognisedObject> |
ObjectRecogniser.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Recognise the objects in the stream
|
Modifier and Type | Method and Description |
---|---|
List<RecognisedObject> |
TensorflowImageRecParser.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
List<RecognisedObject> |
TensorflowRESTRecogniser.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
SAS7BDATParser.getSupportedTypes(ParseContext context) |
void |
SAS7BDATParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
SentimentAnalysisParser.getSupportedTypes(ParseContext context)
Returns the types supported
|
void |
SentimentAnalysisParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Performs the parse
|
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
SQLite3Parser.getSupportedTypes(ParseContext context) |
void |
SQLite3Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
Latin1StringsParser.getSupportedTypes(ParseContext arg0) |
Set<MediaType> |
StringsParser.getSupportedTypes(ParseContext context) |
void |
Latin1StringsParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
StringsParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
TMXParser.getSupportedTypes(ParseContext context) |
void |
TMXParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
AmazonTranscribe.getSupportedTypes(ParseContext context) |
void |
AmazonTranscribe.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Starts an AWS Transcribe job with a language specification.
|
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
TXTParser.getSupportedTypes(ParseContext context) |
void |
TXTParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
FLVParser.getSupportedTypes(ParseContext context) |
void |
FLVParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
WACZParser.getSupportedTypes(ParseContext context) |
void |
WACZParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
WARCParser.getSupportedTypes(ParseContext context) |
void |
WARCParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
QuattroProParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
WordPerfectParser.getSupportedTypes(ParseContext context) |
void |
QuattroProParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
WordPerfectParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
XLZParser.getSupportedTypes(ParseContext context) |
Set<MediaType> |
XLIFF12Parser.getSupportedTypes(ParseContext context) |
void |
XLZParser.parse(InputStream stream,
ContentHandler baseHandler,
Metadata metadata,
ParseContext context) |
void |
XLIFF12Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
Renderer.getSupportedTypes(ParseContext context)
Returns the set of media types supported by this renderer when used
with the given parse context.
|
Set<MediaType> |
CompositeRenderer.getSupportedTypes(ParseContext context) |
RenderResults |
Renderer.render(InputStream is,
Metadata metadata,
ParseContext parseContext,
RenderRequest... requests) |
RenderResults |
CompositeRenderer.render(InputStream is,
Metadata metadata,
ParseContext parseContext,
RenderRequest... requests) |
Modifier and Type | Method and Description |
---|---|
Set<MediaType> |
MuPDFRenderer.getSupportedTypes(ParseContext context) |
RenderResults |
MuPDFRenderer.render(InputStream is,
Metadata metadata,
ParseContext parseContext,
RenderRequest... requests) |
Modifier and Type | Method and Description |
---|---|
protected int |
PDFBoxRenderer.getDPI(ParseContext parseContext) |
protected String |
PDFBoxRenderer.getImageFormatName(ParseContext parseContext) |
protected org.apache.pdfbox.rendering.ImageType |
PDFBoxRenderer.getImageType(ParseContext parseContext) |
Set<MediaType> |
PDFBoxRenderer.getSupportedTypes(ParseContext context) |
RenderResults |
PDFBoxRenderer.render(InputStream is,
Metadata metadata,
ParseContext parseContext,
RenderRequest... requests) |
protected RenderResult |
PDFBoxRenderer.renderPage(org.apache.pdfbox.rendering.PDFRenderer renderer,
int id,
int pageNumber,
Metadata metadata,
ParseContext parseContext) |
Modifier and Type | Method and Description |
---|---|
ContentHandler |
ContentHandlerDecoratorFactory.decorate(ContentHandler contentHandler,
Metadata metadata,
ParseContext parseContext) |
Constructor and Description |
---|
BasicContentHandlerFactory(BasicContentHandlerFactory.HANDLER_TYPE type,
int writeLimit,
boolean throwOnWriteLimitReached,
ParseContext parseContext) |
WriteOutContentHandler(ContentHandler handler,
int writeLimit,
boolean throwOnWriteLimitReached,
ParseContext parseContext)
The default is to throw a WriteLimitReachedException. |
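The write-limit behaviour named in the constructor parameters above (`writeLimit`, `throwOnWriteLimitReached`) can be illustrated with a small, JDK-only sketch. This is not Tika's implementation: `WriteLimitSketch`, `LimitedHandler` and `LimitReachedException` are hypothetical names, and treating a negative limit as "no limit" is an assumption made for the example.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WriteLimitSketch {

    /** Hypothetical stand-in for a "write limit reached" signal; not Tika's exception class. */
    static class LimitReachedException extends SAXException {
        LimitReachedException(int limit) {
            super("write limit of " + limit + " characters reached");
        }
    }

    /** Collects character events until the configured limit is hit. */
    static class LimitedHandler extends DefaultHandler {
        private final StringBuilder out = new StringBuilder();
        private final int writeLimit;        // assumption: negative means "no limit"
        private final boolean throwOnLimit;  // mirrors the throwOnWriteLimitReached idea
        private boolean limitReached = false;

        LimitedHandler(int writeLimit, boolean throwOnLimit) {
            this.writeLimit = writeLimit;
            this.throwOnLimit = throwOnLimit;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (limitReached) {
                return; // silently drop everything after the limit
            }
            if (writeLimit < 0 || out.length() + length <= writeLimit) {
                out.append(ch, start, length);
                return;
            }
            // Keep only what still fits, then record that the limit was hit.
            out.append(ch, start, writeLimit - out.length());
            limitReached = true;
            if (throwOnLimit) {
                throw new LimitReachedException(writeLimit);
            }
        }

        String text() {
            return out.toString();
        }

        boolean isLimitReached() {
            return limitReached;
        }
    }

    public static void main(String[] args) throws SAXException {
        char[] data = "hello world".toCharArray();
        LimitedHandler quiet = new LimitedHandler(5, false);
        quiet.characters(data, 0, data.length);
        // prints "hello / limit reached: true"
        System.out.println(quiet.text() + " / limit reached: " + quiet.isLimitReached());
    }
}
```

With the flag set to `false` the handler truncates quietly and exposes a status flag; with `true` the caller learns about the truncation through an exception, which is the behaviour the description above calls the default.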
Modifier and Type | Method and Description |
---|---|
void |
CompositeParseContextConfig.configure(javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders,
Metadata metadata,
ParseContext context) |
void |
ParseContextConfig.configure(javax.ws.rs.core.MultivaluedMap<String,String> headers,
Metadata metadata,
ParseContext context)
Configures the parseContext from the headers that are present.
|
Modifier and Type | Method and Description |
---|---|
void |
DocumentSelectorConfig.configure(javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders,
Metadata mtadata,
ParseContext context) |
void |
TimeoutConfig.configure(javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders,
Metadata metadata,
ParseContext context) |
void |
PasswordProviderConfig.configure(javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
static void |
TikaResource.fillParseContext(javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders,
Metadata metadata,
ParseContext parseContext) |
protected static long |
TikaResource.getTaskTimeout(ParseContext parseContext) |
static void |
TikaResource.parse(Parser parser,
org.slf4j.Logger logger,
String path,
InputStream inputStream,
ContentHandler handler,
Metadata metadata,
ParseContext parseContext)
Use this to call a parser and unify exception handling.
|
Modifier and Type | Method and Description |
---|---|
void |
PDFServerConfig.configure(javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders,
Metadata metadata,
ParseContext parseContext)
Configures the parseContext from the headers that are present.
|
void |
TesseractServerConfig.configure(javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders,
Metadata metadata,
ParseContext parseContext)
Configures the parseContext from the headers that are present.
|
Modifier and Type | Method and Description |
---|---|
static Document |
XMLReaderUtils.buildDOM(InputStream is,
ParseContext context)
This checks the context for a user-specified DocumentBuilder. |
static Document |
XMLReaderUtils.buildDOM(Reader reader,
ParseContext context)
This checks the context for a user-specified DocumentBuilder. |
static Future |
ConcurrentUtils.execute(ParseContext context,
Runnable runnable)
Execute a runnable using an ExecutorService from the ParseContext if possible.
|
static void |
XMLReaderUtils.parseSAX(InputStream is,
ContentHandler contentHandler,
ParseContext context)
This checks the context for a user-specified SAXParser. |
static void |
XMLReaderUtils.parseSAX(Reader reader,
ContentHandler contentHandler,
ParseContext context)
This checks the context for a user-specified SAXParser. |
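Several of the utility methods above share one pattern: look in the context for a user-supplied object and fall back to a default when none is set. The sketch below illustrates that pattern for the "ExecutorService from the ParseContext if possible" case using only the JDK. `MiniContext` is a hypothetical stand-in for a class-keyed context, and the fallback of running on a fresh thread is an assumption of this example, not a statement about Tika's internals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicInteger;

public class ContextLookupSketch {

    /** Hypothetical stand-in for a parse context: objects are stored and looked up by class. */
    static class MiniContext {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key) {
            return key.cast(entries.get(key)); // returns null when nothing is registered
        }
    }

    /** Uses the context's ExecutorService when present; otherwise runs on a new thread. */
    static Future<?> execute(MiniContext context, Runnable runnable) {
        ExecutorService executor = context.get(ExecutorService.class);
        if (executor != null) {
            return executor.submit(runnable);
        }
        FutureTask<Void> task = new FutureTask<>(runnable, null);
        new Thread(task, "context-fallback").start();
        return task;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger runs = new AtomicInteger();
        MiniContext context = new MiniContext();

        // No executor registered: the fallback thread is used.
        execute(context, runs::incrementAndGet).get();

        // Executor registered: the task is submitted to it instead.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        context.set(ExecutorService.class, pool);
        execute(context, runs::incrementAndGet).get();
        pool.shutdown();

        System.out.println("runs: " + runs.get()); // prints "runs: 2"
    }
}
```

Keying the lookup by class keeps call sites free of string constants and lets callers override one collaborator without touching the others.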
Copyright © 2007–2023 The Apache Software Foundation. All rights reserved.