Modifier and Type | Method and Description |
---|---|
String |
Tika.detect(InputStream stream,
Metadata metadata)
Detects the media type of the given document.
|
Reader |
Tika.parse(File file,
Metadata metadata)
Parses the given file and returns the extracted text content.
|
Reader |
Tika.parse(InputStream stream,
Metadata metadata)
Parses the given document and returns the extracted text content.
|
Reader |
Tika.parse(Path path,
Metadata metadata)
Parses the file at the given path and returns the extracted text content.
|
String |
Tika.parseToString(InputStream stream,
Metadata metadata)
Parses the given document and returns the extracted text content.
|
String |
Tika.parseToString(InputStream stream,
Metadata metadata,
int maxLength)
Parses the given document and returns the extracted text content.
|
Modifier and Type | Method and Description |
---|---|
Metadata |
FileResource.getMetadata()
This gets the metadata available before the parsing of the file.
|
Modifier and Type | Method and Description |
---|---|
OutputStream |
OutputStreamFactory.getOutputStream(Metadata metadata) |
protected void |
FileResourceConsumer.parse(String resourceId,
Parser parser,
InputStream is,
ContentHandler handler,
Metadata m,
ParseContext parseContext)
Utility method to handle logging equivalently among all
implementing classes.
|
protected boolean |
FileResourceCrawler.select(Metadata m) |
Modifier and Type | Method and Description |
---|---|
Metadata |
FSFileResource.getMetadata() |
Modifier and Type | Method and Description |
---|---|
OutputStream |
FSOutputStreamFactory.getOutputStream(Metadata metadata)
Tries to create a file according to the FSUtil.HANDLE_EXISTING value that was passed in during initialization. |
boolean |
FSDocumentSelector.select(Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
MediaType |
NameDetector.detect(InputStream input,
Metadata metadata)
Detects the content type of an input document based on the document
name given in the input metadata.
|
MediaType |
OverrideDetector.detect(InputStream input,
Metadata metadata) |
MediaType |
TextDetector.detect(InputStream input,
Metadata metadata)
Looks at the beginning of the document input stream to determine
whether the document is text or not.
|
MediaType |
TypeDetector.detect(InputStream input,
Metadata metadata)
Detects the content type of an input document based on a type hint
given in the input metadata.
|
MediaType |
TrainedModelDetector.detect(InputStream input,
Metadata metadata) |
MediaType |
Detector.detect(InputStream input,
Metadata metadata)
Detects the content type of the given input document.
|
Charset |
EncodingDetector.detect(InputStream input,
Metadata metadata)
Detects the character encoding of the given text document, or
returns null if the encoding of the document cannot be detected. |
MediaType |
ZeroSizeFileDetector.detect(InputStream stream,
Metadata metadata) |
Charset |
CompositeEncodingDetector.detect(InputStream input,
Metadata metadata) |
MediaType |
EmptyDetector.detect(InputStream input,
Metadata metadata) |
Charset |
NonDetectingEncodingDetector.detect(InputStream input,
Metadata metadata) |
MediaType |
MagicDetector.detect(InputStream input,
Metadata metadata) |
MediaType |
CompositeDetector.detect(InputStream input,
Metadata metadata) |
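
The detector rows above describe several strategies: inspecting the start of the stream, using the document name from the metadata, and using a type hint. The following is a minimal sketch of how two such strategies could be chained, using only the JDK. `DetectorSketch`, the `resourceName` key, and the plain `Map` standing in for `Metadata` are all hypothetical; this is not Tika's implementation, and the order (magic bytes first, name second) is an assumption made for illustration. `readNBytes` requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for a detector chain; not Tika's implementation. */
final class DetectorSketch {

    /** Hypothetical metadata key that carries the document name. */
    static final String RESOURCE_NAME = "resourceName";

    /** Peeks at the first four bytes, then restores the stream position. */
    static String detectByMagic(InputStream input) throws IOException {
        if (input == null || !input.markSupported()) {
            return null; // cannot peek without mark/reset support
        }
        input.mark(4);
        try {
            byte[] prefix = new byte[4];
            int n = input.readNBytes(prefix, 0, 4);
            if (n == 4 && prefix[0] == '%' && prefix[1] == 'P'
                    && prefix[2] == 'D' && prefix[3] == 'F') {
                return "application/pdf";
            }
            return null;
        } finally {
            input.reset(); // leave the stream where the caller had it
        }
    }

    /** Guesses a type from the file extension found in the metadata. */
    static String detectByName(Map<String, String> metadata) {
        String name = metadata.get(RESOURCE_NAME);
        if (name == null) {
            return null;
        }
        String lower = name.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".pdf")) {
            return "application/pdf";
        }
        if (lower.endsWith(".txt")) {
            return "text/plain";
        }
        return null;
    }

    /** Tries the stream prefix first, then the name, then a generic fallback. */
    static String detect(InputStream input, Map<String, String> metadata)
            throws IOException {
        String type = detectByMagic(input);
        if (type == null) {
            type = detectByName(metadata);
        }
        return type != null ? type : "application/octet-stream";
    }
}
```

Calling `DetectorSketch.detect(stream, Map.of("resourceName", "notes.txt"))` on a stream with no recognised prefix falls through to the name-based guess. Resetting the stream in `detectByMagic` matters because a parser usually needs to read the same bytes afterwards.
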
Constructor and Description |
---|
AutoDetectReader(InputStream stream,
Metadata metadata) |
AutoDetectReader(InputStream stream,
Metadata metadata,
EncodingDetector encodingDetector) |
AutoDetectReader(InputStream stream,
Metadata metadata,
ServiceLoader loader) |
Modifier and Type | Method and Description |
---|---|
List<RecognisedObject> |
DL4JInceptionV3Net.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
List<RecognisedObject> |
DL4JVGG16Net.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
Embedder.embed(Metadata metadata,
InputStream originalStream,
OutputStream outputStream,
ParseContext context)
Embeds related document metadata from the given metadata object into the
given output stream.
|
void |
ExternalEmbedder.embed(Metadata metadata,
InputStream inputStream,
OutputStream outputStream,
ParseContext context)
Executes the configured external command to embed the given metadata into
the document read from the input stream, writing the result to the given output stream.
|
protected List<String> |
ExternalEmbedder.getCommandMetadataSegments(Metadata metadata)
Constructs a collection of command line arguments responsible for setting
individual metadata fields based on the given metadata. |
Modifier and Type | Method and Description |
---|---|
protected static String |
AbstractProfiler.getContent(Metadata metadata) |
protected static String |
AbstractProfiler.getContent(Metadata metadata,
int maxLength,
Map<Cols,String> data)
Gets the content and records in the data map, under
Cols.CONTENT_TRUNCATED_AT_MAX_LEN, whether the string was truncated. |
protected org.apache.tika.eval.EvalFilePaths |
AbstractProfiler.getPathsFromExtractCrawl(Metadata metadata,
Path extracts) |
protected org.apache.tika.eval.EvalFilePaths |
AbstractProfiler.getPathsFromSrcCrawl(Metadata metadata,
Path srcDir,
Path extracts) |
protected void |
AbstractProfiler.writeContentData(String fileId,
Metadata m,
String fieldName,
TableInfo contentsTable)
Checks to see if metadata is null or content is empty (null or only whitespace).
|
protected void |
AbstractProfiler.writeExceptionData(String fileId,
Metadata m,
TableInfo exceptionTable) |
protected void |
AbstractProfiler.writeProfileData(org.apache.tika.eval.EvalFilePaths fps,
int i,
Metadata m,
String fileId,
String containerId,
List<Integer> numAttachments,
TableInfo profileTable) |
Modifier and Type | Method and Description |
---|---|
protected long |
AbstractProfiler.getSourceFileLength(org.apache.tika.eval.EvalFilePaths fps,
List<Metadata> metadataList) |
Modifier and Type | Method and Description |
---|---|
List<Metadata> |
ExtractReader.loadExtract(Path extractFile) |
Modifier and Type | Method and Description |
---|---|
static Metadata |
DisplayMetInstance.getMet(URL url) |
Modifier and Type | Method and Description |
---|---|
List<Metadata> |
ParsingExample.recursiveParserWrapperExample()
For documents that may contain embedded documents, it might be helpful
to create a list of metadata objects, one for the container document and
one for each embedded document.
|
Modifier and Type | Method and Description |
---|---|
MediaType |
EncryptedPrescriptionDetector.detect(InputStream stream,
Metadata metadata) |
protected ContentHandler |
PrescriptionParser.getContentHandler(ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
DirListParser.parse(InputStream is,
ContentHandler handler,
Metadata metadata) |
void |
DirListParser.parse(InputStream is,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
LanguageDetectingParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
EncryptedPrescriptionParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
static String |
MyFirstTika.parseUsingAutoDetect(String filename,
TikaConfig tikaConfig,
Metadata metadata) |
static String |
MyFirstTika.parseUsingComponents(String filename,
TikaConfig tikaConfig,
Metadata metadata) |
Constructor and Description |
---|
LazyTextExtractorField(Parser parser,
org.apache.jackrabbit.core.value.InternalValue value,
Metadata metadata,
Executor executor,
boolean highlighting,
int maxFieldLength)
Creates a new LazyTextExtractorField with the given name. |
Modifier and Type | Method and Description |
---|---|
String |
EmbeddedDocumentUtil.getExtension(TikaInputStream is,
Metadata metadata) |
void |
EmbeddedDocumentExtractor.parseEmbedded(InputStream stream,
ContentHandler handler,
Metadata metadata,
boolean outputHtml)
Processes the supplied embedded resource, calling the delegating
parser with the appropriate details.
|
void |
ParsingEmbeddedDocumentExtractor.parseEmbedded(InputStream stream,
ContentHandler handler,
Metadata metadata,
boolean outputHtml) |
void |
EmbeddedDocumentUtil.parseEmbedded(InputStream inputStream,
ContentHandler handler,
Metadata metadata,
boolean outputHtml) |
static void |
EmbeddedDocumentUtil.recordEmbeddedStreamException(Throwable t,
Metadata m) |
static void |
EmbeddedDocumentUtil.recordException(Throwable t,
Metadata m) |
boolean |
DocumentSelector.select(Metadata metadata)
Checks if a document with the given metadata matches the specified
selection criteria.
|
boolean |
EmbeddedDocumentExtractor.shouldParseEmbedded(Metadata metadata) |
boolean |
ParsingEmbeddedDocumentExtractor.shouldParseEmbedded(Metadata metadata) |
boolean |
EmbeddedDocumentUtil.shouldParseEmbedded(Metadata m) |
Modifier and Type | Method and Description |
---|---|
void |
ForkParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
static TikaInputStream |
TikaInputStream.get(Blob blob,
Metadata metadata)
Creates a TikaInputStream from the given database BLOB.
|
static TikaInputStream |
TikaInputStream.get(byte[] data,
Metadata metadata)
Creates a TikaInputStream from the given array of bytes.
|
static TikaInputStream |
TikaInputStream.get(File file,
Metadata metadata)
Deprecated.
Use TikaInputStream.get(Path, Metadata) instead. In Tika 2.0,
this will be removed or modified to throw an IOException. |
static TikaInputStream |
TikaInputStream.get(Path path,
Metadata metadata)
Creates a TikaInputStream from the file at the given path.
|
static TikaInputStream |
TikaInputStream.get(URI uri,
Metadata metadata)
Creates a TikaInputStream from the resource at the given URI.
|
static TikaInputStream |
TikaInputStream.get(URL url,
Metadata metadata)
Creates a TikaInputStream from the resource at the given URL.
|
Modifier and Type | Method and Description |
---|---|
static void |
XMPDM.ChannelTypePropertyConverter.convertAndSet(Metadata metadata,
Object value)
Deprecated.
How convert+set might work
|
Modifier and Type | Method and Description |
---|---|
Metadata |
JsonMetadataDeserializer.deserialize(com.google.gson.JsonElement element,
Type type,
com.google.gson.JsonDeserializationContext context)
Deserializes a JSON object (equivalent to Map<String, String[]>) and returns a Metadata object.
|
static Metadata |
JsonMetadata.fromJson(Reader reader)
Reads metadata from the reader.
|
Modifier and Type | Method and Description |
---|---|
static List<Metadata> |
JsonMetadataList.fromJson(Reader reader)
Reads a list of metadata objects from the reader.
|
Modifier and Type | Method and Description |
---|---|
protected String[] |
JsonMetadataSerializer.getNames(Metadata metadata)
Override to get a custom sort order
or to filter names.
|
com.google.gson.JsonElement |
JsonMetadataSerializer.serialize(Metadata metadata,
Type type,
com.google.gson.JsonSerializationContext context)
Serializes a Metadata object into what is effectively a Map<String, String[]>.
|
static void |
JsonMetadata.toJson(Metadata metadata,
Writer writer)
Serializes a Metadata object to JSON.
|
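
The serializer rows above treat a metadata object as, in effect, a map from names to arrays of string values, with `getNames` controlling order and filtering. The sketch below shows that shape using only the JDK. `MetadataJsonSketch` is a hypothetical name, a plain `Map<String, String[]>` stands in for `Metadata`, and alphabetical ordering is an assumption chosen for illustration; Tika itself delegates to Gson, and its exact output format is not reproduced here.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of writing multi-valued metadata as JSON. */
final class MetadataJsonSketch {

    /** Wraps a string in quotes and escapes characters JSON forbids raw. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Emits every name as a JSON array of its values, sorted by name. */
    static String toJson(Map<String, String[]> metadata) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        // TreeMap gives a stable, alphabetical name order.
        for (Map.Entry<String, String[]> e : new TreeMap<>(metadata).entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append(quote(e.getKey())).append(":[");
            String[] values = e.getValue();
            for (int i = 0; i < values.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(quote(values[i]));
            }
            sb.append(']');
        }
        return sb.append('}').toString();
    }
}
```

Keeping every value as an array, even when there is only one, means a reader never has to guess whether a field is single-valued or multi-valued.
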
Modifier and Type | Method and Description |
---|---|
static void |
JsonMetadataList.toJson(List<Metadata> metadataList,
Writer writer)
Serializes a list of Metadata objects to JSON.
|
Modifier and Type | Method and Description |
---|---|
MediaType |
ProbabilisticMimeDetectionSelector.detect(InputStream input,
Metadata metadata) |
MediaType |
MimeTypes.detect(InputStream input,
Metadata metadata)
Automatically detects the MIME type of a document based on magic
markers in the stream prefix and any given metadata hints.
|
Modifier and Type | Method and Description |
---|---|
List<Metadata> |
RecursiveParserWrapper.getMetadata()
The first element in the returned list represents the
data from the outer container file.
|
Modifier and Type | Method and Description |
---|---|
void |
DigestingParser.Digester.digest(InputStream is,
Metadata m,
ParseContext parseContext)
Digests an InputStream and sets the appropriate value(s) in the metadata.
|
protected Parser |
CompositeParser.getParser(Metadata metadata)
Returns the parser that best matches the given metadata.
|
protected Parser |
CompositeParser.getParser(Metadata metadata,
ParseContext context) |
String |
PasswordProvider.getPassword(Metadata metadata)
Looks up the password for a document with the given metadata,
and returns it for the Parser.
|
void |
AbstractParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata)
Deprecated.
Use the Parser.parse(InputStream, ContentHandler, Metadata, ParseContext) method instead. |
void |
AutoDetectParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata) |
void |
ParserDecorator.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Delegates the method call to the decorated parser.
|
void |
CryptoParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
RecursiveParserWrapper.parse(InputStream stream,
ContentHandler ignore,
Metadata metadata,
ParseContext context)
Acts like a regular parser except it ignores the ContentHandler
and it automatically sets/overwrites the embedded Parser in the
ParseContext object.
|
void |
ParserPostProcessor.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Forwards the call to the delegated parser and post-processes the
results as described in the class documentation.
|
void |
NetworkParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Parses a document stream into a sequence of XHTML SAX events.
|
void |
DelegatingParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Looks up the delegate parser from the parsing context and
delegates the parse operation to it.
|
void |
AutoDetectParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
ErrorParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
DigestingParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
EmptyParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
CompositeParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Delegates the call to the matching component parser.
|
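
The `DigestingParser.Digester.digest` row above describes digesting a stream and recording the value in the metadata. A minimal sketch of that idea with the JDK's `MessageDigest` follows. `DigestSketch`, the `digest:` key prefix, and the `Map` standing in for `Metadata` are hypothetical and not taken from Tika. Note that this sketch consumes the stream, so a real digester would also need to make the bytes available to the parser afterwards.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;

/** Hypothetical illustration of digesting a stream into a metadata map. */
final class DigestSketch {

    /**
     * Reads the stream to the end, computes its digest, and stores the
     * lowercase hex value under the hypothetical key "digest:" + algorithm.
     */
    static void digest(InputStream is, Map<String, String> metadata, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        byte[] buffer = new byte[8192];
        int n;
        while ((n = is.read(buffer)) != -1) {
            md.update(buffer, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        metadata.put("digest:" + algorithm, hex.toString());
    }
}
```

For example, `DigestSketch.digest(stream, map, "SHA-256")` leaves one new entry in `map`; calling it again with `"MD5"` adds a second entry alongside it, which is roughly what a composite digester would do.
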
Constructor and Description |
---|
ParsingReader(Parser parser,
InputStream stream,
Metadata metadata,
ParseContext context)
Creates a reader for the text content of the given binary stream
with the given document metadata.
|
ParsingReader(Parser parser,
InputStream stream,
Metadata metadata,
ParseContext context,
Executor executor)
Creates a reader for the text content of the given binary stream
with the given document metadata.
|
Modifier and Type | Method and Description |
---|---|
void |
AppleSingleFileParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
ClassParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
MidiParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
AudioParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
protected URI |
TensorflowRESTCaptioner.getApiUri(Metadata metadata) |
List<CaptionObject> |
TensorflowRESTCaptioner.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
ChmParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
SourceCodeParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
TSDParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
Pkcs7Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Metadata |
CTAKESContentHandler.getMetadata()
Returns metadata that includes cTAKES annotations.
|
Modifier and Type | Method and Description |
---|---|
void |
CTAKESParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
CTAKESContentHandler(ContentHandler handler,
Metadata metadata)
Creates a new CTAKESContentHandler for the given ContentHandler and Metadata objects. |
CTAKESContentHandler(ContentHandler handler,
Metadata metadata,
CTAKESConfig config)
Creates a new CTAKESContentHandler for the given ContentHandler and Metadata objects. |
Modifier and Type | Method and Description |
---|---|
void |
DBFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
protected ContentHandler |
DIFParser.getContentHandler(ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
DIFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
DIFContentHandler(ContentHandler delegate,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
void |
InputStreamDigester.digest(InputStream is,
Metadata metadata,
ParseContext parseContext) |
void |
CompositeDigester.digest(InputStream is,
Metadata m,
ParseContext parseContext) |
Modifier and Type | Method and Description |
---|---|
void |
DWGParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
EnviHeaderParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
EpubContentParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
EpubParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
ExecutableParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
ExecutableParser.parseELF(XHTMLContentHandler xhtml,
Metadata metadata,
InputStream stream,
byte[] first4)
Parses a Unix ELF file
|
void |
ExecutableParser.parsePE(XHTMLContentHandler xhtml,
Metadata metadata,
InputStream stream,
byte[] first4)
Parses a DOS or Windows PE file
|
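
Both `parseELF` and `parsePE` above take a `byte[] first4` argument, which suggests the format is decided from the first four bytes of the stream. The sketch below shows how such a check could look, based on the well-known signatures: ELF files begin with `0x7F 'E' 'L' 'F'`, and DOS and Windows PE files begin with `'M' 'Z'`. `ExecutableMagicSketch` and its `Kind` enum are hypothetical; how `ExecutableParser` actually dispatches is not shown in this listing.

```java
/** Hypothetical illustration of classifying an executable by its prefix. */
final class ExecutableMagicSketch {

    enum Kind { ELF, DOS_OR_PE, UNKNOWN }

    /** Classifies a file from its first four bytes. */
    static Kind classify(byte[] first4) {
        if (first4 == null || first4.length < 4) {
            return Kind.UNKNOWN;
        }
        // ELF signature: 0x7F followed by the ASCII letters "ELF".
        if (first4[0] == 0x7F && first4[1] == 'E'
                && first4[2] == 'L' && first4[3] == 'F') {
            return Kind.ELF;
        }
        // DOS MZ header, which Windows PE files also start with.
        if (first4[0] == 'M' && first4[1] == 'Z') {
            return Kind.DOS_OR_PE;
        }
        return Kind.UNKNOWN;
    }
}
```

Only two bytes are needed to recognise the MZ header, but reading four up front lets one buffer serve both checks.
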
Modifier and Type | Method and Description |
---|---|
void |
ExternalParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Executes the configured external command and passes the given document
stream as a simple XHTML document to the given SAX content handler.
|
Modifier and Type | Method and Description |
---|---|
void |
FeedParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
TrueTypeParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
AdobeFontMetricParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
GDALParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
GeoParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
GeographicInformationParser.parse(InputStream inputStream,
ContentHandler contentHandler,
Metadata metadata,
ParseContext parseContext) |
Modifier and Type | Method and Description |
---|---|
void |
GribParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
HDFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
protected void |
HDFParser.unravelStringMet(ucar.nc2.NetcdfFile ncFile,
ucar.nc2.Group group,
Metadata met) |
Modifier and Type | Method and Description |
---|---|
Charset |
HtmlEncodingDetector.detect(InputStream input,
Metadata metadata) |
void |
HtmlParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
ImageParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
TiffParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
ICNSParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
WebPParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
BPGParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
PSDParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
ImageMetadataExtractor(Metadata metadata) |
ImageMetadataExtractor(Metadata metadata,
org.apache.tika.parser.image.ImageMetadataExtractor.DirectoryHandler... handlers) |
Modifier and Type | Method and Description |
---|---|
static void |
JempboxExtractor.extractDublinCore(org.apache.jempbox.xmp.XMPMetadata xmpMetadata,
Metadata metadata)
Tries to extract Dublin Core schema from XMP.
|
static void |
JempboxExtractor.extractXMPMM(org.apache.jempbox.xmp.XMPMetadata xmp,
Metadata metadata)
Extracts Media Management metadata from XMP.
|
Constructor and Description |
---|
JempboxExtractor(Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
void |
IptcAnpaParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata)
Deprecated.
This method will be removed in Apache Tika 1.0.
|
void |
IptcAnpaParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
ISArchiveParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
static void |
ISATabUtils.parseAssay(InputStream stream,
XHTMLContentHandler xhtml,
Metadata metadata,
ParseContext context) |
static void |
ISATabUtils.parseInvestigation(InputStream stream,
XHTMLContentHandler handler,
Metadata metadata,
ParseContext context) |
static void |
ISATabUtils.parseInvestigation(InputStream stream,
XHTMLContentHandler handler,
Metadata metadata,
ParseContext context,
String studyFileName) |
static void |
ISATabUtils.parseStudy(InputStream stream,
XHTMLContentHandler xhtml,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
IWorkPackageParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
IWork13PackageParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
SQLite3Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Metadata |
TEIDOMParser.parse(String source,
ParseContext parseContext) |
Modifier and Type | Method and Description |
---|---|
void |
JournalParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
GrobidRESTParser.parse(String filePath,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
JpegParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
static void |
MailUtil.addPersonAndEmail(String string,
Property personProperty,
Property emailProperty,
Metadata metadata)
This tries to split a "from" or "to" value into a person field and an email field.
|
void |
RFC822Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
static void |
MailUtil.setPersonAndEmail(String string,
Property personProperty,
Property emailProperty,
Metadata metadata)
This tries to split a "from" or "to" value into a person field and an email field.
|
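
The `MailUtil` rows above describe splitting a "from" or "to" value into a person field and an email field. One way this could be approached is sketched below with the JDK's regular expressions. `PersonEmailSketch` and its patterns are hypothetical and deliberately simple; they are not `MailUtil`'s actual rules and do not cover the full address grammar used in mail headers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of splitting a header value into person and email. */
final class PersonEmailSketch {

    // Optional (possibly quoted) display name followed by <address>.
    private static final Pattern NAME_AND_ADDRESS = Pattern.compile(
            "^\\s*\"?([^\"<]*?)\"?\\s*<([^<>\\s]+@[^<>\\s]+)>\\s*$");

    // A bare address with no display name.
    private static final Pattern BARE_ADDRESS = Pattern.compile(
            "^\\s*([^<>\\s]+@[^<>\\s]+)\\s*$");

    /** Returns a two-element array {person, email}; either element may be null. */
    static String[] split(String value) {
        Matcher m = NAME_AND_ADDRESS.matcher(value);
        if (m.matches()) {
            String person = m.group(1).trim();
            return new String[] {person.isEmpty() ? null : person, m.group(2)};
        }
        m = BARE_ADDRESS.matcher(value);
        if (m.matches()) {
            return new String[] {null, m.group(1)};
        }
        // Nothing that looks like an address: treat the whole value as a name.
        return new String[] {value.trim(), null};
    }
}
```

The final fallback keeps values such as group names, which contain no address at all, instead of discarding them.
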
Modifier and Type | Method and Description |
---|---|
void |
MatParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Map<Integer,Metadata> |
MboxParser.getTrackingMetadata() |
Modifier and Type | Method and Description |
---|---|
void |
MboxParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
OutlookPSTParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
static void |
OutlookExtractor.addEvenIfNull(Property property,
String value,
Metadata metadata) |
static void |
SummaryExtractor.addMulti(Metadata metadata,
Property property,
String string) |
MediaType |
POIFSContainerDetector.detect(InputStream input,
Metadata metadata) |
protected void |
OfficeParser.parse(org.apache.poi.poifs.filesystem.DirectoryNode root,
ParseContext context,
Metadata metadata,
XHTMLContentHandler xhtml) |
void |
EMFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
OldExcelParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Extracts properties and text from an MS Document input stream
|
void |
OfficeParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Extracts properties and text from an MS Document input stream
|
void |
JackcessParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
TNEFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Extracts properties and text from an MS Document input stream
|
void |
WMFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
MSOwnerFileParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Extracts owner from MS temp file
|
void |
OutlookExtractor.parse(XHTMLContentHandler xhtml,
Metadata metadata) |
Constructor and Description |
---|
ExcelExtractor(ParseContext context,
Metadata metadata) |
HSLFExtractor(ParseContext context,
Metadata metadata) |
SummaryExtractor(Metadata metadata) |
WordExtractor(ParseContext context,
Metadata metadata) |
Modifier and Type | Field and Description |
---|---|
protected Metadata |
XSSFExcelExtractorDecorator.metadata |
Modifier and Type | Method and Description |
---|---|
void |
MetadataExtractor.extract(Metadata metadata) |
void |
OOXMLExtractor.getXHTML(ContentHandler handler,
Metadata metadata,
ParseContext context)
Parses the document into a sequence of XHTML SAX events sent to the
given content handler.
|
void |
XSSFExcelExtractorDecorator.getXHTML(ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
XSSFBExcelExtractorDecorator.getXHTML(ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
AbstractOOXMLExtractor.getXHTML(ContentHandler handler,
Metadata metadata,
ParseContext context) |
protected Map<String,String> |
AbstractOOXMLExtractor.loadLinkedRelationships(org.apache.poi.openxml4j.opc.PackagePart bodyPart,
boolean includeInternal,
Metadata metadata)
This is used by the SAX docx and pptx decorators to load hyperlinks and
other linked objects
|
void |
OOXMLParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
static void |
OOXMLExtractorFactory.parse(InputStream stream,
ContentHandler baseHandler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
SXSLFPowerPointExtractorDecorator(Metadata metadata,
ParseContext context,
XSLFEventBasedPowerPointExtractor extractor) |
SXWPFWordExtractorDecorator(Metadata metadata,
ParseContext context,
XWPFEventBasedWordExtractor extractor) |
XSLFPowerPointExtractorDecorator(Metadata metadata,
ParseContext context,
org.apache.poi.xslf.extractor.XSLFPowerPointExtractor extractor) |
XWPFWordExtractorDecorator(Metadata metadata,
ParseContext context,
org.apache.poi.xwpf.extractor.XWPFWordExtractor extractor) |
Modifier and Type | Method and Description |
---|---|
void |
Word2006MLParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
protected ContentHandler |
SpreadsheetMLParser.getContentHandler(ContentHandler ch,
Metadata metadata,
ParseContext context) |
protected ContentHandler |
AbstractXML2003Parser.getContentHandler(ContentHandler ch,
Metadata md,
ParseContext context) |
protected ContentHandler |
WordMLParser.getContentHandler(ContentHandler ch,
Metadata metadata,
ParseContext context) |
void |
AbstractXML2003Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
SpreadsheetMLParser.setContentType(Metadata metadata) |
protected abstract void |
AbstractXML2003Parser.setContentType(Metadata contentType) |
void |
WordMLParser.setContentType(Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
void |
Mp3Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
MP4Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
NamedEntityParser.parse(InputStream inputStream,
ContentHandler contentHandler,
Metadata metadata,
ParseContext parseContext) |
Modifier and Type | Method and Description |
---|---|
void |
NetCDFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
TesseractOCRParser.parse(Image image,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
TesseractOCRParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext parseContext) |
Modifier and Type | Method and Description |
---|---|
protected ContentHandler |
OpenDocumentMetaParser.getContentHandler(ContentHandler ch,
Metadata md,
ParseContext context) |
void |
OpenDocumentParser.parse(InputStream stream,
ContentHandler baseHandler,
Metadata metadata,
ParseContext context) |
void |
OpenDocumentMetaParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
OpenDocumentContentParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
AccessChecker.check(Metadata metadata)
Checks to see if a document's content should be extracted based
on metadata values and the value of
AccessChecker.allowAccessibility in the constructor. |
void |
PDFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
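
The decision described for `AccessChecker.check(Metadata)` above can be pictured with a small stand-alone sketch. This is not Tika's code: a plain `Map<String, String>` stands in for `Metadata`, the class name `AccessCheckSketch` is hypothetical, and the two key names and the exact policy are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

final class AccessCheckSketch {
    // Assumed key names, used only for this illustration.
    static final String EXTRACT_CONTENT = "access_permission:extract_content";
    static final String EXTRACT_FOR_ACCESSIBILITY = "access_permission:extract_for_accessibility";

    private final boolean allowAccessibility;

    AccessCheckSketch(boolean allowAccessibility) {
        this.allowAccessibility = allowAccessibility;
    }

    // Returns true when extraction may proceed under the assumed policy:
    // extraction is blocked only if the document explicitly forbids it, and
    // even then it is permitted when accessibility extraction is both
    // enabled on this checker and granted by the document.
    boolean mayExtract(Map<String, String> metadata) {
        if (!"false".equals(metadata.get(EXTRACT_CONTENT))) {
            return true;
        }
        return allowAccessibility
                && "true".equals(metadata.get(EXTRACT_FOR_ACCESSIBILITY));
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(EXTRACT_CONTENT, "false");
        metadata.put(EXTRACT_FOR_ACCESSIBILITY, "true");
        System.out.println(new AccessCheckSketch(false).mayExtract(metadata));
        System.out.println(new AccessCheckSketch(true).mayExtract(metadata));
    }
}
```

The real method returns `void`, so it presumably signals a refusal by throwing rather than by returning a flag; the boolean here only keeps the sketch short.
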
Modifier and Type | Method and Description |
---|---|
protected static Metadata |
PackageParser.handleEntryMetadata(String name,
Date createAt,
Date modifiedAt,
Long size,
XHTMLContentHandler xhtml) |
Modifier and Type | Method and Description |
---|---|
boolean |
CompressorParserOptions.decompressConcatenated(Metadata metadata) |
MediaType |
ZipContainerDetector.detect(InputStream input,
Metadata metadata) |
void |
RarParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
CompressorParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
PackageParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
PooledTimeSeriesParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Parses a document stream into a sequence of XHTML SAX events.
|
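
To make "a sequence of XHTML SAX events" concrete, here is a minimal sketch that uses only the JDK's SAX classes. It is not taken from any Tika parser: the class `XhtmlEventSketch` and its methods are hypothetical, the `Metadata` and `ParseContext` arguments are left out, and the rule of one `<p>` element per input line is an assumption chosen to keep the example small.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

final class XhtmlEventSketch {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Hypothetical stand-in for the body of a parse(...) method:
    // reads the stream as UTF-8 text and emits one <p> per line.
    static void emitLinesAsXhtml(InputStream stream, ContentHandler handler)
            throws IOException, SAXException {
        Attributes none = new AttributesImpl();
        handler.startDocument();
        handler.startElement(XHTML, "html", "html", none);
        handler.startElement(XHTML, "body", "body", none);
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            handler.startElement(XHTML, "p", "p", none);
            char[] chars = line.toCharArray();
            handler.characters(chars, 0, chars.length);
            handler.endElement(XHTML, "p", "p");
        }
        handler.endElement(XHTML, "body", "body");
        handler.endElement(XHTML, "html", "html");
        handler.endDocument();
    }

    // Runs the emitter against a handler that records each event as a string.
    static List<String> recordEvents(String text) throws IOException, SAXException {
        List<String> events = new ArrayList<>();
        ContentHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                events.add("<" + localName + ">");
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("</" + localName + ">");
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                events.add(new String(ch, start, length));
            }
        };
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        emitLinesAsXhtml(new ByteArrayInputStream(bytes), recorder);
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(recordEvents("first line\nsecond line"));
    }
}
```

Because the caller supplies the `ContentHandler`, the same event stream can be collected as text, serialized, or filtered without changing the code that produces it.
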
Modifier and Type | Method and Description |
---|---|
void |
PRTParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
AgeRecogniser.parse(InputStream inputStream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
ObjectRecognitionParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
List<? extends RecognisedObject> |
ObjectRecogniser.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Recognises the objects in the given stream.
|
Modifier and Type | Method and Description |
---|---|
protected URI |
TensorflowRESTRecogniser.getApiUri(Metadata metadata) |
protected URI |
TensorflowRESTVideoRecogniser.getApiUri(Metadata metadata) |
List<RecognisedObject> |
TensorflowRESTRecogniser.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
List<RecognisedObject> |
TensorflowImageRecParser.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
RTFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
SentimentAnalysisParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Performs the parse.
|
Modifier and Type | Method and Description |
---|---|
void |
StringsParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
Latin1StringsParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Charset |
UniversalEncodingDetector.detect(InputStream input,
Metadata metadata) |
Charset |
Icu4jEncodingDetector.detect(InputStream input,
Metadata metadata) |
void |
TXTParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
FLVParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
QuattroProParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
WordPerfectParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
protected ContentHandler |
FictionBookParser.getContentHandler(ContentHandler handler,
Metadata metadata,
ParseContext context) |
protected ContentHandler |
DcXMLParser.getContentHandler(ContentHandler handler,
Metadata metadata,
ParseContext context) |
protected ContentHandler |
XMLParser.getContentHandler(ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
XMLParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
AttributeDependantMetadataHandler(Metadata metadata,
String nameHoldingAttribute,
String namePrefix) |
AttributeMetadataHandler(String uri,
String localName,
Metadata metadata,
Property property) |
AttributeMetadataHandler(String uri,
String localName,
Metadata metadata,
String name) |
ElementMetadataHandler(String uri,
String localName,
Metadata metadata,
Property targetProperty)
Constructor for Property metadata keys.
|
ElementMetadataHandler(String uri,
String localName,
Metadata metadata,
Property targetProperty,
boolean allowDuplicateValues,
boolean allowEmptyValues)
Constructor for Property metadata keys that also controls how
duplicate and empty entry values are handled.
|
ElementMetadataHandler(String uri,
String localName,
Metadata metadata,
String name)
Constructor for string metadata keys.
|
ElementMetadataHandler(String uri,
String localName,
Metadata metadata,
String name,
boolean allowDuplicateValues,
boolean allowEmptyValues)
Constructor for string metadata keys that also controls how
duplicate and empty entry values are handled.
|
MetadataHandler(Metadata metadata,
Property property)
Deprecated.
|
MetadataHandler(Metadata metadata,
String name)
Deprecated.
|
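
The `allowDuplicateValues` and `allowEmptyValues` flags on the `ElementMetadataHandler` constructors above can be illustrated with a JDK-only sketch. It is not Tika's implementation: `ElementValueCollector` is a hypothetical class, a plain list stands in for the `Metadata` entry, and the meaning of the two flags (skip a value that is already recorded, skip an empty value) is an assumption based on the constructor descriptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class ElementValueCollector extends DefaultHandler {
    private final String uri;
    private final String localName;
    private final boolean allowDuplicateValues;
    private final boolean allowEmptyValues;
    private final List<String> values = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private boolean inside;

    ElementValueCollector(String uri, String localName,
                          boolean allowDuplicateValues, boolean allowEmptyValues) {
        this.uri = uri;
        this.localName = localName;
        this.allowDuplicateValues = allowDuplicateValues;
        this.allowEmptyValues = allowEmptyValues;
    }

    @Override
    public void startElement(String elementUri, String elementName, String qName,
                             Attributes attributes) {
        if (uri.equals(elementUri) && localName.equals(elementName)) {
            inside = true;
            buffer.setLength(0); // start collecting text for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inside) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String elementUri, String elementName, String qName) {
        if (!inside || !uri.equals(elementUri) || !localName.equals(elementName)) {
            return;
        }
        inside = false;
        String value = buffer.toString();
        if (value.isEmpty() && !allowEmptyValues) {
            return; // assumed meaning of allowEmptyValues == false
        }
        if (values.contains(value) && !allowDuplicateValues) {
            return; // assumed meaning of allowDuplicateValues == false
        }
        values.add(value);
    }

    // Parses the XML and returns the collected dc:creator values.
    static List<String> collect(String xml, boolean allowDuplicateValues,
                                boolean allowEmptyValues) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        ElementValueCollector handler = new ElementValueCollector(
                "http://purl.org/dc/elements/1.1/", "creator",
                allowDuplicateValues, allowEmptyValues);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
                + "<dc:creator>Ann</dc:creator><dc:creator>Ann</dc:creator>"
                + "<dc:creator></dc:creator></doc>";
        System.out.println(collect(xml, false, false));
        System.out.println(collect(xml, true, true));
    }
}
```

With both flags off, the repeated and empty `dc:creator` elements in the sample collapse to a single value; with both on, all three are kept.
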
Modifier and Type | Method and Description |
---|---|
void |
XMPContentHandler.metadata(Metadata metadata) |
Constructor and Description |
---|
DIFContentHandler(ContentHandler delegate,
Metadata metadata) |
PhoneExtractingContentHandler(ContentHandler handler,
Metadata metadata)
Creates a decorator for the given SAX event handler and Metadata object.
|
StandardsExtractingContentHandler(ContentHandler handler,
Metadata metadata)
Creates a decorator for the given SAX event handler and Metadata object.
|
XHTMLContentHandler(ContentHandler handler,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
List<Metadata> |
MetadataList.getMetadata() |
Constructor and Description |
---|
MetadataList(List<Metadata> metadata) |
Modifier and Type | Method and Description |
---|---|
static void |
TikaResource.fillMetadata(Parser parser,
Metadata metadata,
ParseContext context,
javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders) |
static void |
TikaResource.logRequest(org.slf4j.Logger logger,
javax.ws.rs.core.UriInfo info,
Metadata metadata) |
static void |
UnpackerResource.metadataToCsv(Metadata metadata,
OutputStream outputStream) |
static void |
TikaResource.parse(Parser parser,
org.slf4j.Logger logger,
String path,
InputStream inputStream,
ContentHandler handler,
Metadata metadata,
ParseContext parseContext)
Use this to call a parser and unify exception handling.
|
Modifier and Type | Method and Description |
---|---|
long |
TextMessageBodyWriter.getSize(Metadata data,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType) |
long |
XMPMessageBodyWriter.getSize(Metadata data,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType) |
long |
CSVMessageBodyWriter.getSize(Metadata data,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType) |
long |
JSONMessageBodyWriter.getSize(Metadata data,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType) |
void |
TextMessageBodyWriter.writeTo(Metadata metadata,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType,
javax.ws.rs.core.MultivaluedMap<String,Object> httpHeaders,
OutputStream entityStream) |
void |
XMPMessageBodyWriter.writeTo(Metadata metadata,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType,
javax.ws.rs.core.MultivaluedMap<String,Object> httpHeaders,
OutputStream entityStream) |
void |
CSVMessageBodyWriter.writeTo(Metadata metadata,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType,
javax.ws.rs.core.MultivaluedMap<String,Object> httpHeaders,
OutputStream entityStream) |
void |
JSONMessageBodyWriter.writeTo(Metadata metadata,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType,
javax.ws.rs.core.MultivaluedMap<String,Object> httpHeaders,
OutputStream entityStream) |
Modifier and Type | Class and Description |
---|---|
class |
XMPMetadata
Converts the Tika Metadata map to the XMP data model while also exposing the
Metadata API, to ease the transition for clients.
|
Modifier and Type | Method and Description |
---|---|
void |
XMPMetadata.process(Metadata meta) |
void |
XMPMetadata.process(Metadata meta,
String mimetype)
Converts the Metadata information to XMP.
|
Constructor and Description |
---|
XMPMetadata(Metadata meta) |
XMPMetadata(Metadata meta,
String mimetype)
Initializes the data by converting the Metadata information to XMP.
|
Modifier and Type | Method and Description |
---|---|
static com.adobe.xmp.XMPMeta |
TikaToXMP.convert(Metadata tikaMetadata) |
static com.adobe.xmp.XMPMeta |
TikaToXMP.convert(Metadata tikaMetadata,
String mimetype)
Converts the given Tika metadata map to an XMP object.
|
com.adobe.xmp.XMPMeta |
MSOfficeBinaryConverter.process(Metadata metadata) |
abstract com.adobe.xmp.XMPMeta |
AbstractConverter.process(Metadata metadata) |
com.adobe.xmp.XMPMeta |
MSOfficeXMLConverter.process(Metadata metadata) |
com.adobe.xmp.XMPMeta |
ITikaToXMPConverter.process(Metadata metadata)
Converts a Tika Metadata object into an XMPMeta containing the useful
properties. |
com.adobe.xmp.XMPMeta |
RTFConverter.process(Metadata metadata) |
com.adobe.xmp.XMPMeta |
OpenDocumentConverter.process(Metadata metadata) |
com.adobe.xmp.XMPMeta |
GenericConverter.process(Metadata metadata) |
void |
AbstractConverter.setMetadata(Metadata metadata) |
Copyright © 2007–2017 The Apache Software Foundation. All rights reserved.