Modifier and Type | Method and Description |
---|---|
String |
Tika.detect(InputStream stream,
Metadata metadata)
Detects the media type of the given document.
|
Reader |
Tika.parse(File file,
Metadata metadata)
Parses the given file and returns the extracted text content.
|
Reader |
Tika.parse(InputStream stream,
Metadata metadata)
Parses the given document and returns the extracted text content.
|
Reader |
Tika.parse(Path path,
Metadata metadata)
Parses the file at the given path and returns the extracted text content.
|
String |
Tika.parseToString(InputStream stream,
Metadata metadata)
Parses the given document and returns the extracted text content.
|
String |
Tika.parseToString(InputStream stream,
Metadata metadata,
int maxLength)
Parses the given document and returns the extracted text content.
|
Modifier and Type | Method and Description |
---|---|
Metadata |
FileResource.getMetadata()
Gets the metadata that is available before the file is parsed.
|
Modifier and Type | Method and Description |
---|---|
OutputStream |
OutputStreamFactory.getOutputStream(Metadata metadata) |
protected void |
FileResourceConsumer.parse(String resourceId,
Parser parser,
InputStream is,
ContentHandler handler,
Metadata m,
ParseContext parseContext)
Utility method to handle logging equivalently among all
implementing classes.
|
protected boolean |
FileResourceCrawler.select(Metadata m) |
Modifier and Type | Method and Description |
---|---|
Metadata |
FSFileResource.getMetadata() |
Modifier and Type | Method and Description |
---|---|
OutputStream |
FSOutputStreamFactory.getOutputStream(Metadata metadata)
This tries to create a file based on the
FSUtil.HANDLE_EXISTING
value that was passed in during initialization. |
boolean |
FSDocumentSelector.select(Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
MediaType |
ZeroSizeFileDetector.detect(InputStream stream,
Metadata metadata) |
MediaType |
TypeDetector.detect(InputStream input,
Metadata metadata)
Detects the content type of an input document based on a type hint
given in the input metadata.
|
MediaType |
TrainedModelDetector.detect(InputStream input,
Metadata metadata) |
MediaType |
TextDetector.detect(InputStream input,
Metadata metadata)
Looks at the beginning of the document input stream to determine
whether the document is text or not.
|
MediaType |
OverrideDetector.detect(InputStream input,
Metadata metadata) |
Charset |
NonDetectingEncodingDetector.detect(InputStream input,
Metadata metadata) |
MediaType |
NameDetector.detect(InputStream input,
Metadata metadata)
Detects the content type of an input document based on the document
name given in the input metadata.
|
MediaType |
MagicDetector.detect(InputStream input,
Metadata metadata) |
Charset |
EncodingDetector.detect(InputStream input,
Metadata metadata)
Detects the character encoding of the given text document, or
null if the encoding of the document cannot be detected. |
MediaType |
EmptyDetector.detect(InputStream input,
Metadata metadata) |
MediaType |
Detector.detect(InputStream input,
Metadata metadata)
Detects the content type of the given input document.
|
Charset |
CompositeEncodingDetector.detect(InputStream input,
Metadata metadata) |
MediaType |
CompositeDetector.detect(InputStream input,
Metadata metadata) |
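
The detector descriptions above mention looking at the beginning of the input stream to decide whether a document is text. The following is a minimal, standard-library-only sketch of that idea, not Tika's `TextDetector`: the class name `PrefixTextSniffer`, the 512-byte prefix length, and the "no control bytes" rule are all assumptions made for illustration. It relies on `mark`/`reset` so that the caller can still read the document from the start afterwards.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a prefix-based detector; not Tika's implementation. */
final class PrefixTextSniffer {

    /** Number of leading bytes to inspect (an arbitrary choice for this sketch). */
    static final int PREFIX_LENGTH = 512;

    /**
     * Returns true if the leading bytes contain no control characters other
     * than tab, line feed, carriage return, or form feed. An empty stream is
     * reported as text. The stream is reset to its starting position before
     * a result is returned.
     */
    static boolean looksLikeText(InputStream in) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        try {
            in.mark(PREFIX_LENGTH);
            byte[] prefix = in.readNBytes(PREFIX_LENGTH);
            in.reset(); // give the prefix back to the caller
            return isTextual(prefix);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Pure helper: checks the given bytes for disallowed control characters. */
    static boolean isTextual(byte[] prefix) {
        for (byte b : prefix) {
            int c = b & 0xFF;
            boolean whitespace = c == '\t' || c == '\n' || c == '\r' || c == '\f';
            if (c < 0x20 && !whitespace) {
                return false; // control byte: treat the document as binary
            }
        }
        return true;
    }
}
```

Wrapping `IOException` in `UncheckedIOException` is only a convenience for this sketch; a real detector would more likely declare the checked exception.
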
Constructor and Description |
---|
AutoDetectReader(InputStream stream,
Metadata metadata) |
AutoDetectReader(InputStream stream,
Metadata metadata,
EncodingDetector encodingDetector) |
AutoDetectReader(InputStream stream,
Metadata metadata,
ServiceLoader loader) |
Modifier and Type | Method and Description |
---|---|
List<RecognisedObject> |
DL4JVGG16Net.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
List<RecognisedObject> |
DL4JInceptionV3Net.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
ExternalEmbedder.embed(Metadata metadata,
InputStream inputStream,
OutputStream outputStream,
ParseContext context)
Executes the configured external command and passes the given document
stream as a simple XHTML document to the given SAX content handler.
|
void |
Embedder.embed(Metadata metadata,
InputStream originalStream,
OutputStream outputStream,
ParseContext context)
Embeds related document metadata from the given metadata object into the
given output stream.
|
protected List<String> |
ExternalEmbedder.getCommandMetadataSegments(Metadata metadata)
Constructs a collection of command line arguments responsible for setting
individual metadata fields based on the given
metadata . |
Modifier and Type | Method and Description |
---|---|
protected static ContentTags |
AbstractProfiler.getContent(org.apache.tika.eval.EvalFilePaths evalFilePaths,
Metadata metadata) |
protected org.apache.tika.eval.EvalFilePaths |
AbstractProfiler.getPathsFromExtractCrawl(Metadata metadata,
Path extracts) |
protected org.apache.tika.eval.EvalFilePaths |
AbstractProfiler.getPathsFromSrcCrawl(Metadata metadata,
Path srcDir,
Path extracts) |
protected void |
AbstractProfiler.writeExceptionData(String fileId,
Metadata m,
TableInfo exceptionTable) |
protected void |
AbstractProfiler.writeProfileData(org.apache.tika.eval.EvalFilePaths fps,
int i,
ContentTags contentTags,
Metadata m,
String fileId,
String containerId,
List<Integer> numAttachments,
TableInfo profileTable) |
Modifier and Type | Method and Description |
---|---|
protected long |
AbstractProfiler.getSourceFileLength(org.apache.tika.eval.EvalFilePaths fps,
List<Metadata> metadataList) |
Modifier and Type | Method and Description |
---|---|
List<Metadata> |
ExtractReader.loadExtract(Path extractFile) |
Modifier and Type | Method and Description |
---|---|
static Metadata |
DisplayMetInstance.getMet(URL url) |
Modifier and Type | Method and Description |
---|---|
List<Metadata> |
ParsingExample.recursiveParserWrapperExample()
For documents that may contain embedded documents, it might be helpful
to create a list of metadata objects, one for the container document and
one for each embedded document.
|
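
The "one metadata object per document" idea described above can be sketched without any Tika classes. In the example below, `Doc`, `flatten`, and the `"name"` key are hypothetical names introduced for illustration, and plain maps stand in for `Metadata`. The container-first, depth-first ordering is a choice of this sketch, not a statement about what the real example returns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a container document with embedded documents. */
final class MetadataListSketch {

    /** A document node: its own metadata plus any embedded documents. */
    static final class Doc {
        final Map<String, String> metadata = new LinkedHashMap<>();
        final List<Doc> embedded = new ArrayList<>();

        Doc(String name) {
            metadata.put("name", name); // illustrative key only
        }

        /** Adds an embedded document and returns this node for chaining. */
        Doc embed(Doc child) {
            embedded.add(child);
            return this;
        }
    }

    /** Returns one metadata map for the container and one per embedded document. */
    static List<Map<String, String>> flatten(Doc container) {
        List<Map<String, String>> out = new ArrayList<>();
        collect(container, out);
        return out;
    }

    /** Depth-first walk that records each document's metadata before its children. */
    private static void collect(Doc doc, List<Map<String, String>> out) {
        out.add(doc.metadata);
        for (Doc child : doc.embedded) {
            collect(child, out);
        }
    }
}
```
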
Modifier and Type | Method and Description |
---|---|
MediaType |
EncryptedPrescriptionDetector.detect(InputStream stream,
Metadata metadata) |
protected ContentHandler |
PrescriptionParser.getContentHandler(ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
DirListParser.parse(InputStream is,
ContentHandler handler,
Metadata metadata) |
void |
LanguageDetectingParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
EncryptedPrescriptionParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
DirListParser.parse(InputStream is,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
static String |
MyFirstTika.parseUsingAutoDetect(String filename,
TikaConfig tikaConfig,
Metadata metadata) |
static String |
MyFirstTika.parseUsingComponents(String filename,
TikaConfig tikaConfig,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
String |
EmbeddedDocumentUtil.getExtension(TikaInputStream is,
Metadata metadata) |
void |
ParsingEmbeddedDocumentExtractor.parseEmbedded(InputStream stream,
ContentHandler handler,
Metadata metadata,
boolean outputHtml) |
void |
EmbeddedDocumentUtil.parseEmbedded(InputStream inputStream,
ContentHandler handler,
Metadata metadata,
boolean outputHtml) |
void |
EmbeddedDocumentExtractor.parseEmbedded(InputStream stream,
ContentHandler handler,
Metadata metadata,
boolean outputHtml)
Processes the supplied embedded resource, calling the delegating
parser with the appropriate details.
|
static void |
EmbeddedDocumentUtil.recordEmbeddedStreamException(Throwable t,
Metadata m) |
static void |
EmbeddedDocumentUtil.recordException(Throwable t,
Metadata m) |
boolean |
DocumentSelector.select(Metadata metadata)
Checks if a document with the given metadata matches the specified
selection criteria.
|
boolean |
ParsingEmbeddedDocumentExtractor.shouldParseEmbedded(Metadata metadata) |
boolean |
EmbeddedDocumentUtil.shouldParseEmbedded(Metadata m) |
boolean |
EmbeddedDocumentExtractor.shouldParseEmbedded(Metadata metadata) |
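
As a rough illustration of selecting documents by their metadata, the sketch below uses a plain map in place of `Metadata`. The `Selector` interface, the `contentTypePrefix` factory, and the use of a `"Content-Type"` key are assumptions for this example; they are not the `DocumentSelector` API listed above.

```java
import java.util.Map;

/** Hypothetical metadata-based selector; a plain map stands in for Metadata. */
final class SelectorSketch {

    /** Mirrors the shape of a select(metadata) method with a boolean result. */
    interface Selector {
        boolean select(Map<String, String> metadata);
    }

    /** Builds a selector that accepts documents whose content type starts with the prefix. */
    static Selector contentTypePrefix(String prefix) {
        return metadata -> {
            String type = metadata.get("Content-Type"); // illustrative key
            return type != null && type.startsWith(prefix);
        };
    }
}
```

A selector built with `contentTypePrefix("image/")` would accept a document whose map contains `image/png` and reject one with `text/plain` or with no content type at all.
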
Modifier and Type | Method and Description |
---|---|
void |
ForkParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Sends the objects to the server for parsing; the server, via the
proxies, acts on the handler as if it were updating it directly.
|
Modifier and Type | Method and Description |
---|---|
static TikaInputStream |
TikaInputStream.get(Blob blob,
Metadata metadata)
Creates a TikaInputStream from the given database BLOB.
|
static TikaInputStream |
TikaInputStream.get(byte[] data,
Metadata metadata)
Creates a TikaInputStream from the given array of bytes.
|
static TikaInputStream |
TikaInputStream.get(File file,
Metadata metadata)
Deprecated.
use
TikaInputStream.get(Path, Metadata) . In Tika 2.0,
this will be removed or modified to throw an IOException. |
static TikaInputStream |
TikaInputStream.get(Path path,
Metadata metadata)
Creates a TikaInputStream from the file at the given path.
|
static TikaInputStream |
TikaInputStream.get(URI uri,
Metadata metadata)
Creates a TikaInputStream from the resource at the given URI.
|
static TikaInputStream |
TikaInputStream.get(URL url,
Metadata metadata)
Creates a TikaInputStream from the resource at the given URL.
|
Modifier and Type | Method and Description |
---|---|
static void |
XMPDM.ChannelTypePropertyConverter.convertAndSet(Metadata metadata,
Object value)
Deprecated.
How convert+set might work
|
Modifier and Type | Method and Description |
---|---|
Metadata |
JsonMetadataDeserializer.deserialize(com.google.gson.JsonElement element,
Type type,
com.google.gson.JsonDeserializationContext context)
Deserializes a json object (equivalent to: Map
|
static Metadata |
JsonMetadata.fromJson(Reader reader)
Read metadata from reader.
|
Modifier and Type | Method and Description |
---|---|
static List<Metadata> |
JsonMetadataList.fromJson(Reader reader)
Read metadata from reader.
|
Modifier and Type | Method and Description |
---|---|
void |
JsonStreamingSerializer.add(Metadata metadata) |
protected String[] |
JsonMetadataSerializer.getNames(Metadata metadata)
Override to get a custom sort order
or to filter names.
|
com.google.gson.JsonElement |
JsonMetadataSerializer.serialize(Metadata metadata,
Type type,
com.google.gson.JsonSerializationContext context)
Serializes a Metadata object into effectively Map
|
static void |
JsonMetadata.toJson(Metadata metadata,
Writer writer)
Serializes a Metadata object to JSON.
|
Modifier and Type | Method and Description |
---|---|
static void |
JsonMetadataList.toJson(List<Metadata> metadataList,
Writer writer)
Serializes a list of Metadata objects to JSON.
|
Modifier and Type | Method and Description |
---|---|
MediaType |
ProbabilisticMimeDetectionSelector.detect(InputStream input,
Metadata metadata) |
MediaType |
MimeTypes.detect(InputStream input,
Metadata metadata)
Automatically detects the MIME type of a document based on magic
markers in the stream prefix and any given metadata hints.
|
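
Detection from "magic markers in the stream prefix and any given metadata hints" can be pictured with a small standard-library sketch. Everything here is illustrative: `MagicThenNameSketch` is a hypothetical class, only two well-known signatures are checked (`%PDF-` for PDF and `PK\x03\x04` for ZIP), and letting magic bytes take precedence over the file-name hint is an assumption of this sketch rather than a description of `MimeTypes`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical magic-then-name detection; not the MimeTypes implementation. */
final class MagicThenNameSketch {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] ZIP_MAGIC = {'P', 'K', 3, 4};

    /**
     * Checks the prefix for a known signature first, then falls back to the
     * file-name hint (which may be null), then to a generic binary type.
     */
    static String detect(byte[] prefix, String nameHint) {
        if (startsWith(prefix, PDF_MAGIC)) {
            return "application/pdf";
        }
        if (startsWith(prefix, ZIP_MAGIC)) {
            return "application/zip";
        }
        if (nameHint != null) {
            String lower = nameHint.toLowerCase(Locale.ROOT);
            if (lower.endsWith(".txt")) {
                return "text/plain";
            }
            if (lower.endsWith(".pdf")) {
                return "application/pdf";
            }
        }
        return "application/octet-stream";
    }

    /** Returns true if data begins with every byte of magic, in order. */
    private static boolean startsWith(byte[] data, byte[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```
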
Modifier and Type | Method and Description |
---|---|
List<Metadata> |
RecursiveParserWrapper.getMetadata()
Deprecated.
use a
RecursiveParserWrapperHandler instead |
Modifier and Type | Method and Description |
---|---|
void |
DigestingParser.Digester.digest(InputStream is,
Metadata m,
ParseContext parseContext)
Digests an InputStream and sets the appropriate value(s) in the metadata.
|
protected Parser |
CompositeParser.getParser(Metadata metadata)
Returns the parser that best matches the given metadata.
|
protected Parser |
CompositeParser.getParser(Metadata metadata,
ParseContext context) |
String |
PasswordProvider.getPassword(Metadata metadata)
Looks up the password for a document with the given metadata,
and returns it for the Parser.
|
void |
AutoDetectParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata) |
void |
AbstractParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata)
Deprecated.
use the
Parser.parse(InputStream, ContentHandler, Metadata, ParseContext) method instead |
void |
RecursiveParserWrapper.parse(InputStream stream,
ContentHandler recursiveParserWrapperHandler,
Metadata metadata,
ParseContext context)
Acts like a regular parser except it ignores the ContentHandler
and it automatically sets/overwrites the embedded Parser in the
ParseContext object.
|
void |
ParserPostProcessor.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Forwards the call to the delegated parser and post-processes the
results as described above.
|
void |
ParserDecorator.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Delegates the method call to the decorated parser.
|
void |
Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Parses a document stream into a sequence of XHTML SAX events.
|
void |
NetworkParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
ErrorParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
EmptyParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
DigestingParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
DelegatingParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Looks up the delegate parser from the parsing context and
delegates the parse operation to it.
|
void |
CryptoParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
CompositeParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Delegates the call to the matching component parser.
|
void |
AutoDetectParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
ParsingReader(Parser parser,
InputStream stream,
Metadata metadata,
ParseContext context)
Creates a reader for the text content of the given binary stream
with the given document metadata.
|
ParsingReader(Parser parser,
InputStream stream,
Metadata metadata,
ParseContext context,
Executor executor)
Creates a reader for the text content of the given binary stream
with the given document metadata.
|
Modifier and Type | Method and Description |
---|---|
void |
AppleSingleFileParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
ClassParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
MidiParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
AudioParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
protected URI |
TensorflowRESTCaptioner.getApiUri(Metadata metadata) |
List<CaptionObject> |
TensorflowRESTCaptioner.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
ChmParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
SourceCodeParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
TSDParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
Pkcs7Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Metadata |
CTAKESContentHandler.getMetadata()
Returns metadata that includes cTAKES annotations.
|
Modifier and Type | Method and Description |
---|---|
void |
CTAKESParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
CTAKESContentHandler(ContentHandler handler,
Metadata metadata)
Creates a new
CTAKESContentHandler for the given ContentHandler and Metadata objects. |
CTAKESContentHandler(ContentHandler handler,
Metadata metadata,
CTAKESConfig config)
Creates a new
CTAKESContentHandler for the given ContentHandler and Metadata objects. |
Modifier and Type | Method and Description |
---|---|
void |
DBFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
protected ContentHandler |
DIFParser.getContentHandler(ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
DIFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
DIFContentHandler(ContentHandler delegate,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
void |
InputStreamDigester.digest(InputStream is,
Metadata metadata,
ParseContext parseContext) |
void |
CompositeDigester.digest(InputStream is,
Metadata m,
ParseContext parseContext) |
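
The digest methods listed above take an input stream and a metadata object. The sketch below shows the general pattern of hashing a stream and recording the result under a metadata key, using only `java.security.MessageDigest`. `DigestSketch`, the `X-Sketch-Digest:` key prefix, and the use of a plain map instead of `Metadata` are assumptions for illustration; the key names the real digesters write are not given in this listing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;

/** Hypothetical digester; a plain map stands in for the Metadata object. */
final class DigestSketch {

    /** Made-up key prefix for this sketch only. */
    static final String KEY_PREFIX = "X-Sketch-Digest:";

    /**
     * Reads the stream to its end, computes the digest with the named
     * algorithm, and stores the lowercase hex value in the metadata map.
     * The stream is consumed and is not reset.
     */
    static void digest(InputStream is, Map<String, String> metadata, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] buffer = new byte[8192];
            int n;
            while ((n = is.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b & 0xFF));
            }
            metadata.put(KEY_PREFIX + algorithm, hex.toString());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("unknown digest algorithm: " + algorithm, e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because this sketch consumes the stream, a caller that also wants to parse the document would need to buffer or reopen it; how the listed digesters handle that is not covered here.
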
Modifier and Type | Method and Description |
---|---|
void |
DWGParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
EnviHeaderParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
EpubParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
EpubContentParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
ExecutableParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
ExecutableParser.parseELF(XHTMLContentHandler xhtml,
Metadata metadata,
InputStream stream,
byte[] first4)
Parses a Unix ELF file
|
void |
ExecutableParser.parsePE(XHTMLContentHandler xhtml,
Metadata metadata,
InputStream stream,
byte[] first4)
Parses a DOS or Windows PE file
|
Modifier and Type | Method and Description |
---|---|
void |
ExternalParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Executes the configured external command and passes the given document
stream as a simple XHTML document to the given SAX content handler.
|
Modifier and Type | Method and Description |
---|---|
void |
FeedParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
TrueTypeParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
AdobeFontMetricParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
GDALParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
GeoParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
GeographicInformationParser.parse(InputStream inputStream,
ContentHandler contentHandler,
Metadata metadata,
ParseContext parseContext) |
Modifier and Type | Method and Description |
---|---|
void |
GribParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
HDFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
protected void |
HDFParser.unravelStringMet(ucar.nc2.NetcdfFile ncFile,
ucar.nc2.Group group,
Metadata met) |
Modifier and Type | Method and Description |
---|---|
Charset |
HtmlEncodingDetector.detect(InputStream input,
Metadata metadata) |
void |
HtmlParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Charset |
StandardHtmlEncodingDetector.detect(InputStream input,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
void |
WebPParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
TiffParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
PSDParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
ImageParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
ICNSParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
BPGParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
ImageMetadataExtractor(Metadata metadata) |
ImageMetadataExtractor(Metadata metadata,
org.apache.tika.parser.image.ImageMetadataExtractor.DirectoryHandler... handlers) |
Modifier and Type | Method and Description |
---|---|
static void |
JempboxExtractor.extractDublinCore(org.apache.jempbox.xmp.XMPMetadata xmpMetadata,
Metadata metadata)
Tries to extract Dublin Core schema from XMP.
|
static void |
JempboxExtractor.extractXMPMM(org.apache.jempbox.xmp.XMPMetadata xmp,
Metadata metadata)
Extracts Media Management metadata from XMP.
|
Constructor and Description |
---|
JempboxExtractor(Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
void |
IptcAnpaParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata)
Deprecated.
This method will be removed in Apache Tika 1.0.
|
void |
IptcAnpaParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
ISArchiveParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
static void |
ISATabUtils.parseAssay(InputStream stream,
XHTMLContentHandler xhtml,
Metadata metadata,
ParseContext context) |
static void |
ISATabUtils.parseInvestigation(InputStream stream,
XHTMLContentHandler handler,
Metadata metadata,
ParseContext context) |
static void |
ISATabUtils.parseInvestigation(InputStream stream,
XHTMLContentHandler handler,
Metadata metadata,
ParseContext context,
String studyFileName) |
static void |
ISATabUtils.parseStudy(InputStream stream,
XHTMLContentHandler xhtml,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
IWorkPackageParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
IWork13PackageParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
SQLite3Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Metadata |
TEIDOMParser.parse(String source,
ParseContext parseContext) |
Modifier and Type | Method and Description |
---|---|
void |
JournalParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
GrobidRESTParser.parse(String filePath,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
JpegParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
static void |
MailUtil.addPersonAndEmail(String string,
Property personProperty,
Property emailProperty,
Metadata metadata)
This tries to split a "from" or "to" value into a person field and an email field.
|
void |
RFC822Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
static void |
MailUtil.setPersonAndEmail(String string,
Property personProperty,
Property emailProperty,
Metadata metadata)
This tries to split a "from" or "to" value into a person field and an email field.
|
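
Splitting a "from" or "to" value into a person and an email address, as the `MailUtil` methods above describe, might look roughly like the following. This is a simplified sketch and not `MailUtil`'s logic: `PersonEmailSplitter`, its regular expression, and the two-element array result are assumptions, and real header values can be far messier than this pattern allows.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical splitter for values such as "Jane Doe" <jane@example.org>. */
final class PersonEmailSplitter {

    /** Optional (possibly quoted) display name followed by an address in angle brackets. */
    private static final Pattern NAME_AND_ADDRESS = Pattern.compile(
            "^\\s*\"?([^\"<]*?)\"?\\s*<\\s*([^<>\\s]+@[^<>\\s]+)\\s*>\\s*$");

    /**
     * Returns a two-element array {person, email}. Either element is null
     * when that part could not be found in the value.
     */
    static String[] split(String value) {
        Matcher m = NAME_AND_ADDRESS.matcher(value);
        if (m.matches()) {
            String person = m.group(1).trim();
            return new String[] {person.isEmpty() ? null : person, m.group(2)};
        }
        String trimmed = value.trim();
        if (trimmed.contains("@") && !trimmed.contains(" ")) {
            return new String[] {null, trimmed}; // bare address, no display name
        }
        return new String[] {trimmed.isEmpty() ? null : trimmed, null};
    }
}
```
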
Modifier and Type | Method and Description |
---|---|
void |
MatParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Map<Integer,Metadata> |
MboxParser.getTrackingMetadata() |
Modifier and Type | Method and Description |
---|---|
void |
OutlookPSTParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
MboxParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
static void |
OutlookExtractor.addEvenIfNull(Property property,
String value,
Metadata metadata) |
static void |
SummaryExtractor.addMulti(Metadata metadata,
Property property,
String string) |
MediaType |
POIFSContainerDetector.detect(InputStream input,
Metadata metadata) |
protected void |
OfficeParser.parse(org.apache.poi.poifs.filesystem.DirectoryNode root,
ParseContext context,
Metadata metadata,
XHTMLContentHandler xhtml) |
void |
WMFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
TNEFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Extracts properties and text from an MS Document input stream
|
void |
OldExcelParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Extracts properties and text from an MS Document input stream
|
void |
OfficeParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Extracts properties and text from an MS Document input stream
|
void |
MSOwnerFileParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Extracts owner from MS temp file
|
void |
JackcessParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
EMFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
OutlookExtractor.parse(XHTMLContentHandler xhtml,
Metadata metadata) |
Constructor and Description |
---|
ExcelExtractor(ParseContext context,
Metadata metadata) |
HSLFExtractor(ParseContext context,
Metadata metadata) |
SummaryExtractor(Metadata metadata) |
WordExtractor(ParseContext context,
Metadata metadata) |
Modifier and Type | Field and Description |
---|---|
protected Metadata |
XSSFExcelExtractorDecorator.metadata |
Modifier and Type | Method and Description |
---|---|
void |
MetadataExtractor.extract(Metadata metadata) |
void |
XSSFExcelExtractorDecorator.getXHTML(ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
XSSFBExcelExtractorDecorator.getXHTML(ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
OOXMLExtractor.getXHTML(ContentHandler handler,
Metadata metadata,
ParseContext context)
Parses the document into a sequence of XHTML SAX events sent to the
given content handler.
|
void |
AbstractOOXMLExtractor.getXHTML(ContentHandler handler,
Metadata metadata,
ParseContext context) |
protected Map<String,String> |
AbstractOOXMLExtractor.loadLinkedRelationships(org.apache.poi.openxml4j.opc.PackagePart bodyPart,
boolean includeInternal,
Metadata metadata)
This is used by the SAX docx and pptx decorators to load hyperlinks and
other linked objects
|
void |
OOXMLParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
static void |
OOXMLExtractorFactory.parse(InputStream stream,
ContentHandler baseHandler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
SXSLFPowerPointExtractorDecorator(Metadata metadata,
ParseContext context,
XSLFEventBasedPowerPointExtractor extractor) |
SXWPFWordExtractorDecorator(Metadata metadata,
ParseContext context,
XWPFEventBasedWordExtractor extractor) |
XSLFPowerPointExtractorDecorator(Metadata metadata,
ParseContext context,
org.apache.poi.xslf.extractor.XSLFPowerPointExtractor extractor) |
XWPFWordExtractorDecorator(Metadata metadata,
ParseContext context,
org.apache.poi.xwpf.extractor.XWPFWordExtractor extractor) |
Modifier and Type | Method and Description |
---|---|
void |
Word2006MLParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
protected ContentHandler |
WordMLParser.getContentHandler(ContentHandler ch,
Metadata metadata,
ParseContext context) |
protected ContentHandler |
SpreadsheetMLParser.getContentHandler(ContentHandler ch,
Metadata metadata,
ParseContext context) |
protected ContentHandler |
AbstractXML2003Parser.getContentHandler(ContentHandler ch,
Metadata md,
ParseContext context) |
void |
AbstractXML2003Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
WordMLParser.setContentType(Metadata metadata) |
void |
SpreadsheetMLParser.setContentType(Metadata metadata) |
protected abstract void |
AbstractXML2003Parser.setContentType(Metadata contentType) |
Modifier and Type | Method and Description |
---|---|
void |
Mp3Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
MP4Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
NamedEntityParser.parse(InputStream inputStream,
ContentHandler contentHandler,
Metadata metadata,
ParseContext parseContext) |
Modifier and Type | Method and Description |
---|---|
void |
NetCDFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
TesseractOCRParser.parse(Image image,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
TesseractOCRParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext parseContext) |
Modifier and Type | Method and Description |
---|---|
protected ContentHandler |
OpenDocumentMetaParser.getContentHandler(ContentHandler ch,
Metadata md,
ParseContext context) |
void |
OpenDocumentParser.parse(InputStream stream,
ContentHandler baseHandler,
Metadata metadata,
ParseContext context) |
void |
OpenDocumentMetaParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
OpenDocumentContentParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
AccessChecker.check(Metadata metadata)
Checks to see if a document's content should be extracted based
on metadata values and the value of
AccessChecker.allowAccessibility in the constructor. |
void |
PDFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
protected static Metadata |
PackageParser.handleEntryMetadata(String name,
Date createAt,
Date modifiedAt,
Long size,
XHTMLContentHandler xhtml) |
Modifier and Type | Method and Description |
---|---|
boolean |
CompressorParserOptions.decompressConcatenated(Metadata metadata) |
MediaType |
ZipContainerDetector.detect(InputStream input,
Metadata metadata) |
void |
RarParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
PackageParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
CompressorParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
PooledTimeSeriesParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Parses a document stream into a sequence of XHTML SAX events.
|
Modifier and Type | Method and Description |
---|---|
void |
PRTParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
AgeRecogniser.parse(InputStream inputStream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
ObjectRecognitionParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
List<? extends RecognisedObject> |
ObjectRecogniser.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Recognises the objects in the stream.
|
Modifier and Type | Method and Description |
---|---|
protected URI |
TensorflowRESTVideoRecogniser.getApiUri(Metadata metadata) |
protected URI |
TensorflowRESTRecogniser.getApiUri(Metadata metadata) |
List<RecognisedObject> |
TensorflowRESTRecogniser.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
List<RecognisedObject> |
TensorflowImageRecParser.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
RTFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
SAS7BDATParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
SentimentAnalysisParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Performs the parse
|
Modifier and Type | Method and Description |
---|---|
void |
StringsParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
Latin1StringsParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Charset |
UniversalEncodingDetector.detect(InputStream input,
Metadata metadata) |
Charset |
Icu4jEncodingDetector.detect(InputStream input,
Metadata metadata) |
void |
TXTParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
FLVParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
WordPerfectParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
QuattroProParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
protected ContentHandler |
XMLParser.getContentHandler(ContentHandler handler,
Metadata metadata,
ParseContext context) |
protected ContentHandler |
FictionBookParser.getContentHandler(ContentHandler handler,
Metadata metadata,
ParseContext context) |
protected ContentHandler |
DcXMLParser.getContentHandler(ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
XMLParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
AttributeDependantMetadataHandler(Metadata metadata,
String nameHoldingAttribute,
String namePrefix) |
AttributeMetadataHandler(String uri,
String localName,
Metadata metadata,
Property property) |
AttributeMetadataHandler(String uri,
String localName,
Metadata metadata,
String name) |
ElementMetadataHandler(String uri,
String localName,
Metadata metadata,
Property targetProperty)
Constructor for Property metadata keys.
|
ElementMetadataHandler(String uri,
String localName,
Metadata metadata,
Property targetProperty,
boolean allowDuplicateValues,
boolean allowEmptyValues)
Constructor for Property metadata keys that lets the caller change how
duplicate and empty values are handled.
|
ElementMetadataHandler(String uri,
String localName,
Metadata metadata,
String name)
Constructor for string metadata keys.
|
ElementMetadataHandler(String uri,
String localName,
Metadata metadata,
String name,
boolean allowDuplicateValues,
boolean allowEmptyValues)
Constructor for string metadata keys that lets the caller change how
duplicate and empty values are handled.
|
MetadataHandler(Metadata metadata,
Property property)
Deprecated.
|
MetadataHandler(Metadata metadata,
String name)
Deprecated.
|
Modifier and Type | Field and Description |
---|---|
protected List<Metadata> |
RecursiveParserWrapperHandler.metadataList |
Modifier and Type | Method and Description |
---|---|
List<Metadata> |
RecursiveParserWrapperHandler.getMetadataList() |
Modifier and Type | Method and Description |
---|---|
void |
RecursiveParserWrapperHandler.endDocument(ContentHandler contentHandler,
Metadata metadata) |
void |
AbstractRecursiveParserWrapperHandler.endDocument(ContentHandler contentHandler,
Metadata metadata)
This is called after the full parse has completed.
|
void |
RecursiveParserWrapperHandler.endEmbeddedDocument(ContentHandler contentHandler,
Metadata metadata)
This is called after parsing an embedded document.
|
void |
AbstractRecursiveParserWrapperHandler.endEmbeddedDocument(ContentHandler contentHandler,
Metadata metadata)
This is called after parsing each embedded document.
|
void |
XMPContentHandler.metadata(Metadata metadata) |
void |
RecursiveParserWrapperHandler.startEmbeddedDocument(ContentHandler contentHandler,
Metadata metadata)
This is called before parsing an embedded document.
|
void |
AbstractRecursiveParserWrapperHandler.startEmbeddedDocument(ContentHandler contentHandler,
Metadata metadata)
This is called before parsing each embedded document.
|
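The three callbacks above fire in a fixed order around each embedded document and once at the end of the full parse. The following is a minimal sketch of that order, not Tika's implementation: `DocumentEvents`, `CollectingEvents`, and `simulate` are hypothetical names, metadata is modelled as a plain `Map<String, String>`, and the `ContentHandler` argument is left out. The position of the container entry in the collected list is a choice made by this sketch only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class EmbeddedCallbackSketch {

    // Hypothetical hooks mirroring the three callbacks listed above;
    // the ContentHandler argument is omitted to keep the sketch short.
    interface DocumentEvents {
        void startEmbeddedDocument(Map<String, String> metadata);
        void endEmbeddedDocument(Map<String, String> metadata);
        void endDocument(Map<String, String> metadata);
    }

    // Collects one metadata entry for each document that finishes parsing.
    static final class CollectingEvents implements DocumentEvents {
        private final List<Map<String, String>> metadataList = new ArrayList<>();

        @Override
        public void startEmbeddedDocument(Map<String, String> metadata) {
            // Nothing is collected before an embedded document has been parsed.
        }

        @Override
        public void endEmbeddedDocument(Map<String, String> metadata) {
            metadataList.add(metadata);
        }

        @Override
        public void endDocument(Map<String, String> metadata) {
            metadataList.add(metadata);
        }

        List<Map<String, String>> getMetadataList() {
            return metadataList;
        }
    }

    // Simulated driver: fires start/end for each embedded document,
    // then endDocument once for the container.
    static List<Map<String, String>> simulate(Map<String, String> container,
                                              List<Map<String, String>> embedded) {
        CollectingEvents events = new CollectingEvents();
        for (Map<String, String> child : embedded) {
            events.startEmbeddedDocument(child);
            events.endEmbeddedDocument(child);
        }
        events.endDocument(container);
        return events.getMetadataList();
    }

    public static void main(String[] args) {
        Map<String, String> container = Collections.singletonMap("name", "report.zip");
        List<Map<String, String>> embedded = new ArrayList<>();
        embedded.add(Collections.singletonMap("name", "a.txt"));
        embedded.add(Collections.singletonMap("name", "b.txt"));
        System.out.println(simulate(container, embedded).size()); // prints 3
    }
}
```
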
Constructor and Description |
---|
DIFContentHandler(ContentHandler delegate,
Metadata metadata) |
PhoneExtractingContentHandler(ContentHandler handler,
Metadata metadata)
Creates a decorator for the given SAX event handler and Metadata object.
|
StandardsExtractingContentHandler(ContentHandler handler,
Metadata metadata)
Creates a decorator for the given SAX event handler and Metadata object.
|
XHTMLContentHandler(ContentHandler handler,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
List<Metadata> |
MetadataList.getMetadata() |
Constructor and Description |
---|
MetadataList(List<Metadata> metadata) |
Modifier and Type | Method and Description |
---|---|
static void |
TikaResource.fillMetadata(Parser parser,
Metadata metadata,
ParseContext context,
javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders) |
static void |
TikaResource.logRequest(org.slf4j.Logger logger,
javax.ws.rs.core.UriInfo info,
Metadata metadata) |
static void |
UnpackerResource.metadataToCsv(Metadata metadata,
OutputStream outputStream) |
static void |
TikaResource.parse(Parser parser,
org.slf4j.Logger logger,
String path,
InputStream inputStream,
ContentHandler handler,
Metadata metadata,
ParseContext parseContext)
Use this to call a parser and unify exception handling.
|
Modifier and Type | Method and Description |
---|---|
long |
XMPMessageBodyWriter.getSize(Metadata data,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType) |
long |
TextMessageBodyWriter.getSize(Metadata data,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType) |
long |
JSONMessageBodyWriter.getSize(Metadata data,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType) |
long |
CSVMessageBodyWriter.getSize(Metadata data,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType) |
void |
XMPMessageBodyWriter.writeTo(Metadata metadata,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType,
javax.ws.rs.core.MultivaluedMap<String,Object> httpHeaders,
OutputStream entityStream) |
void |
TextMessageBodyWriter.writeTo(Metadata metadata,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType,
javax.ws.rs.core.MultivaluedMap<String,Object> httpHeaders,
OutputStream entityStream) |
void |
JSONMessageBodyWriter.writeTo(Metadata metadata,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType,
javax.ws.rs.core.MultivaluedMap<String,Object> httpHeaders,
OutputStream entityStream) |
void |
CSVMessageBodyWriter.writeTo(Metadata metadata,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType,
javax.ws.rs.core.MultivaluedMap<String,Object> httpHeaders,
OutputStream entityStream) |
Modifier and Type | Method and Description |
---|---|
static Metadata |
ParserUtils.cloneMetadata(Metadata m)
Does a deep clone of a Metadata object.
|
Modifier and Type | Method and Description |
---|---|
static Metadata |
ParserUtils.cloneMetadata(Metadata m)
Does a deep clone of a Metadata object.
|
static void |
ParserUtils.recordParserDetails(Parser parser,
Metadata metadata) |
static void |
ParserUtils.recordParserFailure(Parser parser,
Throwable failure,
Metadata metadata) |
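As an illustration of what "deep clone" means for `ParserUtils.cloneMetadata`, the sketch below copies a multi-valued name-to-values map so that the copy shares no value lists with the source. It is a stand-in under stated assumptions, not Tika's code: `MetadataCloneSketch` and `deepClone` are hypothetical names, and metadata is modelled as a `Map<String, List<String>>` rather than the `Metadata` class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a multi-valued metadata map; not Tika's Metadata class.
final class MetadataCloneSketch {

    // Returns a copy in which every value list is itself copied,
    // so later changes to the source lists do not leak into the clone.
    static Map<String, List<String>> deepClone(Map<String, List<String>> source) {
        Map<String, List<String>> copy = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : source.entrySet()) {
            copy.put(entry.getKey(), new ArrayList<>(entry.getValue()));
        }
        return copy;
    }

    public static void main(String[] args) {
        List<String> creators = new ArrayList<>();
        creators.add("Alice");
        Map<String, List<String>> original = new LinkedHashMap<>();
        original.put("dc:creator", creators);

        Map<String, List<String>> clone = deepClone(original);
        creators.add("Bob"); // mutate the source after cloning

        System.out.println(clone.get("dc:creator").size()); // prints 1
    }
}
```

A shallow copy such as `new LinkedHashMap<>(original)` would share the value lists, so the added value would show up in both maps.
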
Modifier and Type | Class and Description |
---|---|
class |
XMPMetadata
Converts the Tika Metadata map to the XMP data model, while also exposing the
Metadata API so that clients can transition more easily.
|
Modifier and Type | Method and Description |
---|---|
void |
XMPMetadata.process(Metadata meta) |
void |
XMPMetadata.process(Metadata meta,
String mimetype)
Converts the Metadata information to XMP.
|
Constructor and Description |
---|
XMPMetadata(Metadata meta) |
XMPMetadata(Metadata meta,
String mimetype)
Initializes the data by converting the Metadata information to XMP.
|
Modifier and Type | Method and Description |
---|---|
static com.adobe.xmp.XMPMeta |
TikaToXMP.convert(Metadata tikaMetadata) |
static com.adobe.xmp.XMPMeta |
TikaToXMP.convert(Metadata tikaMetadata,
String mimetype)
Convert the given Tika metadata map to XMP object.
|
com.adobe.xmp.XMPMeta |
RTFConverter.process(Metadata metadata) |
com.adobe.xmp.XMPMeta |
OpenDocumentConverter.process(Metadata metadata) |
com.adobe.xmp.XMPMeta |
MSOfficeXMLConverter.process(Metadata metadata) |
com.adobe.xmp.XMPMeta |
MSOfficeBinaryConverter.process(Metadata metadata) |
com.adobe.xmp.XMPMeta |
ITikaToXMPConverter.process(Metadata metadata)
Converts a Tika Metadata object into an XMPMeta containing the useful
properties. |
com.adobe.xmp.XMPMeta |
GenericConverter.process(Metadata metadata) |
abstract com.adobe.xmp.XMPMeta |
AbstractConverter.process(Metadata metadata) |
void |
AbstractConverter.setMetadata(Metadata metadata) |
Copyright © 2007–2018 The Apache Software Foundation. All rights reserved.