Modifier and Type | Method and Description |
---|---|
String |
Tika.detect(InputStream stream,
Metadata metadata)
Detects the media type of the given document.
|
Reader |
Tika.parse(File file,
Metadata metadata)
Parses the given file and returns the extracted text content.
|
Reader |
Tika.parse(InputStream stream,
Metadata metadata)
Parses the given document and returns the extracted text content.
|
Reader |
Tika.parse(Path path,
Metadata metadata)
Parses the file at the given path and returns the extracted text content.
|
String |
Tika.parseToString(InputStream stream,
Metadata metadata)
Parses the given document and returns the extracted text content.
|
String |
Tika.parseToString(InputStream stream,
Metadata metadata,
int maxLength)
Parses the given document and returns the extracted text content.
|
Modifier and Type | Method and Description |
---|---|
Metadata |
FileResource.getMetadata()
This gets the metadata available before the parsing of the file.
|
Modifier and Type | Method and Description |
---|---|
OutputStream |
OutputStreamFactory.getOutputStream(Metadata metadata) |
protected void |
FileResourceConsumer.parse(String resourceId,
Parser parser,
InputStream is,
ContentHandler handler,
Metadata m,
ParseContext parseContext)
Utility method to handle logging equivalently among all
implementing classes.
|
protected boolean |
FileResourceCrawler.select(Metadata m) |
Modifier and Type | Method and Description |
---|---|
Metadata |
FSFileResource.getMetadata() |
Modifier and Type | Method and Description |
---|---|
OutputStream |
FSOutputStreamFactory.getOutputStream(Metadata metadata)
This tries to create a file based on the FSUtil.HANDLE_EXISTING
value that was passed in during initialization. |
boolean |
FSDocumentSelector.select(Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
MediaType |
OverrideDetector.detect(InputStream input,
Metadata metadata) |
MediaType |
Detector.detect(InputStream input,
Metadata metadata)
Detects the content type of the given input document.
|
MediaType |
TextDetector.detect(InputStream input,
Metadata metadata)
Looks at the beginning of the document input stream to determine
whether the document is text or not.
|
MediaType |
FileCommandDetector.detect(InputStream input,
Metadata metadata) |
MediaType |
EmptyDetector.detect(InputStream input,
Metadata metadata) |
MediaType |
ZeroSizeFileDetector.detect(InputStream stream,
Metadata metadata) |
MediaType |
TrainedModelDetector.detect(InputStream input,
Metadata metadata) |
MediaType |
TypeDetector.detect(InputStream input,
Metadata metadata)
Detects the content type of an input document based on a type hint
given in the input metadata.
|
Charset |
EncodingDetector.detect(InputStream input,
Metadata metadata)
Detects the character encoding of the given text document, or returns
null if the encoding of the document cannot be detected. |
MediaType |
NameDetector.detect(InputStream input,
Metadata metadata)
Detects the content type of an input document based on the document
name given in the input metadata.
|
Charset |
CompositeEncodingDetector.detect(InputStream input,
Metadata metadata) |
Charset |
NonDetectingEncodingDetector.detect(InputStream input,
Metadata metadata) |
MediaType |
CompositeDetector.detect(InputStream input,
Metadata metadata) |
MediaType |
MagicDetector.detect(InputStream input,
Metadata metadata) |
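
The TextDetector entry above describes looking at the beginning of the input stream to decide whether a document is text. The following is a minimal, standard-library-only sketch of that general idea, not Tika's implementation. The class name `PrefixTextSniffer`, the 512-byte prefix length, and the set of allowed control bytes are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; this is not Tika's TextDetector. */
final class PrefixTextSniffer {

    /** How many leading bytes to inspect (an arbitrary value for this sketch). */
    static final int PREFIX_LENGTH = 512;

    /**
     * Returns true if the inspected prefix contains no control bytes other than
     * tab, line feed, form feed, carriage return and escape.
     * The stream must support mark/reset and is rewound before returning.
     */
    static boolean looksLikeText(InputStream input) {
        if (!input.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        input.mark(PREFIX_LENGTH);
        try {
            try {
                byte[] prefix = input.readNBytes(PREFIX_LENGTH);
                for (byte b : prefix) {
                    int c = b & 0xFF;
                    boolean allowedControl =
                            c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == 0x1B;
                    if (c < 0x20 && !allowedControl) {
                        return false; // e.g. a NUL byte, typical of binary formats
                    }
                }
                return true; // note: an empty prefix also ends up here
            } finally {
                input.reset(); // rewind so a later reader sees the whole document
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream text =
                new ByteArrayInputStream("plain text\n".getBytes(StandardCharsets.UTF_8));
        InputStream binary = new ByteArrayInputStream(new byte[] {'P', 'K', 3, 4, 0, 0});
        System.out.println(looksLikeText(text));
        System.out.println(looksLikeText(binary));
    }
}
```

The mark/reset pair is the point of the sketch: a detector that only peeks at the prefix can hand the same stream on to a parser afterwards. Streams without mark support would first need wrapping, for example in a `BufferedInputStream`.
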
Constructor and Description |
---|
AutoDetectReader(InputStream stream,
Metadata metadata) |
AutoDetectReader(InputStream stream,
Metadata metadata,
EncodingDetector encodingDetector) |
AutoDetectReader(InputStream stream,
Metadata metadata,
ServiceLoader loader) |
Modifier and Type | Method and Description |
---|---|
MediaType |
BPListDetector.detect(InputStream input,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
MediaType |
POIFSContainerDetector.detect(InputStream input,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
MediaType |
MiscOLEDetector.detect(InputStream input,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
MediaType |
StreamingZipContainerDetector.detect(InputStream input,
Metadata metadata) |
MediaType |
DeprecatedStreamingZipContainerDetector.detect(InputStream is,
Metadata metadata) |
MediaType |
DefaultZipContainerDetector.detect(InputStream input,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
List<RecognisedObject> |
DL4JVGG16Net.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
List<RecognisedObject> |
DL4JInceptionV3Net.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
ExternalEmbedder.embed(Metadata metadata,
InputStream inputStream,
OutputStream outputStream,
ParseContext context)
Executes the configured external command and passes the given document
stream as a simple XHTML document to the given SAX content handler.
|
void |
Embedder.embed(Metadata metadata,
InputStream originalStream,
OutputStream outputStream,
ParseContext context)
Embeds related document metadata from the given metadata object into the
given output stream.
|
protected List<String> |
ExternalEmbedder.getCommandMetadataSegments(Metadata metadata)
Constructs a collection of command line arguments responsible for setting
individual metadata fields based on the given metadata. |
Modifier and Type | Method and Description |
---|---|
protected static ContentTags |
AbstractProfiler.getContent(org.apache.tika.eval.app.EvalFilePaths evalFilePaths,
Metadata metadata) |
protected org.apache.tika.eval.app.EvalFilePaths |
AbstractProfiler.getPathsFromExtractCrawl(Metadata metadata,
Path extracts) |
protected org.apache.tika.eval.app.EvalFilePaths |
AbstractProfiler.getPathsFromSrcCrawl(Metadata metadata,
Path srcDir,
Path extracts) |
protected void |
AbstractProfiler.writeExceptionData(String fileId,
Metadata m,
TableInfo exceptionTable) |
protected void |
AbstractProfiler.writeProfileData(org.apache.tika.eval.app.EvalFilePaths fps,
int i,
ContentTags contentTags,
Metadata m,
String fileId,
String containerId,
List<Integer> numAttachments,
TableInfo profileTable) |
Modifier and Type | Method and Description |
---|---|
protected long |
AbstractProfiler.getSourceFileLength(org.apache.tika.eval.app.EvalFilePaths fps,
List<Metadata> metadataList) |
Modifier and Type | Method and Description |
---|---|
List<Metadata> |
ExtractReader.loadExtract(Path extractFile) |
Modifier and Type | Method and Description |
---|---|
void |
TikaEvalMetadataFilter.filter(Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
static Metadata |
DisplayMetInstance.getMet(URL url) |
Modifier and Type | Method and Description |
---|---|
List<Metadata> |
ParsingExample.recursiveParserWrapperExample()
For documents that may contain embedded documents, it might be helpful
to create a list of metadata objects, one for the container document and
one for each embedded document.
|
Modifier and Type | Method and Description |
---|---|
MediaType |
EncryptedPrescriptionDetector.detect(InputStream stream,
Metadata metadata) |
protected ContentHandler |
PrescriptionParser.getContentHandler(ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
PickBestTextEncodingParser.parse(InputStream stream,
ContentHandlerFactory handlers,
Metadata metadata,
ParseContext context)
Deprecated.
|
void |
DirListParser.parse(InputStream is,
ContentHandler handler,
Metadata metadata) |
void |
PickBestTextEncodingParser.parse(InputStream stream,
ContentHandler handler,
Metadata originalMetadata,
ParseContext context)
Deprecated.
|
void |
EncryptedPrescriptionParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
DirListParser.parse(InputStream is,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
LanguageDetectingParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
protected boolean |
PickBestTextEncodingParser.parserCompleted(Parser parser,
Metadata metadata,
ContentHandler handler,
ParseContext context,
Exception exception)
Deprecated.
|
protected void |
PickBestTextEncodingParser.parserPrepare(Parser parser,
Metadata metadata,
ParseContext context)
Deprecated.
|
static String |
MyFirstTika.parseUsingAutoDetect(String filename,
TikaConfig tikaConfig,
Metadata metadata) |
static String |
MyFirstTika.parseUsingComponents(String filename,
TikaConfig tikaConfig,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
String |
EmbeddedDocumentUtil.getExtension(TikaInputStream is,
Metadata metadata) |
EmbeddedDocumentExtractor |
EmbeddedDocumentExtractorFactory.newInstance(Metadata metadata,
ParseContext parseContext) |
EmbeddedDocumentExtractor |
ParsingEmbeddedDocumentExtractorFactory.newInstance(Metadata metadata,
ParseContext parseContext) |
void |
ParsingEmbeddedDocumentExtractor.parseEmbedded(InputStream stream,
ContentHandler handler,
Metadata metadata,
boolean outputHtml) |
void |
EmbeddedDocumentExtractor.parseEmbedded(InputStream stream,
ContentHandler handler,
Metadata metadata,
boolean outputHtml)
Processes the supplied embedded resource, calling the delegating
parser with the appropriate details.
|
void |
EmbeddedDocumentUtil.parseEmbedded(InputStream inputStream,
ContentHandler handler,
Metadata metadata,
boolean outputHtml) |
static void |
EmbeddedDocumentUtil.recordEmbeddedStreamException(Throwable t,
Metadata m) |
static void |
EmbeddedDocumentUtil.recordException(Throwable t,
Metadata m) |
boolean |
DocumentSelector.select(Metadata metadata)
Checks if a document with the given metadata matches the specified
selection criteria.
|
boolean |
ParsingEmbeddedDocumentExtractor.shouldParseEmbedded(Metadata metadata) |
boolean |
EmbeddedDocumentExtractor.shouldParseEmbedded(Metadata metadata) |
boolean |
EmbeddedDocumentUtil.shouldParseEmbedded(Metadata m) |
boolean |
EmbeddedStreamTranslator.shouldTranslate(InputStream inputStream,
Metadata metadata) |
boolean |
DefaultEmbeddedStreamTranslator.shouldTranslate(InputStream inputStream,
Metadata metadata)
This should sniff the stream to determine if it needs to be translated.
|
InputStream |
EmbeddedStreamTranslator.translate(InputStream inputStream,
Metadata metadata) |
InputStream |
DefaultEmbeddedStreamTranslator.translate(InputStream inputStream,
Metadata metadata)
This will consume the InputStream and return a new stream of translated bytes.
|
Modifier and Type | Method and Description |
---|---|
boolean |
MSEmbeddedStreamTranslator.shouldTranslate(InputStream inputStream,
Metadata metadata) |
InputStream |
MSEmbeddedStreamTranslator.translate(InputStream inputStream,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
void |
ForkParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
This sends the objects to the server for parsing, and the server via
the proxies acts on the handler as if it were updating it directly.
|
Modifier and Type | Method and Description |
---|---|
static TikaInputStream |
TikaInputStream.get(Blob blob,
Metadata metadata)
Creates a TikaInputStream from the given database BLOB.
|
static TikaInputStream |
TikaInputStream.get(byte[] data,
Metadata metadata)
Creates a TikaInputStream from the given array of bytes.
|
static TikaInputStream |
TikaInputStream.get(File file,
Metadata metadata)
Deprecated.
Use TikaInputStream.get(Path, Metadata). In Tika 2.0,
this will be removed or modified to throw an IOException. |
static TikaInputStream |
TikaInputStream.get(Path path,
Metadata metadata)
Creates a TikaInputStream from the file at the given path.
|
static TikaInputStream |
TikaInputStream.get(Path path,
Metadata metadata,
TemporaryResources tmp) |
static TikaInputStream |
TikaInputStream.get(URI uri,
Metadata metadata)
Creates a TikaInputStream from the resource at the given URI.
|
static TikaInputStream |
TikaInputStream.get(URL url,
Metadata metadata)
Creates a TikaInputStream from the resource at the given URL.
|
Modifier and Type | Method and Description |
---|---|
void |
OpenNLPMetadataFilter.filter(Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
void |
OptimaizeMetadataFilter.filter(Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
static void |
XMPDM.ChannelTypePropertyConverter.convertAndSet(Metadata metadata,
Object value)
Deprecated.
How convert+set might work
|
Modifier and Type | Method and Description |
---|---|
void |
DateNormalizingMetadataFilter.filter(Metadata metadata) |
void |
CompositeMetadataFilter.filter(Metadata metadata) |
void |
ExcludeFieldMetadataFilter.filter(Metadata metadata) |
void |
ClearByMimeMetadataFilter.filter(Metadata metadata) |
void |
IncludeFieldMetadataFilter.filter(Metadata metadata) |
abstract void |
MetadataFilter.filter(Metadata metadata) |
void |
NoOpFilter.filter(Metadata metadata) |
void |
FieldNameMappingFilter.filter(Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
Metadata |
JsonMetadataDeserializer.deserialize(com.fasterxml.jackson.core.JsonParser jsonParser,
com.fasterxml.jackson.databind.DeserializationContext deserializationContext) |
static Metadata |
JsonMetadata.fromJson(Reader reader)
Read metadata from reader.
|
static Metadata |
JsonMetadata.readMetadataObject(com.fasterxml.jackson.core.JsonParser jParser)
Expects that jParser has either not yet started on the object or
is pointing to the start of the object.
|
Modifier and Type | Method and Description |
---|---|
static List<Metadata> |
JsonMetadataList.fromJson(Reader reader)
Reads a list of metadata objects from the given reader.
|
Modifier and Type | Method and Description |
---|---|
void |
JsonStreamingSerializer.add(Metadata metadata) |
void |
JsonMetadataSerializer.serialize(Metadata metadata,
com.fasterxml.jackson.core.JsonGenerator jsonGenerator,
com.fasterxml.jackson.databind.SerializerProvider serializerProvider) |
static void |
JsonMetadata.toJson(Metadata metadata,
Writer writer)
Serializes a Metadata object to Json.
|
Modifier and Type | Method and Description |
---|---|
static void |
JsonMetadataList.toJson(List<Metadata> metadataList,
Writer writer)
Serializes a list of Metadata objects to JSON.
|
Modifier and Type | Method and Description |
---|---|
MediaType |
ProbabilisticMimeDetectionSelector.detect(InputStream input,
Metadata metadata) |
MediaType |
MimeTypes.detect(InputStream input,
Metadata metadata)
Automatically detects the MIME type of a document based on magic
markers in the stream prefix and any given metadata hints.
|
Modifier and Type | Method and Description |
---|---|
List<Metadata> |
ParseRecord.getMetadataList() |
Modifier and Type | Method and Description |
---|---|
void |
ParseRecord.addMetadata(Metadata metadata) |
void |
DigestingParser.Digester.digest(InputStream is,
Metadata m,
ParseContext parseContext)
Digests an InputStream and sets the appropriate value(s) in the metadata.
|
protected Parser |
CompositeParser.getParser(Metadata metadata)
Returns the parser that best matches the given metadata.
|
protected Parser |
CompositeParser.getParser(Metadata metadata,
ParseContext context) |
String |
PasswordProvider.getPassword(Metadata metadata)
Looks up the password for a document with the given metadata,
and returns it for the Parser.
|
void |
AbstractParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata)
Deprecated.
Use the Parser.parse(InputStream, ContentHandler, Metadata, ParseContext)
method instead. |
void |
AutoDetectParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata) |
void |
NetworkParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
ParserDecorator.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Delegates the method call to the decorated parser.
|
void |
RegexCaptureParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
DelegatingParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Looks up the delegate parser from the parsing context and
delegates the parse operation to it.
|
void |
ErrorParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Parses a document stream into a sequence of XHTML SAX events.
|
void |
RecursiveParserWrapper.parse(InputStream stream,
ContentHandler recursiveParserWrapperHandler,
Metadata metadata,
ParseContext context) |
void |
DigestingParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
CompositeParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Delegates the call to the matching component parser.
|
void |
ParserPostProcessor.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Forwards the call to the delegated parser and post-processes the
results as described above.
|
void |
CryptoParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
AutoDetectParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
EmptyParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
ParsingReader(Parser parser,
InputStream stream,
Metadata metadata,
ParseContext context)
Creates a reader for the text content of the given binary stream
with the given document metadata.
|
ParsingReader(Parser parser,
InputStream stream,
Metadata metadata,
ParseContext context,
Executor executor)
Creates a reader for the text content of the given binary stream
with the given document metadata.
|
Modifier and Type | Method and Description |
---|---|
void |
PListParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
AppleSingleFileParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
ClassParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
MidiParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
AudioParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
protected URI |
TensorflowRESTCaptioner.getApiUri(Metadata metadata) |
List<CaptionObject> |
TensorflowRESTCaptioner.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
SourceCodeParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
TSDParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
Pkcs7Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
TextAndCSVParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Metadata |
CTAKESContentHandler.getMetadata()
Returns metadata that includes cTAKES annotations.
|
Modifier and Type | Method and Description |
---|---|
void |
CTAKESParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
CTAKESContentHandler(ContentHandler handler,
Metadata metadata)
Creates a new CTAKESContentHandler for the given ContentHandler and Metadata objects. |
CTAKESContentHandler(ContentHandler handler,
Metadata metadata,
CTAKESConfig config)
Creates a new CTAKESContentHandler for the given ContentHandler
and Metadata objects. |
Modifier and Type | Method and Description |
---|---|
void |
DBFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
DGN8Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
protected ContentHandler |
DIFParser.getContentHandler(ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
DIFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
DIFContentHandler(ContentHandler delegate,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
void |
CompositeDigester.digest(InputStream is,
Metadata m,
ParseContext parseContext) |
void |
InputStreamDigester.digest(InputStream is,
Metadata metadata,
ParseContext parseContext) |
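
The `digest(InputStream, Metadata, ParseContext)` methods listed here are described elsewhere on this page as digesting an InputStream and setting the appropriate value(s) in the metadata. The sketch below shows that pattern using only the JDK, under stated assumptions: a plain `Map<String, String>` stands in for `Metadata`, and the class name `StreamDigestSketch` and the key `digest:SHA-256` are hypothetical. It is not the Tika implementation and does not claim to use Tika's key names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; a Map stands in for Tika's Metadata. */
final class StreamDigestSketch {

    /** Assumed key name for this sketch, not a Tika constant. */
    static final String DIGEST_KEY = "digest:SHA-256";

    /** Reads the stream to its end and stores the hex SHA-256 digest in the map. */
    static void digest(InputStream is, Map<String, String> metadata) {
        MessageDigest sha256;
        try {
            sha256 = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
        byte[] buffer = new byte[8192];
        try {
            int read;
            while ((read = is.read(buffer)) != -1) {
                sha256.update(buffer, 0, read); // feed each chunk into the digest
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : sha256.digest()) {
            hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
        }
        metadata.put(DIGEST_KEY, hex.toString());
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        digest(new ByteArrayInputStream("abc".getBytes(StandardCharsets.US_ASCII)), metadata);
        System.out.println(metadata);
    }
}
```

Note that this sketch consumes the stream. A caller that also wants to parse the same document would need to buffer or reopen it, which is outside the scope of this example.
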
Modifier and Type | Method and Description |
---|---|
void |
DWGParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
EnviHeaderParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
EpubParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
EpubContentParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
ExecutableParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
ExecutableParser.parseELF(XHTMLContentHandler xhtml,
Metadata metadata,
InputStream stream,
byte[] first4)
Parses a Unix ELF file
|
void |
ExecutableParser.parsePE(XHTMLContentHandler xhtml,
Metadata metadata,
InputStream stream,
byte[] first4)
Parses a DOS or Windows PE file
|
Modifier and Type | Method and Description |
---|---|
void |
ExternalParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Executes the configured external command and passes the given document
stream as a simple XHTML document to the given SAX content handler.
|
Modifier and Type | Method and Description |
---|---|
void |
ExternalParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
FeedParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
AdobeFontMetricParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
TrueTypeParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
GDALParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
GeoParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
GeographicInformationParser.parse(InputStream inputStream,
ContentHandler contentHandler,
Metadata metadata,
ParseContext parseContext) |
Modifier and Type | Method and Description |
---|---|
void |
GribParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
HDFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
protected void |
HDFParser.unravelStringMet(ucar.nc2.NetcdfFile ncFile,
ucar.nc2.Group group,
Metadata met) |
Modifier and Type | Method and Description |
---|---|
Charset |
HtmlEncodingDetector.detect(InputStream input,
Metadata metadata) |
void |
HtmlParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Charset |
StandardHtmlEncodingDetector.detect(InputStream input,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
void |
HttpParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
HwpTextExtractorV5.extract(InputStream source,
Metadata metadata,
XHTMLContentHandler xhtml)
Extracts text from an HWP stream.
|
void |
HwpV5Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
JXLParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
WebPParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
ICNSParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
PSDParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
AbstractImageParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
ImageMetadataExtractor(Metadata metadata) |
ImageMetadataExtractor(Metadata metadata,
org.apache.tika.parser.image.ImageMetadataExtractor.DirectoryHandler... handlers) |
Modifier and Type | Method and Description |
---|---|
void |
IDMLParser.parse(InputStream stream,
ContentHandler baseHandler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
IptcAnpaParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata)
Deprecated.
This method will be removed in Apache Tika 1.0.
|
void |
IptcAnpaParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
ISArchiveParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
static void |
ISATabUtils.parseAssay(InputStream stream,
XHTMLContentHandler xhtml,
Metadata metadata,
ParseContext context) |
static void |
ISATabUtils.parseInvestigation(InputStream stream,
XHTMLContentHandler handler,
Metadata metadata,
ParseContext context) |
static void |
ISATabUtils.parseInvestigation(InputStream stream,
XHTMLContentHandler handler,
Metadata metadata,
ParseContext context,
String studyFileName) |
static void |
ISATabUtils.parseStudy(InputStream stream,
XHTMLContentHandler xhtml,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
IWorkPackageParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
IWork13PackageParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
IWork18PackageParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
protected Blob |
JDBCTableReader.getBlob(ResultSet resultSet,
int columnIndex,
Metadata metadata) |
protected Connection |
AbstractDBParser.getConnection(InputStream stream,
Metadata metadata,
ParseContext context)
Override this for special configuration of the connection, such as limiting
the number of rows to be held in memory.
|
protected abstract String |
AbstractDBParser.getConnectionString(InputStream stream,
Metadata metadata,
ParseContext parseContext)
Implement for db specific connection information, e.g.
|
protected abstract List<String> |
AbstractDBParser.getTableNames(Connection connection,
Metadata metadata,
ParseContext context)
Returns the names of the tables to process
|
void |
AbstractDBParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Metadata |
TEIDOMParser.parse(String source,
ParseContext parseContext) |
Modifier and Type | Method and Description |
---|---|
void |
JournalParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
GrobidRESTParser.parse(String filePath,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
RFC822Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
static void |
MailUtil.addPersonAndEmail(String string,
Property personProperty,
Property emailProperty,
Metadata metadata)
This tries to split a "from" or "to" value into a person field and an email field.
|
static void |
MailUtil.setPersonAndEmail(String string,
Property personProperty,
Property emailProperty,
Metadata metadata)
This tries to split a "from" or "to" value into a person field and an email field.
|
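
Both MailUtil methods above are described as trying to split a "from" or "to" value into a person field and an email field. A rough sketch of one way such a split could work is shown below, using only `java.util.regex`. The class name `PersonEmailSplitter`, the pattern, and the fallback rules are assumptions for illustration. The actual MailUtil logic may differ.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; this is not Tika's MailUtil. */
final class PersonEmailSplitter {

    // Matches values such as: "Jane Doe" <jane@example.com> or Jane Doe <jane@example.com>
    private static final Pattern NAME_AND_ADDRESS =
            Pattern.compile("^\\s*\"?([^\"<]*?)\"?\\s*<\\s*([^<>\\s]+)\\s*>\\s*$");

    /** Returns {person, email}; either element is null when it cannot be found. */
    static String[] split(String value) {
        Matcher m = NAME_AND_ADDRESS.matcher(value);
        if (m.matches()) {
            String person = m.group(1).trim();
            return new String[] {person.isEmpty() ? null : person, m.group(2)};
        }
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return new String[] {null, null};
        }
        // Fallback: a single token containing '@' is treated as a bare address,
        // anything else as a person name.
        boolean bareAddress = trimmed.indexOf('@') > 0 && !trimmed.contains(" ");
        return bareAddress
                ? new String[] {null, trimmed}
                : new String[] {trimmed, null};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("\"Jane Doe\" <jane@example.com>")));
        System.out.println(Arrays.toString(split("jane@example.com")));
        System.out.println(Arrays.toString(split("Jane Doe")));
    }
}
```

In a real caller, the two returned values would be written to the person and email properties passed to the method. Here they are simply returned so the split itself is easy to inspect.
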
Modifier and Type | Method and Description |
---|---|
void |
MatParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Map<Integer,Metadata> |
MboxParser.getTrackingMetadata() |
Modifier and Type | Method and Description |
---|---|
void |
MboxParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
static void |
OutlookExtractor.addEvenIfNull(Property property,
String value,
Metadata metadata) |
static void |
SummaryExtractor.addMulti(Metadata metadata,
Property property,
String string) |
protected void |
OfficeParser.parse(org.apache.poi.poifs.filesystem.DirectoryNode root,
ParseContext context,
Metadata metadata,
XHTMLContentHandler xhtml) |
void |
JackcessParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
OldExcelParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Extracts properties and text from an MS Document input stream
|
void |
TNEFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Extracts properties and text from an MS Document input stream
|
void |
WMFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
MSOwnerFileParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Extracts owner from MS temp file
|
void |
OfficeParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Extracts properties and text from an MS Document input stream
|
void |
EMFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
OutlookExtractor.parse(XHTMLContentHandler xhtml,
Metadata metadata)
Deprecated.
Use parse(XHTMLContentHandler) instead; this method will be removed after 2.4.0. |
Constructor and Description |
---|
ExcelExtractor(ParseContext context,
Metadata metadata) |
HSLFExtractor(ParseContext context,
Metadata metadata) |
OutlookExtractor(org.apache.poi.poifs.filesystem.DirectoryNode root,
Metadata metadata,
ParseContext context) |
SummaryExtractor(Metadata metadata) |
WordExtractor(ParseContext context,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
void |
ChmParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
OneNoteParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
MSOneStorePackage.walkTree(OneNoteTreeWalkerOptions options,
Metadata metadata,
XHTMLContentHandler xhtml) |
Modifier and Type | Field and Description |
---|---|
protected Metadata |
XSSFExcelExtractorDecorator.metadata |
Modifier and Type | Method and Description |
---|---|
void |
MetadataExtractor.extract(Metadata metadata) |
void |
OOXMLExtractor.getXHTML(ContentHandler handler,
Metadata metadata,
ParseContext context)
Parses the document into a sequence of XHTML SAX events sent to the
given content handler.
|
void |
AbstractOOXMLExtractor.getXHTML(ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
XSSFBExcelExtractorDecorator.getXHTML(ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
XSSFExcelExtractorDecorator.getXHTML(ContentHandler handler,
Metadata metadata,
ParseContext context) |
protected Map<String,String> |
AbstractOOXMLExtractor.loadLinkedRelationships(org.apache.poi.openxml4j.opc.PackagePart bodyPart,
boolean includeInternal,
Metadata metadata)
Used by the SAX docx and pptx decorators to load hyperlinks and
other linked objects.
|
static void |
OOXMLExtractorFactory.parse(InputStream stream,
ContentHandler baseHandler,
Metadata metadata,
ParseContext context) |
void |
OOXMLParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
SXSLFPowerPointExtractorDecorator(Metadata metadata,
ParseContext context,
XSLFEventBasedPowerPointExtractor extractor) |
SXWPFWordExtractorDecorator(Metadata metadata,
ParseContext context,
XWPFEventBasedWordExtractor extractor) |
XSLFPowerPointExtractorDecorator(Metadata metadata,
ParseContext context,
org.apache.poi.xslf.extractor.XSLFExtractor extractor) |
XWPFWordExtractorDecorator(Metadata metadata,
ParseContext context,
org.apache.poi.xwpf.extractor.XWPFWordExtractor extractor) |
Modifier and Type | Method and Description |
---|---|
void |
Word2006MLParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
OutlookPSTParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
RTFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
protected ContentHandler |
SpreadsheetMLParser.getContentHandler(ContentHandler ch,
Metadata metadata,
ParseContext context) |
protected ContentHandler |
WordMLParser.getContentHandler(ContentHandler ch,
Metadata metadata,
ParseContext context) |
protected ContentHandler |
AbstractXML2003Parser.getContentHandler(ContentHandler ch,
Metadata md,
ParseContext context) |
void |
AbstractXML2003Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
SpreadsheetMLParser.setContentType(Metadata metadata) |
void |
WordMLParser.setContentType(Metadata metadata) |
protected abstract void |
AbstractXML2003Parser.setContentType(Metadata contentType) |
Modifier and Type | Method and Description |
---|---|
ContentHandler |
MIFParser.getContentHandler(ContentHandler handler,
Metadata metadata)
Get the content handler to use.
|
void |
MIFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
Mp3Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
MP4Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
TikaMp4BoxHandler(com.drew.metadata.Metadata metadata,
Metadata tikaMetadata,
XHTMLContentHandler xhtml) |
Constructor and Description |
---|
TikaUserDataBox(String box,
byte[] payload,
Metadata metadata,
XHTMLContentHandler xhtml) |
Modifier and Type | Method and Description |
---|---|
protected static Metadata |
AbstractMultipleParser.mergeMetadata(Metadata newMetadata,
Metadata lastMetadata,
AbstractMultipleParser.MetadataPolicy policy) |
Modifier and Type | Method and Description |
---|---|
protected static Metadata |
AbstractMultipleParser.mergeMetadata(Metadata newMetadata,
Metadata lastMetadata,
AbstractMultipleParser.MetadataPolicy policy) |
void |
AbstractMultipleParser.parse(InputStream stream,
ContentHandlerFactory handlers,
Metadata metadata,
ParseContext context)
Deprecated.
The ContentHandlerFactory override is still experimental,
and the method signature is subject to change before Tika 2.0. |
void |
AbstractMultipleParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Processes the given Stream through one or more parsers,
resetting things between parsers as requested by policy.
|
protected boolean |
FallbackParser.parserCompleted(Parser parser,
Metadata metadata,
ContentHandler handler,
ParseContext context,
Exception exception) |
protected abstract boolean |
AbstractMultipleParser.parserCompleted(Parser parser,
Metadata metadata,
ContentHandler handler,
ParseContext context,
Exception exception)
Notifies implementations that a Parser has finished
or failed, and allows them to decide whether to continue or
abort further parsing.
|
protected boolean |
SupplementingParser.parserCompleted(Parser parser,
Metadata metadata,
ContentHandler handler,
ParseContext context,
Exception exception) |
protected void |
AbstractMultipleParser.parserPrepare(Parser parser,
Metadata metadata,
ParseContext context)
Allows implementations to prepare or change things
before parsing occurs.
|
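The merge step that AbstractMultipleParser applies between parsers can be pictured with a small stand-alone sketch. It does not use Tika classes: plain maps stand in for Metadata, and the Policy enum with its three constants is a hypothetical stand-in for AbstractMultipleParser.MetadataPolicy, whose actual constants and semantics should be checked against the Tika documentation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-alone model: plain maps stand in for Tika's Metadata class.
class MergePolicySketch {

    // Hypothetical policy names, chosen for illustration only.
    enum Policy { FIRST_WINS, LAST_WINS, KEEP_ALL }

    // Combines the values left by an earlier parser (last) with the
    // values produced by a later parser (next), according to the policy.
    static Map<String, List<String>> merge(Map<String, List<String>> next,
                                           Map<String, List<String>> last,
                                           Policy policy) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        // Start from a copy of everything the earlier parser recorded.
        last.forEach((key, values) -> merged.put(key, new ArrayList<>(values)));

        for (Map.Entry<String, List<String>> entry : next.entrySet()) {
            List<String> existing = merged.get(entry.getKey());
            if (existing == null || policy == Policy.LAST_WINS) {
                // No conflict, or the later parser overrides the earlier one.
                merged.put(entry.getKey(), new ArrayList<>(entry.getValue()));
            } else if (policy == Policy.KEEP_ALL) {
                // Append later values that are not already present.
                for (String value : entry.getValue()) {
                    if (!existing.contains(value)) {
                        existing.add(value);
                    }
                }
            }
            // FIRST_WINS: keep the earlier values untouched.
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, List<String>> first = new LinkedHashMap<>();
        first.put("title", List.of("Draft"));
        Map<String, List<String>> second = new LinkedHashMap<>();
        second.put("title", List.of("Final"));
        second.put("author", List.of("Ann"));

        for (Policy policy : Policy.values()) {
            System.out.println(policy + " -> " + merge(second, first, policy));
        }
    }
}
```

The sketch returns a new map rather than mutating either argument, which keeps the three policies easy to compare side by side.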
Modifier and Type | Method and Description |
---|---|
void |
NamedEntityParser.parse(InputStream inputStream,
ContentHandler contentHandler,
Metadata metadata,
ParseContext parseContext) |
Modifier and Type | Method and Description |
---|---|
void |
NetCDFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
TesseractOCRParser.parse(Image image,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
TesseractOCRParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext parseContext) |
Modifier and Type | Method and Description |
---|---|
protected ContentHandler |
OpenDocumentMetaParser.getContentHandler(ContentHandler ch,
Metadata md,
ParseContext context) |
void |
OpenDocumentContentParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
FlatOpenDocumentParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
OpenDocumentMetaParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
OpenDocumentParser.parse(InputStream stream,
ContentHandler baseHandler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
AccessChecker.check(Metadata metadata)
Checks to see if a document's content should be extracted based
on metadata values and the value of
AccessChecker.allowAccessibility in the constructor. |
static void |
PDMetadataExtractor.extract(org.apache.pdfbox.pdmodel.common.PDMetadata pdMetadata,
Metadata metadata,
ParseContext context) |
protected org.apache.pdfbox.pdmodel.PDDocument |
PDFParser.getPDDocument(InputStream inputStream,
String password,
org.apache.pdfbox.io.MemoryUsageSetting memoryUsageSetting,
Metadata metadata,
ParseContext parseContext) |
protected org.apache.pdfbox.pdmodel.PDDocument |
PDFParser.getPDDocument(Path path,
String password,
org.apache.pdfbox.io.MemoryUsageSetting memoryUsageSetting,
Metadata metadata,
ParseContext parseContext) |
void |
PDFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
static void |
PDFMarkedContent2XHTML.process(org.apache.pdfbox.pdmodel.PDDocument pdDocument,
ContentHandler handler,
ParseContext context,
Metadata metadata,
PDFParserConfig config)
Converts the given PDF document (and related metadata) to a stream
of XHTML SAX events sent to the given content handler.
|
Modifier and Type | Field and Description |
---|---|
protected Metadata |
ImageGraphicsEngine.parentMetadata |
Modifier and Type | Method and Description |
---|---|
protected void |
ImageGraphicsEngine.extractInlineImageMetadataOnly(org.apache.pdfbox.pdmodel.graphics.image.PDImage pdImage,
Metadata metadata) |
protected String |
ImageGraphicsEngine.getSuffix(org.apache.pdfbox.pdmodel.graphics.image.PDImage pdImage,
Metadata metadata) |
ImageGraphicsEngine |
ImageGraphicsEngineFactory.newEngine(org.apache.pdfbox.pdmodel.PDPage page,
int pageNumber,
EmbeddedDocumentExtractor embeddedDocumentExtractor,
PDFParserConfig pdfParserConfig,
Map<org.apache.pdfbox.cos.COSStream,Integer> processedInlineImages,
AtomicInteger imageCounter,
XHTMLContentHandler xhtml,
Metadata parentMetadata,
ParseContext parseContext) |
Constructor and Description |
---|
ImageGraphicsEngine(org.apache.pdfbox.pdmodel.PDPage page,
int pageNumber,
EmbeddedDocumentExtractor embeddedDocumentExtractor,
PDFParserConfig pdfParserConfig,
Map<org.apache.pdfbox.cos.COSStream,Integer> processedInlineImages,
AtomicInteger imageCounter,
XHTMLContentHandler xhtml,
Metadata parentMetadata,
ParseContext parseContext) |
Modifier and Type | Method and Description |
---|---|
protected static Metadata |
PackageParser.handleEntryMetadata(String name,
Date createAt,
Date modifiedAt,
Long size,
XHTMLContentHandler xhtml) |
Modifier and Type | Method and Description |
---|---|
boolean |
CompressorParserOptions.decompressConcatenated(Metadata metadata) |
void |
RarParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
CompressorParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
PackageParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
PooledTimeSeriesParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Parses a document stream into a sequence of XHTML SAX events.
|
Modifier and Type | Method and Description |
---|---|
void |
PRTParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
ObjectRecognitionParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
AgeRecogniser.parse(InputStream inputStream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
List<? extends RecognisedObject> |
ObjectRecogniser.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Recognises the objects in the stream.
|
Modifier and Type | Method and Description |
---|---|
protected URI |
TensorflowRESTVideoRecogniser.getApiUri(Metadata metadata) |
protected URI |
TensorflowRESTRecogniser.getApiUri(Metadata metadata) |
List<RecognisedObject> |
TensorflowImageRecParser.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
List<RecognisedObject> |
TensorflowRESTRecogniser.recognise(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
SAS7BDATParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
SentimentAnalysisParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Performs the parse.
|
Modifier and Type | Method and Description |
---|---|
void |
SQLite3Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
Latin1StringsParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
StringsParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
TMXParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
AmazonTranscribe.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Starts AWS Transcribe Job with language specification.
|
Modifier and Type | Method and Description |
---|---|
Charset |
UniversalEncodingDetector.detect(InputStream input,
Metadata metadata) |
Charset |
Icu4jEncodingDetector.detect(InputStream input,
Metadata metadata) |
void |
TXTParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
FLVParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
WACZParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
WARCParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
QuattroProParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
WordPerfectParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
XLZParser.parse(InputStream stream,
ContentHandler baseHandler,
Metadata metadata,
ParseContext context) |
void |
XLIFF12Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
protected ContentHandler |
DcXMLParser.getContentHandler(ContentHandler handler,
Metadata metadata,
ParseContext context) |
protected ContentHandler |
XMLParser.getContentHandler(ContentHandler handler,
Metadata metadata,
ParseContext context) |
protected ContentHandler |
FictionBookParser.getContentHandler(ContentHandler handler,
Metadata metadata,
ParseContext context) |
protected ContentHandler |
TextAndAttributeXMLParser.getContentHandler(ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
XMLProfiler.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
XMLParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
AttributeDependantMetadataHandler(Metadata metadata,
String nameHoldingAttribute,
String namePrefix) |
AttributeMetadataHandler(String uri,
String localName,
Metadata metadata,
Property property) |
AttributeMetadataHandler(String uri,
String localName,
Metadata metadata,
String name) |
ElementMetadataHandler(String uri,
String localName,
Metadata metadata,
Property targetProperty)
Constructor for Property metadata keys.
|
ElementMetadataHandler(String uri,
String localName,
Metadata metadata,
Property targetProperty,
boolean allowDuplicateValues,
boolean allowEmptyValues)
Constructor for Property metadata keys which allows change of behavior
for duplicate and empty entry values.
|
ElementMetadataHandler(String uri,
String localName,
Metadata metadata,
String name)
Constructor for string metadata keys.
|
ElementMetadataHandler(String uri,
String localName,
Metadata metadata,
String name,
boolean allowDuplicateValues,
boolean allowEmptyValues)
Constructor for string metadata keys which allows change of behavior
for duplicate and empty entry values.
|
MetadataHandler(Metadata metadata,
Property property)
Deprecated.
|
MetadataHandler(Metadata metadata,
String name)
Deprecated.
|
Modifier and Type | Method and Description |
---|---|
static void |
JempboxExtractor.extractDublinCore(org.apache.jempbox.xmp.XMPMetadata xmpMetadata,
Metadata metadata)
Tries to extract Dublin Core schema from XMP.
|
static void |
XMPMetadataExtractor.extractDublinCoreSchema(org.apache.xmpbox.XMPMetadata xmp,
Metadata metadata)
Extracts Dublin Core.
|
static void |
XMPMetadataExtractor.extractXMPBasicSchema(org.apache.xmpbox.XMPMetadata xmp,
Metadata metadata)
Extracts basic schema metadata from XMP.
|
static void |
JempboxExtractor.extractXMPMM(org.apache.jempbox.xmp.XMPMetadata xmp,
Metadata metadata)
Extracts Media Management metadata from XMP.
|
static void |
XMPMetadataExtractor.parse(InputStream stream,
Metadata metadata)
Parses the XMP packets.
|
Constructor and Description |
---|
JempboxExtractor(Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
Metadata |
FetchEmitTuple.getMetadata() |
Constructor and Description |
---|
FetchEmitTuple(String id,
FetchKey fetchKey,
EmitKey emitKey,
Metadata metadata) |
FetchEmitTuple(String id,
FetchKey fetchKey,
EmitKey emitKey,
Metadata metadata,
HandlerConfig handlerConfig,
FetchEmitTuple.ON_PARSE_EXCEPTION onParseException) |
Modifier and Type | Method and Description |
---|---|
List<Metadata> |
EmitData.getMetadataList() |
Modifier and Type | Method and Description |
---|---|
void |
StreamEmitter.emit(String emitKey,
InputStream inputStream,
Metadata userMetadata) |
Modifier and Type | Method and Description |
---|---|
void |
Emitter.emit(String emitKey,
List<Metadata> metadataList) |
void |
EmptyEmitter.emit(String emitKey,
List<Metadata> metadataList) |
Constructor and Description |
---|
EmitData(EmitKey emitKey,
List<Metadata> metadataList) |
Modifier and Type | Method and Description |
---|---|
void |
AZBlobEmitter.emit(String path,
InputStream is,
Metadata userMetadata) |
Modifier and Type | Method and Description |
---|---|
void |
AZBlobEmitter.emit(String emitKey,
List<Metadata> metadataList)
Requires the source path (for example, src-bucket/path/to/my/file.txt) to be set in
TikaCoreProperties.SOURCE_PATH. |
Modifier and Type | Method and Description |
---|---|
void |
FileSystemEmitter.emit(String path,
InputStream inputStream,
Metadata userMetadata) |
Modifier and Type | Method and Description |
---|---|
void |
FileSystemEmitter.emit(String emitKey,
List<Metadata> metadataList) |
Modifier and Type | Method and Description |
---|---|
void |
GCSEmitter.emit(String path,
InputStream is,
Metadata userMetadata) |
Modifier and Type | Method and Description |
---|---|
void |
GCSEmitter.emit(String emitKey,
List<Metadata> metadataList)
Requires the source path (for example, src-bucket/path/to/my/file.txt) to be set in
TikaCoreProperties.SOURCE_PATH. |
Modifier and Type | Method and Description |
---|---|
protected static String |
OpenSearchClient.metadataToJsonContainer(Metadata metadata,
OpenSearchEmitter.AttachmentStrategy attachmentStrategy) |
protected static String |
OpenSearchClient.metadataToJsonEmbedded(Metadata metadata,
OpenSearchEmitter.AttachmentStrategy attachmentStrategy,
String emitKey,
String embeddedFileFieldName) |
Modifier and Type | Method and Description |
---|---|
void |
OpenSearchEmitter.emit(String emitKey,
List<Metadata> metadataList) |
void |
OpenSearchClient.emitDocument(String emitKey,
List<Metadata> metadataList) |
Modifier and Type | Method and Description |
---|---|
void |
S3Emitter.emit(String path,
InputStream is,
Metadata userMetadata) |
Modifier and Type | Method and Description |
---|---|
void |
S3Emitter.emit(String emitKey,
List<Metadata> metadataList)
Requires the source path (for example, src-bucket/path/to/my/file.txt) to be set in
TikaCoreProperties.SOURCE_PATH. |
Modifier and Type | Method and Description |
---|---|
void |
SolrEmitter.emit(String emitKey,
List<Metadata> metadataList) |
Modifier and Type | Method and Description |
---|---|
InputStream |
RangeFetcher.fetch(String fetchKey,
long startOffset,
long endOffset,
Metadata metadata) |
InputStream |
EmptyFetcher.fetch(String fetchKey,
Metadata metadata) |
InputStream |
Fetcher.fetch(String fetchKey,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
InputStream |
AZBlobFetcher.fetch(String fetchKey,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
InputStream |
FileSystemFetcher.fetch(String fetchKey,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
InputStream |
GCSFetcher.fetch(String fetchKey,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
InputStream |
HttpFetcher.fetch(String fetchKey,
long startRange,
long endRange,
Metadata metadata) |
InputStream |
HttpFetcher.fetch(String fetchKey,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
InputStream |
S3Fetcher.fetch(String fetchKey,
long startRange,
long endRange,
Metadata metadata) |
InputStream |
S3Fetcher.fetch(String fetchKey,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
InputStream |
UrlFetcher.fetch(String fetchKey,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
Metadata |
RenderResult.getMetadata() |
Modifier and Type | Method and Description |
---|---|
RenderResults |
Renderer.render(InputStream is,
Metadata metadata,
ParseContext parseContext,
RenderRequest... requests) |
RenderResults |
CompositeRenderer.render(InputStream is,
Metadata metadata,
ParseContext parseContext,
RenderRequest... requests) |
Constructor and Description |
---|
RenderResult(RenderResult.STATUS status,
int id,
Object result,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
RenderResults |
MuPDFRenderer.render(InputStream is,
Metadata metadata,
ParseContext parseContext,
RenderRequest... requests) |
Modifier and Type | Method and Description |
---|---|
RenderResults |
PDFBoxRenderer.render(InputStream is,
Metadata metadata,
ParseContext parseContext,
RenderRequest... requests) |
protected RenderResult |
PDFBoxRenderer.renderPage(org.apache.pdfbox.rendering.PDFRenderer renderer,
int id,
int pageNumber,
Metadata metadata) |
Modifier and Type | Field and Description |
---|---|
protected List<Metadata> |
RecursiveParserWrapperHandler.metadataList |
Modifier and Type | Method and Description |
---|---|
List<Metadata> |
RecursiveParserWrapperHandler.getMetadataList() |
Modifier and Type | Method and Description |
---|---|
ContentHandler |
ContentHandlerDecoratorFactory.decorate(ContentHandler contentHandler,
Metadata metadata)
Deprecated.
Use ContentHandlerDecoratorFactory.decorate(ContentHandler, Metadata, ParseContext) instead.
This will be removed in 2.5.0. |
ContentHandler |
ContentHandlerDecoratorFactory.decorate(ContentHandler contentHandler,
Metadata metadata,
ParseContext parseContext) |
void |
AbstractRecursiveParserWrapperHandler.endDocument(ContentHandler contentHandler,
Metadata metadata)
This is called after the full parse has completed.
|
void |
RecursiveParserWrapperHandler.endDocument(ContentHandler contentHandler,
Metadata metadata) |
void |
AbstractRecursiveParserWrapperHandler.endEmbeddedDocument(ContentHandler contentHandler,
Metadata metadata)
This is called after parsing each embedded document.
|
void |
RecursiveParserWrapperHandler.endEmbeddedDocument(ContentHandler contentHandler,
Metadata metadata)
This is called after parsing an embedded document.
|
void |
XMPContentHandler.metadata(Metadata metadata) |
void |
AbstractRecursiveParserWrapperHandler.startEmbeddedDocument(ContentHandler contentHandler,
Metadata metadata)
This is called before parsing each embedded document.
|
void |
RecursiveParserWrapperHandler.startEmbeddedDocument(ContentHandler contentHandler,
Metadata metadata)
This is called before parsing an embedded document.
|
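The three callbacks listed above (startEmbeddedDocument, endEmbeddedDocument, endDocument) can be modeled with a stand-alone sketch that gathers one metadata entry per document. It does not use Tika classes: maps stand in for Metadata, the ContentHandler argument is omitted, and placing the container's metadata first in the list is an assumption made for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-alone model of a handler that collects metadata per document.
// Method names mirror the listing; signatures are simplified on purpose.
class RecursiveHandlerSketch {

    private final List<Map<String, String>> metadataList = new ArrayList<>();

    // Called before an embedded document is parsed; nothing to prepare here.
    void startEmbeddedDocument(Map<String, String> metadata) {
    }

    // Called after an embedded document is parsed: store a snapshot.
    void endEmbeddedDocument(Map<String, String> metadata) {
        metadataList.add(new LinkedHashMap<>(metadata));
    }

    // Called once after the full parse: store the container's metadata
    // at the front (assumed ordering for this sketch).
    void endDocument(Map<String, String> metadata) {
        metadataList.add(0, new LinkedHashMap<>(metadata));
    }

    List<Map<String, String>> getMetadataList() {
        return metadataList;
    }

    public static void main(String[] args) {
        RecursiveHandlerSketch handler = new RecursiveHandlerSketch();

        Map<String, String> image = new LinkedHashMap<>();
        image.put("name", "image1.png");
        handler.startEmbeddedDocument(image);
        handler.endEmbeddedDocument(image);

        Map<String, String> container = new LinkedHashMap<>();
        container.put("name", "report.docx");
        handler.endDocument(container);

        System.out.println(handler.getMetadataList());
    }
}
```

Each callback stores a copy of the map, so later changes to the caller's object do not alter what was collected.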
Constructor and Description |
---|
DIFContentHandler(ContentHandler delegate,
Metadata metadata) |
PhoneExtractingContentHandler(ContentHandler handler,
Metadata metadata)
Creates a decorator for the given SAX event handler and Metadata object.
|
StandardsExtractingContentHandler(ContentHandler handler,
Metadata metadata)
Creates a decorator for the given SAX event handler and Metadata object.
|
XHTMLContentHandler(ContentHandler handler,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
List<Metadata> |
MetadataList.getMetadata() |
Modifier and Type | Method and Description |
---|---|
void |
CompositeParseContextConfig.configure(javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders,
Metadata metadata,
ParseContext context) |
void |
ParseContextConfig.configure(javax.ws.rs.core.MultivaluedMap<String,String> headers,
Metadata metadata,
ParseContext context)
Configures the parseContext with present headers.
|
InputStream |
DefaultInputStreamFactory.getInputStream(InputStream is,
Metadata metadata,
javax.ws.rs.core.HttpHeaders httpHeaders) |
InputStream |
InputStreamFactory.getInputStream(InputStream is,
Metadata metadata,
javax.ws.rs.core.HttpHeaders httpHeaders)
|
InputStream |
FetcherStreamFactory.getInputStream(InputStream is,
Metadata metadata,
javax.ws.rs.core.HttpHeaders httpHeaders) |
InputStream |
DefaultInputStreamFactory.getInputStream(InputStream is,
Metadata metadata,
javax.ws.rs.core.HttpHeaders httpHeaders,
javax.ws.rs.core.UriInfo uriInfo) |
InputStream |
InputStreamFactory.getInputStream(InputStream is,
Metadata metadata,
javax.ws.rs.core.HttpHeaders httpHeaders,
javax.ws.rs.core.UriInfo uriInfo) |
InputStream |
FetcherStreamFactory.getInputStream(InputStream is,
Metadata metadata,
javax.ws.rs.core.HttpHeaders httpHeaders,
javax.ws.rs.core.UriInfo uriInfo) |
Constructor and Description |
---|
MetadataList(List<Metadata> metadata) |
Modifier and Type | Method and Description |
---|---|
void |
PasswordProviderConfig.configure(javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders,
Metadata metadata,
ParseContext context) |
void |
TimeoutConfig.configure(javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders,
Metadata metadata,
ParseContext context) |
void |
DocumentSelectorConfig.configure(javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders,
Metadata mtadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Metadata |
TikaResource.getJson(InputStream is,
javax.ws.rs.core.HttpHeaders httpHeaders,
javax.ws.rs.core.UriInfo info,
String handlerTypeName) |
Metadata |
TikaResource.getJsonFromMultipart(org.apache.cxf.jaxrs.ext.multipart.Attachment att,
javax.ws.rs.core.HttpHeaders httpHeaders,
javax.ws.rs.core.UriInfo info,
String handlerTypeName) |
protected Metadata |
MetadataResource.parseMetadata(InputStream is,
Metadata metadata,
javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders,
javax.ws.rs.core.UriInfo info) |
Modifier and Type | Method and Description |
---|---|
static List<Metadata> |
RecursiveMetadataResource.parseMetadata(InputStream is,
Metadata metadata,
javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders,
javax.ws.rs.core.UriInfo info,
HandlerConfig handlerConfig) |
Modifier and Type | Method and Description |
---|---|
static void |
TikaResource.fillMetadata(Parser parser,
Metadata metadata,
javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders) |
static void |
TikaResource.fillParseContext(javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders,
Metadata metadata,
ParseContext parseContext) |
static InputStream |
TikaResource.getInputStream(InputStream is,
Metadata metadata,
javax.ws.rs.core.HttpHeaders headers,
javax.ws.rs.core.UriInfo uriInfo) |
static void |
TikaResource.logRequest(org.slf4j.Logger logger,
String endpoint,
Metadata metadata) |
static void |
UnpackerResource.metadataToCsv(Metadata metadata,
OutputStream outputStream) |
static void |
TikaResource.parse(Parser parser,
org.slf4j.Logger logger,
String path,
InputStream inputStream,
ContentHandler handler,
Metadata metadata,
ParseContext parseContext)
Use this to call a parser and unify exception handling.
|
protected Metadata |
MetadataResource.parseMetadata(InputStream is,
Metadata metadata,
javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders,
javax.ws.rs.core.UriInfo info) |
static List<Metadata> |
RecursiveMetadataResource.parseMetadata(InputStream is,
Metadata metadata,
javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders,
javax.ws.rs.core.UriInfo info,
HandlerConfig handlerConfig) |
javax.ws.rs.core.StreamingOutput |
TikaResource.produceText(InputStream is,
Metadata metadata,
javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders,
javax.ws.rs.core.UriInfo info) |
Modifier and Type | Method and Description |
---|---|
long |
JSONMessageBodyWriter.getSize(Metadata data,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType) |
long |
JSONObjWriter.getSize(Metadata data,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType) |
long |
TextMessageBodyWriter.getSize(Metadata data,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType) |
long |
CSVMessageBodyWriter.getSize(Metadata data,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType) |
void |
JSONMessageBodyWriter.writeTo(Metadata metadata,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType,
javax.ws.rs.core.MultivaluedMap<String,Object> httpHeaders,
OutputStream entityStream) |
void |
TextMessageBodyWriter.writeTo(Metadata metadata,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType,
javax.ws.rs.core.MultivaluedMap<String,Object> httpHeaders,
OutputStream entityStream) |
void |
CSVMessageBodyWriter.writeTo(Metadata metadata,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType,
javax.ws.rs.core.MultivaluedMap<String,Object> httpHeaders,
OutputStream entityStream) |
Modifier and Type | Method and Description |
---|---|
void |
PDFServerConfig.configure(javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders,
Metadata metadata,
ParseContext parseContext)
Configures the parseContext with present headers.
|
void |
TesseractServerConfig.configure(javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders,
Metadata metadata,
ParseContext parseContext)
Configures the parseContext with present headers.
|
Modifier and Type | Method and Description |
---|---|
long |
XMPMessageBodyWriter.getSize(Metadata data,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType) |
void |
XMPMessageBodyWriter.writeTo(Metadata metadata,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType,
javax.ws.rs.core.MultivaluedMap<String,Object> httpHeaders,
OutputStream entityStream) |
Modifier and Type | Method and Description |
---|---|
static Metadata |
ParserUtils.cloneMetadata(Metadata m)
Does a deep clone of a Metadata object.
|
Modifier and Type | Method and Description |
---|---|
static Metadata |
ParserUtils.cloneMetadata(Metadata m)
Does a deep clone of a Metadata object.
|
static void |
ParserUtils.recordParserDetails(Parser parser,
Metadata metadata)
|
static void |
ParserUtils.recordParserDetails(String parserClassName,
Metadata metadata)
|
static void |
ParserUtils.recordParserFailure(Parser parser,
Throwable failure,
Metadata metadata)
|
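What "deep clone" means for a multi-valued metadata map can be shown with a stand-alone sketch. It is not the ParserUtils.cloneMetadata implementation: a map of string arrays stands in for Metadata, and the class and method names are hypothetical. The point is that each value array is copied, so changing the clone leaves the original untouched.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-alone model: a map of string arrays stands in for Tika's Metadata.
class DeepCloneSketch {

    // Copies the map and every value array, not just the references.
    static Map<String, String[]> deepClone(Map<String, String[]> source) {
        Map<String, String[]> copy = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> entry : source.entrySet()) {
            copy.put(entry.getKey(), entry.getValue().clone());
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String[]> original = new LinkedHashMap<>();
        original.put("dc:creator", new String[] {"Ann", "Bob"});

        Map<String, String[]> clone = deepClone(original);
        clone.get("dc:creator")[0] = "Changed";

        // The original array is unaffected by the change to the clone.
        System.out.println(Arrays.toString(original.get("dc:creator")));
        System.out.println(Arrays.toString(clone.get("dc:creator")));
    }
}
```

A shallow copy such as `new LinkedHashMap<>(source)` would share the value arrays between both maps, which is the situation a deep clone avoids.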
Modifier and Type | Class and Description |
---|---|
class |
XMPMetadata
Converts the Metadata map from Tika to the XMP data model while also providing the
Metadata API to ease the transition for clients.
|
Modifier and Type | Method and Description |
---|---|
void |
XMPMetadata.process(Metadata meta) |
void |
XMPMetadata.process(Metadata meta,
String mimetype)
Converts the Metadata information to XMP.
|
Constructor and Description |
---|
XMPMetadata(Metadata meta) |
XMPMetadata(Metadata meta,
String mimetype)
Initializes the data by converting the Metadata information to XMP.
|
Modifier and Type | Method and Description |
---|---|
static com.adobe.internal.xmp.XMPMeta |
TikaToXMP.convert(Metadata tikaMetadata) |
static com.adobe.internal.xmp.XMPMeta |
TikaToXMP.convert(Metadata tikaMetadata,
String mimetype)
Converts the given Tika metadata map to an XMP object.
|
com.adobe.internal.xmp.XMPMeta |
GenericConverter.process(Metadata metadata) |
com.adobe.internal.xmp.XMPMeta |
RTFConverter.process(Metadata metadata) |
com.adobe.internal.xmp.XMPMeta |
MSOfficeXMLConverter.process(Metadata metadata) |
com.adobe.internal.xmp.XMPMeta |
MSOfficeBinaryConverter.process(Metadata metadata) |
com.adobe.internal.xmp.XMPMeta |
ITikaToXMPConverter.process(Metadata metadata)
Converts a Tika Metadata object into an XMPMeta containing the useful
properties. |
com.adobe.internal.xmp.XMPMeta |
OpenDocumentConverter.process(Metadata metadata) |
abstract com.adobe.internal.xmp.XMPMeta |
AbstractConverter.process(Metadata metadata) |
void |
AbstractConverter.setMetadata(Metadata metadata) |
Copyright © 2007–2022 The Apache Software Foundation. All rights reserved.