Modifier and Type | Method and Description |
---|---|
String |
Tika.detect(InputStream stream,
Metadata metadata)
Detects the media type of the given document.
|
Reader |
Tika.parse(InputStream stream,
Metadata metadata)
Parses the given document and returns the extracted text content.
|
String |
Tika.parseToString(InputStream stream,
Metadata metadata)
Parses the given document and returns the extracted text content.
|
String |
Tika.parseToString(InputStream stream,
Metadata metadata,
int maxLength)
Parses the given document and returns the extracted text content.
|
Modifier and Type | Method and Description |
---|---|
MediaType |
TypeDetector.detect(InputStream input,
Metadata metadata)
Detects the content type of an input document based on a type hint
given in the input metadata.
|
MediaType |
TextDetector.detect(InputStream input,
Metadata metadata)
Looks at the beginning of the document input stream to determine
whether the document is text or not.
|
MediaType |
NameDetector.detect(InputStream input,
Metadata metadata)
Detects the content type of an input document based on the document
name given in the input metadata.
|
MediaType |
MagicDetector.detect(InputStream input,
Metadata metadata) |
Charset |
EncodingDetector.detect(InputStream input,
Metadata metadata)
Detects the character encoding of the given text document, or returns
null if the encoding of the document cannot be detected. |
MediaType |
EmptyDetector.detect(InputStream input,
Metadata metadata) |
MediaType |
Detector.detect(InputStream input,
Metadata metadata)
Detects the content type of the given input document.
|
MediaType |
CompositeDetector.detect(InputStream input,
Metadata metadata) |
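
The TextDetector entry above describes looking at the beginning of the input stream to decide whether a document is text. The following is a minimal, JDK-only sketch of that idea, not Tika's implementation. The class name, the 512-byte prefix length and the set of allowed control bytes are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a text-vs-binary check; not part of Tika.
final class TextSniffSketch {
    // Number of leading bytes to inspect (an assumption for this sketch).
    private static final int PREFIX_LENGTH = 512;

    /** Returns true if the stream prefix contains no unexpected control bytes. */
    static boolean looksLikeText(InputStream input) {
        if (!input.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] prefix = new byte[PREFIX_LENGTH];
        int total = 0;
        try {
            // Remember the current position so the caller can still parse the stream.
            input.mark(PREFIX_LENGTH);
            int n;
            while (total < PREFIX_LENGTH
                    && (n = input.read(prefix, total, PREFIX_LENGTH - total)) != -1) {
                total += n;
            }
            input.reset();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (int i = 0; i < total; i++) {
            int b = prefix[i] & 0xFF;
            boolean allowedControl = b == '\t' || b == '\n' || b == '\r' || b == '\f';
            if (b < 0x20 && !allowedControl) {
                return false; // a control byte that plain text rarely contains
            }
        }
        return true;
    }
}
```

Because the sketch marks and resets the stream, a caller can run the check and then hand the same stream to a parser. An empty stream is reported as text here, which is a choice of this sketch.
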
Constructor and Description |
---|
AutoDetectReader(InputStream stream,
Metadata metadata) |
AutoDetectReader(InputStream stream,
Metadata metadata,
ServiceLoader loader) |
Modifier and Type | Method and Description |
---|---|
void |
ExternalEmbedder.embed(Metadata metadata,
InputStream inputStream,
OutputStream outputStream,
ParseContext context)
Executes the configured external command to embed the given metadata into
the document stream, writing the result to the given output stream.
|
void |
Embedder.embed(Metadata metadata,
InputStream originalStream,
OutputStream outputStream,
ParseContext context)
Embeds related document metadata from the given metadata object into the
given output stream.
|
protected List<String> |
ExternalEmbedder.getCommandMetadataSegments(Metadata metadata)
Constructs a collection of command line arguments responsible for setting
individual metadata fields based on the given metadata. |
Modifier and Type | Method and Description |
---|---|
void |
ParsingEmbeddedDocumentExtractor.parseEmbedded(InputStream stream,
ContentHandler handler,
Metadata metadata,
boolean outputHtml) |
void |
EmbeddedDocumentExtractor.parseEmbedded(InputStream stream,
ContentHandler handler,
Metadata metadata,
boolean outputHtml)
Processes the supplied embedded resource, calling the delegating
parser with the appropriate details.
|
boolean |
DocumentSelector.select(Metadata metadata)
Checks if a document with the given metadata matches the specified
selection criteria.
|
boolean |
ParsingEmbeddedDocumentExtractor.shouldParseEmbedded(Metadata metadata) |
boolean |
EmbeddedDocumentExtractor.shouldParseEmbedded(Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
void |
ForkParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
static TikaInputStream |
TikaInputStream.get(Blob blob,
Metadata metadata)
Creates a TikaInputStream from the given database BLOB.
|
static TikaInputStream |
TikaInputStream.get(byte[] data,
Metadata metadata)
Creates a TikaInputStream from the given array of bytes.
|
static TikaInputStream |
TikaInputStream.get(File file,
Metadata metadata)
Creates a TikaInputStream from the given file.
|
static TikaInputStream |
TikaInputStream.get(URI uri,
Metadata metadata)
Creates a TikaInputStream from the resource at the given URI.
|
static TikaInputStream |
TikaInputStream.get(URL url,
Metadata metadata)
Creates a TikaInputStream from the resource at the given URL.
|
Modifier and Type | Method and Description |
---|---|
protected String[] |
JsonMetadataSerializer.getNames(Metadata metadata)
Override to get a custom sort order
or to filter names.
|
com.google.gson.JsonElement |
JsonMetadataSerializer.serialize(Metadata metadata,
Type type,
com.google.gson.JsonSerializationContext context) |
Modifier and Type | Method and Description |
---|---|
static void |
XMPDM.ChannelTypePropertyConverter.convertAndSet(Metadata metadata,
Object value)
Deprecated.
How convert+set might work
|
Modifier and Type | Method and Description |
---|---|
Metadata |
JsonMetadataDeserializer.deserialize(com.google.gson.JsonElement element,
Type type,
com.google.gson.JsonDeserializationContext context)
Deserializes a JSON object (equivalent to a Map) into a Metadata object.
|
static Metadata |
JsonMetadata.fromJson(Reader reader)
Reads metadata from the given reader.
|
Modifier and Type | Method and Description |
---|---|
protected String[] |
JsonMetadataSerializer.getNames(Metadata metadata)
Override to get a custom sort order
or to filter names.
|
com.google.gson.JsonElement |
JsonMetadataSerializer.serialize(Metadata metadata,
Type type,
com.google.gson.JsonSerializationContext context)
Serializes a Metadata object into a JSON element that is effectively a Map.
|
static void |
JsonMetadata.toJson(Metadata metadata,
Writer writer)
Serializes a Metadata object to JSON.
|
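
The serializer entries above describe writing metadata as a map-like JSON object, with getNames deciding the order (or a filtered subset) of the names. The following JDK-only sketch illustrates that shape under stated assumptions: a `Map<String, String[]>` stands in for Metadata, every value is written as an array, and the class and method names are hypothetical rather than Tika's or Gson's.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical stand-in for map-style metadata serialization; not part of Tika.
final class MetadataJsonSketch {
    /** Names in output order; sorted here, but a variant could filter instead. */
    static String[] getNames(Map<String, String[]> metadata) {
        String[] names = metadata.keySet().toArray(new String[0]);
        Arrays.sort(names);
        return names;
    }

    /** Writes each name with all of its values as a JSON array of strings. */
    static String toJson(Map<String, String[]> metadata) {
        StringBuilder out = new StringBuilder("{");
        String[] names = getNames(metadata);
        for (int i = 0; i < names.length; i++) {
            if (i > 0) out.append(',');
            out.append(quote(names[i])).append(":[");
            String[] values = metadata.get(names[i]);
            for (int j = 0; j < values.length; j++) {
                if (j > 0) out.append(',');
                out.append(quote(values[j]));
            }
            out.append(']');
        }
        return out.append('}').toString();
    }

    /** Minimal JSON string escaping for quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```

Sorting in getNames keeps the output stable between runs, which is one reason a serializer might expose that hook.
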
Modifier and Type | Method and Description |
---|---|
MediaType |
MimeTypes.detect(InputStream input,
Metadata metadata)
Automatically detects the MIME type of a document based on magic
markers in the stream prefix and any given metadata hints.
|
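
MimeTypes.detect is described above as combining magic markers in the stream prefix with metadata hints. The sketch below shows one simplified way such a combination can work, using only the JDK. The two magic patterns, the extension table, the `resourceName` hint key as used here, and the rule that a magic match always wins are assumptions of this sketch, not a description of Tika's actual resolution logic.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for magic-plus-hint detection; not part of Tika.
final class MagicThenHintSketch {
    private static final byte[] PDF_MAGIC = {'%', 'P', 'D', 'F', '-'};
    private static final byte[] PNG_MAGIC = {(byte) 0x89, 'P', 'N', 'G'};

    /** Returns a media type name; the stream must support mark/reset. */
    static String detect(InputStream input, Map<String, String> hints) {
        byte[] prefix = new byte[8];
        int total = 0;
        try {
            input.mark(prefix.length);
            int n;
            while (total < prefix.length
                    && (n = input.read(prefix, total, prefix.length - total)) != -1) {
                total += n;
            }
            input.reset(); // leave the stream positioned at its start
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // 1. Magic markers in the stream prefix.
        if (startsWith(prefix, total, PDF_MAGIC)) return "application/pdf";
        if (startsWith(prefix, total, PNG_MAGIC)) return "image/png";
        // 2. Fall back to a name hint supplied by the caller.
        String name = hints.get("resourceName");
        if (name != null) {
            String lower = name.toLowerCase(Locale.ROOT);
            if (lower.endsWith(".txt")) return "text/plain";
            if (lower.endsWith(".html")) return "text/html";
        }
        // 3. Nothing matched.
        return "application/octet-stream";
    }

    private static boolean startsWith(byte[] data, int length, byte[] magic) {
        if (length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) return false;
        }
        return true;
    }
}
```

Checking the content before the name means a mislabeled file (for example a PDF named `notes.txt`) is still reported by what it contains.
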
Modifier and Type | Method and Description |
---|---|
protected Parser |
CompositeParser.getParser(Metadata metadata)
Returns the parser that best matches the given metadata.
|
protected Parser |
CompositeParser.getParser(Metadata metadata,
ParseContext context) |
String |
PasswordProvider.getPassword(Metadata metadata)
Looks up the password for a document with the given metadata,
and returns it for the Parser.
|
void |
AutoDetectParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata) |
void |
AbstractParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata)
Deprecated.
Use the Parser.parse(InputStream, ContentHandler, Metadata, ParseContext) method instead. |
void |
ParserPostProcessor.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Forwards the call to the delegated parser and post-processes the
results as described in the ParserPostProcessor class documentation.
|
void |
ParserDecorator.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Delegates the method call to the decorated parser.
|
void |
Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Parses a document stream into a sequence of XHTML SAX events.
|
void |
NetworkParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
ErrorParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
EmptyParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
DelegatingParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Looks up the delegate parser from the parsing context and
delegates the parse operation to it.
|
void |
CryptoParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
CompositeParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Delegates the call to the matching component parser.
|
void |
AutoDetectParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
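
The CompositeParser entries above describe choosing the component parser that best matches the given metadata and delegating the call to it. The JDK-only sketch below illustrates that dispatch pattern in a deliberately reduced form. Parsers are modeled as `Function<String, String>`, metadata as a `Map<String, String>`, and matching is an exact lookup on the `Content-Type` value with parameters stripped. All of these are assumptions of the sketch and none of the names are Tika's.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for composite dispatch; not part of Tika.
final class CompositeDispatchSketch {
    private final Map<String, Function<String, String>> parsers = new HashMap<>();
    private final Function<String, String> fallback;

    CompositeDispatchSketch(Function<String, String> fallback) {
        this.fallback = fallback;
    }

    /** Registers a component parser for one media type name. */
    void register(String mediaType, Function<String, String> parser) {
        parsers.put(mediaType.toLowerCase(Locale.ROOT), parser);
    }

    /** Picks the parser matching the declared content type, else the fallback. */
    Function<String, String> getParser(Map<String, String> metadata) {
        String type = metadata.get("Content-Type");
        if (type != null) {
            // Drop parameters such as "; charset=UTF-8" before the lookup.
            int semi = type.indexOf(';');
            String base = (semi >= 0 ? type.substring(0, semi) : type)
                    .trim().toLowerCase(Locale.ROOT);
            Function<String, String> parser = parsers.get(base);
            if (parser != null) {
                return parser;
            }
        }
        return fallback;
    }

    /** Delegates the call to whichever parser getParser selects. */
    String parse(String document, Map<String, String> metadata) {
        return getParser(metadata).apply(document);
    }
}
```

A caller might register `text/plain` with a pass-through function and supply a fallback that returns an empty string, so that unknown types produce no text instead of an error.
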
Constructor and Description |
---|
ParsingReader(Parser parser,
InputStream stream,
Metadata metadata,
ParseContext context)
Creates a reader for the text content of the given binary stream
with the given document metadata.
|
ParsingReader(Parser parser,
InputStream stream,
Metadata metadata,
ParseContext context,
Executor executor)
Creates a reader for the text content of the given binary stream
with the given document metadata.
|
Modifier and Type | Method and Description |
---|---|
void |
ClassParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
MidiParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
AudioParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
ChmParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
SourceCodeParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
Pkcs7Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
DWGParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
EnviHeaderParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
EpubParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
EpubContentParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
ExecutableParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
ExecutableParser.parseELF(XHTMLContentHandler xhtml,
Metadata metadata,
InputStream stream,
byte[] first4)
Parses a Unix ELF file.
|
void |
ExecutableParser.parsePE(XHTMLContentHandler xhtml,
Metadata metadata,
InputStream stream,
byte[] first4)
Parses a DOS or Windows PE file.
|
Modifier and Type | Method and Description |
---|---|
void |
ExternalParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Executes the configured external command and passes the given document
stream as a simple XHTML document to the given SAX content handler.
|
Modifier and Type | Method and Description |
---|---|
void |
FeedParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
TrueTypeParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
AdobeFontMetricParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
HDFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
protected void |
HDFParser.unravelStringMet(ucar.nc2.NetcdfFile ncFile,
ucar.nc2.Group group,
Metadata met) |
Modifier and Type | Method and Description |
---|---|
Charset |
HtmlEncodingDetector.detect(InputStream input,
Metadata metadata) |
void |
HtmlParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
TiffParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
PSDParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
ImageParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
ImageMetadataExtractor(Metadata metadata) |
ImageMetadataExtractor(Metadata metadata,
org.apache.tika.parser.image.ImageMetadataExtractor.DirectoryHandler... handlers) |
Constructor and Description |
---|
JempboxExtractor(Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
void |
IptcAnpaParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata)
Deprecated.
This method will be removed in Apache Tika 1.0.
|
void |
IptcAnpaParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
IWorkPackageParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
JpegParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
RFC822Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
MatParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Map<Integer,Metadata> |
MboxParser.getTrackingMetadata() |
Modifier and Type | Method and Description |
---|---|
void |
OutlookPSTParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
MboxParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
MediaType |
POIFSContainerDetector.detect(InputStream input,
Metadata metadata) |
protected void |
OfficeParser.parse(org.apache.poi.poifs.filesystem.DirectoryNode root,
ParseContext context,
Metadata metadata,
XHTMLContentHandler xhtml) |
void |
TNEFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Extracts properties and text from an MS Document input stream.
|
void |
OfficeParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
Extracts properties and text from an MS Document input stream.
|
void |
OutlookExtractor.parse(XHTMLContentHandler xhtml,
Metadata metadata) |
Constructor and Description |
---|
SummaryExtractor(Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
void |
MetadataExtractor.extract(Metadata metadata) |
void |
XSSFExcelExtractorDecorator.getXHTML(ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
OOXMLExtractor.getXHTML(ContentHandler handler,
Metadata metadata,
ParseContext context)
Parses the document into a sequence of XHTML SAX events sent to the
given content handler.
|
void |
AbstractOOXMLExtractor.getXHTML(ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
OOXMLParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
static void |
OOXMLExtractorFactory.parse(InputStream stream,
ContentHandler baseHandler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
Mp3Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
MP4Parser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
NetCDFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
protected ContentHandler |
OpenDocumentMetaParser.getContentHandler(ContentHandler ch,
Metadata md,
ParseContext context) |
void |
OpenDocumentParser.parse(InputStream stream,
ContentHandler baseHandler,
Metadata metadata,
ParseContext context) |
void |
OpenDocumentMetaParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
OpenDocumentContentParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
PDFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
boolean |
CompressorParserOptions.decompressConcatenated(Metadata metadata) |
MediaType |
ZipContainerDetector.detect(InputStream input,
Metadata metadata) |
void |
PackageParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
CompressorParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
PRTParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
RTFParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
Charset |
UniversalEncodingDetector.detect(InputStream input,
Metadata metadata) |
Charset |
Icu4jEncodingDetector.detect(InputStream input,
Metadata metadata) |
void |
TXTParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
FLVParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
protected ContentHandler |
XMLParser.getContentHandler(ContentHandler handler,
Metadata metadata,
ParseContext context) |
protected ContentHandler |
FictionBookParser.getContentHandler(ContentHandler handler,
Metadata metadata,
ParseContext context) |
protected ContentHandler |
DcXMLParser.getContentHandler(ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
XMLParser.parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
AttributeDependantMetadataHandler(Metadata metadata,
String nameHoldingAttribute,
String namePrefix) |
AttributeMetadataHandler(String uri,
String localName,
Metadata metadata,
Property property) |
AttributeMetadataHandler(String uri,
String localName,
Metadata metadata,
String name) |
ElementMetadataHandler(String uri,
String localName,
Metadata metadata,
Property targetProperty)
Constructor for Property metadata keys.
|
ElementMetadataHandler(String uri,
String localName,
Metadata metadata,
Property targetProperty,
boolean allowDuplicateValues,
boolean allowEmptyValues)
Constructor for Property metadata keys which allows change of behavior
for duplicate and empty entry values.
|
ElementMetadataHandler(String uri,
String localName,
Metadata metadata,
String name)
Constructor for string metadata keys.
|
ElementMetadataHandler(String uri,
String localName,
Metadata metadata,
String name,
boolean allowDuplicateValues,
boolean allowEmptyValues)
Constructor for string metadata keys which allows change of behavior
for duplicate and empty entry values.
|
MetadataHandler(Metadata metadata,
Property property)
Deprecated.
|
MetadataHandler(Metadata metadata,
String name)
Deprecated.
|
Modifier and Type | Method and Description |
---|---|
void |
XMPContentHandler.metadata(Metadata metadata) |
Constructor and Description |
---|
XHTMLContentHandler(ContentHandler handler,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
static void |
TikaResource.fillMetadata(AutoDetectParser parser,
Metadata metadata,
javax.ws.rs.core.MultivaluedMap<String,String> httpHeaders) |
long |
JSONMessageBodyWriter.getSize(Metadata data,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType) |
long |
CSVMessageBodyWriter.getSize(Metadata data,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType) |
static void |
TikaResource.logRequest(org.apache.commons.logging.Log logger,
javax.ws.rs.core.UriInfo info,
Metadata metadata) |
static void |
MetadataResource.metadataToCsv(Metadata metadata,
OutputStream outputStream) |
void |
JSONMessageBodyWriter.writeTo(Metadata metadata,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType,
javax.ws.rs.core.MultivaluedMap<String,Object> httpHeaders,
OutputStream entityStream) |
void |
CSVMessageBodyWriter.writeTo(Metadata metadata,
Class<?> type,
Type genericType,
Annotation[] annotations,
javax.ws.rs.core.MediaType mediaType,
javax.ws.rs.core.MultivaluedMap<String,Object> httpHeaders,
OutputStream entityStream) |
Modifier and Type | Class and Description |
---|---|
class |
XMPMetadata
Converts the Tika Metadata map to the XMP data model while also exposing the
Metadata API, which eases the transition for clients.
|
Modifier and Type | Method and Description |
---|---|
void |
XMPMetadata.process(Metadata meta) |
void |
XMPMetadata.process(Metadata meta,
String mimetype)
Converts the Metadata information to XMP.
|
Constructor and Description |
---|
XMPMetadata(Metadata meta) |
XMPMetadata(Metadata meta,
String mimetype)
Initializes the data by converting the Metadata information to XMP.
|
Modifier and Type | Method and Description |
---|---|
static com.adobe.xmp.XMPMeta |
TikaToXMP.convert(Metadata tikaMetadata) |
static com.adobe.xmp.XMPMeta |
TikaToXMP.convert(Metadata tikaMetadata,
String mimetype)
Converts the given Tika metadata map to an XMP object.
|
com.adobe.xmp.XMPMeta |
RTFConverter.process(Metadata metadata) |
com.adobe.xmp.XMPMeta |
OpenDocumentConverter.process(Metadata metadata) |
com.adobe.xmp.XMPMeta |
MSOfficeXMLConverter.process(Metadata metadata) |
com.adobe.xmp.XMPMeta |
MSOfficeBinaryConverter.process(Metadata metadata) |
com.adobe.xmp.XMPMeta |
ITikaToXMPConverter.process(Metadata metadata)
Converts a Tika Metadata object into an XMPMeta containing the useful
properties. |
com.adobe.xmp.XMPMeta |
GenericConverter.process(Metadata metadata) |
abstract com.adobe.xmp.XMPMeta |
AbstractConverter.process(Metadata metadata) |
void |
AbstractConverter.setMetadata(Metadata metadata) |
Copyright © 2007-2014 The Apache Software Foundation. All Rights Reserved.