Package org.apache.tika.embedder
Interface Embedder
- All Superinterfaces:
Serializable
- All Known Implementing Classes:
ExternalEmbedder
Tika embedder interface
- Since:
- Apache Tika 1.3
Method Summary
Modifier and Type | Method | Description
void | embed(Metadata metadata, InputStream originalStream, OutputStream outputStream, ParseContext context) | Embeds related document metadata from the given metadata object into the given output stream.
Set<MediaType> | getSupportedEmbedTypes(ParseContext context) | Returns the set of media types supported by this embedder when used with the given parse context.
Method Details
getSupportedEmbedTypes
Returns the set of media types supported by this embedder when used with the given parse context.
The name differs from the precedent set by Parser.getSupportedTypes(ParseContext) so that parser implementations may also choose to implement this interface.
- Parameters:
context - parse context
- Returns:
immutable set of media types
embed
void embed(Metadata metadata, InputStream originalStream, OutputStream outputStream, ParseContext context) throws IOException, TikaException
Embeds related document metadata from the given metadata object into the given output stream.
The given document stream is consumed but not closed by this method. The responsibility to close the stream remains with the caller.
Information about the parsing context can be passed in the context parameter. See the parser implementations for the kinds of context information they expect.
In general, implementations should favor preserving the source file's metadata unless an update to a field is explicitly defined in the Metadata object. More specifically:
- Embedder implementations should only attempt to update metadata fields present in the given Metadata object. Other fields should be left untouched.
- Embedder implementations should set properties as empty when the corresponding field in the Metadata object is an empty string, i.e. "".
- Embedder implementations should nullify or delete properties corresponding to fields with a null value in the given Metadata object.
- Embedder implementations should set the property corresponding to a particular field in the given Metadata object in all metadata containers whenever possible and appropriate for the file format at the time. If a particular metadata container falls out of use and/or is superseded by another (such as IIM vs XMP for IPTC), it is up to the implementation to decide if and when to cease embedding in the alternate container.
- Embedder implementations should attempt to embed as much of the metadata as accurately as possible. An implementation may choose a strict approach and throw an exception if a value to be embedded exceeds the length allowed or may choose to truncate the value.
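The first three rules above (untouched, empty, deleted) can be illustrated with a minimal sketch. It does not use the Tika API: plain maps stand in for both the Metadata object and the file's embedded properties, and the class name EmbedRulesSketch and method applyUpdates are hypothetical names chosen for illustration. A real embedder would apply the same decisions to each metadata container of the target format.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: plain maps model the Metadata object and the file's
// embedded properties. This is not the Tika API.
final class EmbedRulesSketch {

    /** Applies the requested updates to a copy of the file's existing properties. */
    static Map<String, String> applyUpdates(Map<String, String> existing,
                                            Map<String, String> requested) {
        Map<String, String> result = new HashMap<>(existing);
        for (Map.Entry<String, String> e : requested.entrySet()) {
            if (e.getValue() == null) {
                // A null value asks for the property to be deleted.
                result.remove(e.getKey());
            } else {
                // An empty string sets the property to empty; any other value overwrites.
                result.put(e.getKey(), e.getValue());
            }
        }
        // Keys that never appear in 'requested' are left untouched.
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> existing = new HashMap<>();
        existing.put("title", "Old title");
        existing.put("creator", "Alice");
        existing.put("description", "Draft");
        existing.put("rights", "CC-BY");

        Map<String, String> requested = new HashMap<>();
        requested.put("title", "New title"); // explicit update
        requested.put("description", "");    // set to empty
        requested.put("rights", null);       // delete

        System.out.println(applyUpdates(existing, requested));
    }
}
```

In this sketch "creator" survives unchanged because it is absent from the requested map, which is the preservation behavior the guidelines favor.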
- Parameters:
metadata - document metadata (input and output)
originalStream - the document stream (input)
outputStream - the output stream to write the metadata embedded data to
context - parse context
- Throws:
IOException - if the document stream could not be read
TikaException - if the document could not be parsed
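The stream contract described for embed (the document stream is consumed but not closed, and closing is the caller's job) can be sketched without Tika on the classpath. The class StreamOwnershipSketch and the method copyWithoutClosing are hypothetical stand-ins for an embed-style method, not part of the Tika API; they only show who opens and who closes the streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class StreamOwnershipSketch {

    // Hypothetical stand-in for an embed-style method: it reads the input to
    // the end, writes to the output, and closes neither stream.
    static void copyWithoutClosing(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        byte[] document = "example document bytes".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // The caller opened the streams, so the caller closes them;
        // try-with-resources does this even if the call throws.
        try (InputStream in = new ByteArrayInputStream(document);
             OutputStream out = sink) {
            copyWithoutClosing(in, out);
        }
        System.out.println(sink.size() + " bytes written");
    }
}
```

With a real Embedder, the call inside the try block would presumably be embed(metadata, in, out, context), with the same try-with-resources wrapper around it.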