Interface Embedder

All Superinterfaces:
Serializable
All Known Implementing Classes:
ExternalEmbedder

public interface Embedder extends Serializable
Tika embedder interface
Since:
Apache Tika 1.3
  • Method Details

    • getSupportedEmbedTypes

      Set<MediaType> getSupportedEmbedTypes(ParseContext context)
      Returns the set of media types supported by this embedder when used with the given parse context.

      The name differs from the precedent set by Parser.getSupportedTypes(ParseContext) so that parser implementations may also choose to implement this interface.

      Parameters:
      context - parse context
      Returns:
      immutable set of media types
    • embed

      void embed(Metadata metadata, InputStream originalStream, OutputStream outputStream, ParseContext context) throws IOException, TikaException
      Embeds related document metadata from the given metadata object into the given output stream.

      The given document stream is consumed but not closed by this method. Responsibility for closing the stream remains with the caller.
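      The stream contract above can be sketched with the standard library alone. This is an illustrative sketch, not Tika code: StreamContractSketch, copyWithoutClosing and callerExample are hypothetical names, and the copy loop merely stands in for a real embed implementation. Leaving the output stream open as well is an assumption; the text above only specifies the behavior for the document stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration of "consumed but not closed"; not part of Tika.
final class StreamContractSketch {

    private StreamContractSketch() {}

    // Stand-in for an embed implementation: reads the input to its end and
    // writes to the output, but closes neither stream.
    static void copyWithoutClosing(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        out.flush();
    }

    // Caller side: the caller opened the streams, so the caller closes them,
    // here through try-with-resources.
    static byte[] callerExample(byte[] document) {
        try (InputStream in = new ByteArrayInputStream(document);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            copyWithoutClosing(in, out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

      With a real embedder, the call to copyWithoutClosing would be replaced by a call to embed, with the same try-with-resources block around it on the caller side.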

      Information about the parsing context can be passed in the context parameter. See the parser implementations for the kinds of context information they expect.

      In general, implementations should favor preserving the source file's metadata unless an update to a field is explicitly defined in the Metadata object. More specifically:

      • Embedder implementations should only attempt to update metadata fields present in the given Metadata object. Other fields should be left untouched.
      • Embedder implementations should set properties as empty when the corresponding field in the Metadata object is an empty string, i.e. "".
      • Embedder implementations should nullify or delete properties corresponding to fields with a null value in the given Metadata object.
      • Embedder implementations should set the property corresponding to a particular field in the given Metadata object in all metadata containers whenever possible and appropriate for the file format at the time. If a particular metadata container falls out of use and/or is superseded by another (such as IIM vs XMP for IPTC), it is up to the implementation to decide if and when to cease embedding in the alternate container.
      • Embedder implementations should attempt to embed as much of the metadata as accurately as possible. An implementation may choose a strict approach and throw an exception if a value to be embedded exceeds the length allowed or may choose to truncate the value.
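
      The first three rules in the list above can be sketched as a merge over plain maps. This is a minimal sketch under stated assumptions, not Tika's implementation: EmbedRulesSketch and applyUpdates are hypothetical names, and java.util.Map stands in for both the file's existing properties and the Metadata object. A key that is present with a null value is used here to model a field with a null value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the update rules using plain maps; not part of Tika.
final class EmbedRulesSketch {

    private EmbedRulesSketch() {}

    // Returns a copy of the file's existing properties with the requested
    // updates applied. The input maps are not modified.
    static Map<String, String> applyUpdates(Map<String, String> existing,
                                            Map<String, String> updates) {
        // Start from what the file already holds, so that fields absent
        // from the updates are left untouched.
        Map<String, String> result = new HashMap<>(existing);
        for (Map.Entry<String, String> entry : updates.entrySet()) {
            if (entry.getValue() == null) {
                // null value: delete the property.
                result.remove(entry.getKey());
            } else {
                // Non-null value, including "": set the property, so an
                // empty string yields an empty property.
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }
}
```

      For example, given existing properties title, creator, subject and rights, an update that maps title to a new value, creator to "" and subject to null would replace title, blank creator, remove subject and leave rights as it was.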
      Parameters:
      metadata - document metadata (input and output)
      originalStream - the document stream (input)
      outputStream - the output stream to write the metadata embedded data to
      context - parse context
      Throws:
      IOException - if the document stream could not be read
      TikaException - if the document could not be parsed