Class RecursiveParserWrapper

  • All Implemented Interfaces:
    Serializable, Parser

    public class RecursiveParserWrapper
    extends ParserDecorator
    This is a helper class that wraps a parser in a recursive handler. It takes care of setting the embedded parser in the ParseContext and handling the embedded path calculations.

    After parsing a document, call getMetadata() to retrieve a list of Metadata objects, one for each embedded resource. The first item in the list will contain the Metadata for the outer container file.

    Content can also be extracted and stored in the TIKA_CONTENT field of a Metadata object. The type of content to store is selected when the wrapper is initialized.

    If a WriteLimitReachedException is encountered, the wrapper stops processing the current resource and does not process any of that resource's child resources, although it still parses as much as it can up to that point. If the exception is reached in the parent document, no child resources are parsed.

    The implementation is based on Jukka's RecursiveMetadataParser and Nick's additions. See: RecursiveMetadataParser.

    Note that this wrapper holds all data in memory and is not appropriate for files with content too large to be held in memory.

    Note, too, that this wrapper is not thread-safe because it stores state. The client must initialize a new wrapper for each thread, and the client is responsible for calling reset() after each parse.
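
    The per-thread and reset() requirements can be illustrated with a minimal sketch. It deliberately does not call the Tika API: StatefulWrapperSketch is a hypothetical stand-in for a wrapper that accumulates results between calls, and PerThreadWrapperSketch, parseOnce and getResults are names invented for this illustration. The sketch only assumes what is stated above: results accumulate in the wrapper, the first entry belongs to the outer container, and reset() clears the stored state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a stateful, non-thread-safe wrapper. NOT the Tika class.
final class StatefulWrapperSketch {
    // State kept between calls; this is what makes the object unsafe to share.
    private final List<String> results = new ArrayList<>();

    // Pretend "parse": records the container first, then each embedded resource.
    void parse(String container, List<String> embedded) {
        results.add(container);
        results.addAll(embedded);
    }

    // Returns a copy so callers keep their results after reset().
    List<String> getResults() {
        return new ArrayList<>(results);
    }

    // Clears the accumulated state so the next parse starts empty.
    void reset() {
        results.clear();
    }
}

final class PerThreadWrapperSketch {
    // One wrapper instance per thread, created lazily on first use.
    private static final ThreadLocal<StatefulWrapperSketch> WRAPPER =
            ThreadLocal.withInitial(StatefulWrapperSketch::new);

    // Parses once and always resets, even if parse throws.
    static List<String> parseOnce(String container, List<String> embedded) {
        StatefulWrapperSketch wrapper = WRAPPER.get();
        try {
            wrapper.parse(container, embedded);
            return wrapper.getResults();
        } finally {
            wrapper.reset();
        }
    }
}
```

    Putting reset() in a finally block means a failed parse cannot leak entries into the next call on the same thread. With the real wrapper, the same shape would apply around its parse, getMetadata() and reset() calls.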

    The unit tests for this class are in the tika-parsers module.

    See Also:
    Serialized Form