Class AbstractRecursiveParserWrapperHandler

    • Field Detail

      • EMBEDDED_RESOURCE_LIMIT_REACHED

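        public static final Property EMBEDDED_RESOURCE_LIMIT_REACHED
        Metadata property that endDocument(...) uses to record whether the maximum number of embedded resources was reached.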
    • Constructor Detail

      • AbstractRecursiveParserWrapperHandler

        public AbstractRecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory)
      • AbstractRecursiveParserWrapperHandler

        public AbstractRecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory,
                                                     int maxEmbeddedResources)
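The two constructors differ only in whether an embedded-resource limit is supplied. The sketch below is a self-contained model of that relationship, not Tika's implementation: `HandlerFactory`, `EmbeddedLimitModel`, and `CountingHandler` are hypothetical stand-ins, a plain `Map` replaces Tika's `Metadata`, and both the "-1 means no limit" default and the `>=` comparison are assumptions, since this Javadoc does not document them.

```java
import java.util.HashMap;
import java.util.Map;

// Launcher class first so `java File.java` finds main.
class ConstructorDemo {
    public static void main(String[] args) {
        HandlerFactory factory = new HandlerFactory() { };
        CountingHandler unlimited = new CountingHandler(factory);
        CountingHandler limited = new CountingHandler(factory, 2);
        for (int i = 0; i < 3; i++) {
            unlimited.startEmbeddedDocument(new HashMap<>());
            limited.startEmbeddedDocument(new HashMap<>());
        }
        System.out.println(unlimited.hasHitMaximumEmbeddedResources()); // false
        System.out.println(limited.hasHitMaximumEmbeddedResources());   // true
    }
}

// Hypothetical stand-in for Tika's ContentHandlerFactory.
interface HandlerFactory { }

// Hypothetical model of the two constructors; not Tika's source.
abstract class EmbeddedLimitModel {
    private final HandlerFactory factory;
    private final int maxEmbeddedResources;
    private int embeddedResources = 0;

    EmbeddedLimitModel(HandlerFactory factory) {
        // Assumption: the one-argument form means "no limit", modelled as -1.
        this(factory, -1);
    }

    EmbeddedLimitModel(HandlerFactory factory, int maxEmbeddedResources) {
        this.factory = factory;
        this.maxEmbeddedResources = maxEmbeddedResources;
    }

    // Counts each embedded document, as startEmbeddedDocument is documented to do.
    void startEmbeddedDocument(Map<String, String> metadata) {
        embeddedResources++;
    }

    // Assumption: a negative limit disables the check.
    boolean hasHitMaximumEmbeddedResources() {
        return maxEmbeddedResources > -1 && embeddedResources >= maxEmbeddedResources;
    }
}

// Concrete subclass exposing both constructors.
class CountingHandler extends EmbeddedLimitModel {
    CountingHandler(HandlerFactory factory) { super(factory); }
    CountingHandler(HandlerFactory factory, int max) { super(factory, max); }
}
```

Chaining the one-argument constructor to the two-argument one keeps the limit logic in a single place.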
    • Method Detail

      • startEmbeddedDocument

        public void startEmbeddedDocument(ContentHandler contentHandler,
                                          Metadata metadata)
                                   throws SAXException
        Called before each embedded document is parsed. Override this for custom behavior, but always call super.startEmbeddedDocument(...) from the override, because this method tracks the number of embedded documents.
        Parameters:
        contentHandler - local handler to be used on this embedded document
        metadata - embedded document's metadata
        Throws:
        SAXException
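Because the base method does the counting, an override should delegate to `super` before or after its own work. The sketch below models that pattern with hypothetical classes (`BaseEmbeddedHandler`, `NameLoggingHandler`); it is not Tika code. A `Map` stands in for `Metadata`, the `ContentHandler` parameter is omitted, and the `"name"` key is a hypothetical metadata field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Launcher class first so `java File.java` finds main.
class StartEmbeddedDemo {
    public static void main(String[] args) {
        NameLoggingHandler handler = new NameLoggingHandler();
        Map<String, String> first = new HashMap<>();
        first.put("name", "attachment1.pdf");
        handler.startEmbeddedDocument(first);
        handler.startEmbeddedDocument(new HashMap<>());
        // prints "2 [attachment1.pdf, <unnamed>]"
        System.out.println(handler.embeddedCount() + " " + handler.names);
    }
}

// Hypothetical base modelling the documented contract: it counts embedded documents.
abstract class BaseEmbeddedHandler {
    private int embeddedResources = 0;

    void startEmbeddedDocument(Map<String, String> metadata) {
        embeddedResources++;
    }

    int embeddedCount() {
        return embeddedResources;
    }
}

// Hypothetical subclass: records each embedded document's name and defers to super.
class NameLoggingHandler extends BaseEmbeddedHandler {
    final List<String> names = new ArrayList<>();

    @Override
    void startEmbeddedDocument(Map<String, String> metadata) {
        super.startEmbeddedDocument(metadata); // keeps the base class's count accurate
        names.add(metadata.getOrDefault("name", "<unnamed>"));
    }
}
```

Omitting the `super` call would leave `embeddedCount()` at 0, and any limit check built on that count would never trigger.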
      • endEmbeddedDocument

        public void endEmbeddedDocument(ContentHandler contentHandler,
                                        Metadata metadata)
                                 throws SAXException
        Called after each embedded document has been parsed. Override this for custom behavior; the base implementation is currently a no-op.
        Parameters:
        contentHandler - content handler that was used on this embedded document
        metadata - metadata for this embedded document
        Throws:
        SAXException
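Since the base method does nothing, this hook is a natural place to collect per-document results once parsing of that document has finished. The sketch below uses hypothetical classes (`NoOpEndBase`, `MetadataCollector`) with a `Map` in place of `Metadata`. It illustrates the override pattern only and is not how any particular Tika subclass is implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Launcher class first so `java File.java` finds main.
class EndEmbeddedDemo {
    public static void main(String[] args) {
        MetadataCollector collector = new MetadataCollector();
        Map<String, String> metadata = new HashMap<>();
        metadata.put("Content-Type", "image/png");
        collector.endEmbeddedDocument(metadata);
        metadata.put("Content-Type", "changed later"); // does not affect the stored copy
        System.out.println(collector.collected.get(0).get("Content-Type")); // image/png
    }
}

// Hypothetical base mirroring the documented behavior: endEmbeddedDocument is a no-op.
abstract class NoOpEndBase {
    void endEmbeddedDocument(Map<String, String> metadata) {
        // intentionally empty
    }
}

// Hypothetical subclass that keeps a snapshot of each embedded document's metadata.
class MetadataCollector extends NoOpEndBase {
    final List<Map<String, String>> collected = new ArrayList<>();

    @Override
    void endEmbeddedDocument(Map<String, String> metadata) {
        super.endEmbeddedDocument(metadata); // harmless now; safe if the base gains behavior
        collected.add(new HashMap<>(metadata)); // defensive copy
    }
}
```

Calling `super` here costs nothing today and guards against the "currently" in the Javadoc changing in a later release.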
      • endDocument

        public void endDocument(ContentHandler contentHandler,
                                Metadata metadata)
                         throws SAXException
        Called after the full parse has completed. Override this for custom behavior, but always call super.endDocument(...) from the override, because this method records in the metadata whether the maximum number of embedded resources was reached.
        Parameters:
        contentHandler - content handler that was used on the main document
        metadata - metadata that was gathered for the main document
        Throws:
        SAXException
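The sketch below models why `super.endDocument(...)` matters: the base method is what writes the limit flag into the main document's metadata. `LimitFlagBase`, `SummaryHandler`, and the string keys are hypothetical stand-ins (the key merely imitates `EMBEDDED_RESOURCE_LIMIT_REACHED`), a `Map` replaces `Metadata`, and the negative-means-unlimited rule is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Launcher class first so `java File.java` finds main.
class EndDocumentDemo {
    public static void main(String[] args) {
        SummaryHandler handler = new SummaryHandler(1);
        handler.startEmbeddedDocument(new HashMap<>());
        Map<String, String> mainMetadata = new HashMap<>();
        handler.endDocument(mainMetadata);
        System.out.println(mainMetadata.get(LimitFlagBase.LIMIT_REACHED_KEY)); // true
    }
}

// Hypothetical model of the documented endDocument contract.
abstract class LimitFlagBase {
    // Hypothetical key standing in for the EMBEDDED_RESOURCE_LIMIT_REACHED property.
    static final String LIMIT_REACHED_KEY = "embedded_resource_limit_reached";

    private final int maxEmbeddedResources;
    private int embeddedResources = 0;

    LimitFlagBase(int maxEmbeddedResources) {
        this.maxEmbeddedResources = maxEmbeddedResources;
    }

    void startEmbeddedDocument(Map<String, String> metadata) {
        embeddedResources++;
    }

    // Records the limit flag on the main document's metadata when the limit was hit.
    void endDocument(Map<String, String> metadata) {
        if (hasHitMaximumEmbeddedResources()) {
            metadata.put(LIMIT_REACHED_KEY, "true");
        }
    }

    // Assumption: a negative limit disables the check.
    boolean hasHitMaximumEmbeddedResources() {
        return maxEmbeddedResources > -1 && embeddedResources >= maxEmbeddedResources;
    }
}

// Hypothetical subclass adding its own field while still calling super.endDocument.
class SummaryHandler extends LimitFlagBase {
    SummaryHandler(int max) { super(max); }

    @Override
    void endDocument(Map<String, String> metadata) {
        super.endDocument(metadata); // without this, the limit flag is never written
        metadata.put("summary", "done"); // hypothetical extra field
    }
}
```

A caller can then either inspect the metadata flag or query `hasHitMaximumEmbeddedResources()` directly on the handler.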
      • hasHitMaximumEmbeddedResources

        public boolean hasHitMaximumEmbeddedResources()
        Returns:
        true if this handler reached the maximum number of embedded resources during the parse