Class CompositeParser

    • Method Detail

      • findDuplicateParsers

        public Map<MediaType, List<Parser>> findDuplicateParsers(ParseContext context)
        Utility method that goes through all the component parsers and finds all media types for which more than one parser declares support. This is useful in tracking down conflicting parser definitions.
        Parameters:
        context - parsing context
        Returns:
        media types that are supported by at least two component parsers
        Since:
        Apache Tika 0.10
        See Also:
        TIKA-660
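
        The scan described for findDuplicateParsers can be illustrated without Tika on the classpath. The following is a minimal sketch, not Tika's implementation: media types and parsers are represented as plain strings, and the class name DuplicateParserSketch, the method findDuplicates, and the sample parser names are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Stand-alone sketch: strings stand in for Tika's Parser and MediaType.
// None of these names come from the Tika API.
final class DuplicateParserSketch {

    // Input: parser name -> media types that parser declares support for.
    // Output: media type -> names of the parsers that claim it, restricted
    // to media types claimed by at least two parsers.
    static Map<String, List<String>> findDuplicates(Map<String, Set<String>> declaredTypes) {
        Map<String, List<String>> byType = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : declaredTypes.entrySet()) {
            for (String type : entry.getValue()) {
                byType.computeIfAbsent(type, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        // Drop every media type that only a single parser claims.
        byType.values().removeIf(claimants -> claimants.size() < 2);
        return byType;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> declared = new LinkedHashMap<>();
        declared.put("pdfParserA", Set.of("application/pdf"));
        declared.put("pdfParserB", Set.of("application/pdf"));
        declared.put("textParser", Set.of("text/plain"));
        findDuplicates(declared).forEach((type, claimants) ->
                System.out.println(type + " is claimed by " + claimants));
    }
}
```

        An empty result means no two parsers in the input declare the same media type.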
      • getMediaTypeRegistry

        public MediaTypeRegistry getMediaTypeRegistry()
        Returns the media type registry used to infer type relationships.
        Returns:
        media type registry
        Since:
        Apache Tika 0.8
      • setMediaTypeRegistry

        public void setMediaTypeRegistry(MediaTypeRegistry registry)
        Sets the media type registry used to infer type relationships.
        Parameters:
        registry - media type registry
        Since:
        Apache Tika 0.8
      • getAllComponentParsers

        public List<Parser> getAllComponentParsers()
        Returns all parsers registered with the Composite Parser, including ones that may not currently be active. This does not include the Fallback Parser, if one is defined.
      • getParsers

        public Map<MediaType, Parser> getParsers()
        Returns the component parsers.
        Returns:
        component parsers, keyed by media type
      • setParsers

        public void setParsers(Map<MediaType, Parser> parsers)
        Sets the component parsers.
        Parameters:
        parsers - component parsers, keyed by media type
      • getFallback

        public Parser getFallback()
        Returns the fallback parser.
        Returns:
        fallback parser
      • setFallback

        public void setFallback(Parser fallback)
        Sets the fallback parser.
        Parameters:
        fallback - fallback parser
      • getParser

        protected Parser getParser(Metadata metadata)
        Returns the parser that best matches the given metadata. By default, this method looks for a parser that matches the content type metadata property and uses the fallback parser if no better match is found. The type hierarchy information in the configured media type registry is used when looking for a matching parser instance.

        Subclasses can override this method to provide more accurate parser resolution.

        Parameters:
        metadata - document metadata
        Returns:
        matching parser
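
        The lookup described for getParser (match on content type, consult the type hierarchy, then use the fallback) can be sketched as a walk up a supertype chain. This is an illustration under stated assumptions rather than Tika's code: strings stand in for MediaType and Parser, a plain map stands in for the media type registry, and the names ParserResolutionSketch and resolve, as well as the sample hierarchy, are hypothetical.

```java
import java.util.Map;

// Stand-alone sketch: strings stand in for Tika's MediaType and Parser,
// and supertypeOf stands in for the media type registry.
final class ParserResolutionSketch {

    // Tries the content type itself, then each supertype in turn, and
    // returns the fallback when no type in the chain has a parser.
    // Assumes supertypeOf contains no cycles.
    static String resolve(String contentType,
                          Map<String, String> parsersByType,
                          Map<String, String> supertypeOf,
                          String fallback) {
        String type = contentType;
        while (type != null) {
            String parser = parsersByType.get(type);
            if (parser != null) {
                return parser;
            }
            type = supertypeOf.get(type); // null once the chain ends
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Hypothetical registrations and hierarchy, for illustration only.
        Map<String, String> parsers = Map.of(
                "application/xml", "xmlParser",
                "text/plain", "textParser");
        Map<String, String> supertypes = Map.of(
                "image/svg+xml", "application/xml");

        System.out.println(resolve("image/svg+xml", parsers, supertypes, "fallbackParser"));
        System.out.println(resolve("application/zip", parsers, supertypes, "fallbackParser"));
    }
}
```

        A subclass that overrides getParser could replace this walk with any other resolution strategy, for example one that inspects additional metadata properties.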
      • getSupportedTypes

        public Set<MediaType> getSupportedTypes(ParseContext context)
        Description copied from interface: Parser
        Returns the set of media types supported by this parser when used with the given parse context.
        Specified by:
        getSupportedTypes in interface Parser
        Parameters:
        context - parse context
        Returns:
        immutable set of media types