Class XMLReaderUtils

    • Field Detail

      • DEFAULT_POOL_SIZE

        public static final int DEFAULT_POOL_SIZE
        Default size of the pool of SAX parsers and of the pool of DOM builders.
        See Also:
        Constant Field Values
      • DEFAULT_MAX_ENTITY_EXPANSIONS

        public static final int DEFAULT_MAX_ENTITY_EXPANSIONS
        See Also:
        Constant Field Values
    • Constructor Detail

      • XMLReaderUtils

        public XMLReaderUtils()
    • Method Detail

      • getXMLReader

        public static XMLReader getXMLReader()
                                      throws TikaException
        Returns the XMLReader specified in this parsing context. If a reader is not explicitly specified, then one is created using the specified or the default SAX parser.
        Returns:
        XMLReader
        Throws:
        TikaException
        Since:
        Apache Tika 1.13
        See Also:
        getSAXParser()
      • getSAXParser

        public static SAXParser getSAXParser()
                                      throws TikaException
        Returns the SAX parser specified in this parsing context. If a parser is not explicitly specified, then one is created using the specified or the default SAX parser factory.

        If you call reset() on the parser, be sure to set the SecurityManager again afterwards, because xerces2 clears it on reset().

        Returns:
        SAX parser
        Throws:
        TikaException - if a SAX parser could not be created
        Since:
        Apache Tika 0.8
        See Also:
        getSAXParserFactory()
      • getSAXParserFactory

        public static SAXParserFactory getSAXParserFactory()
        Returns the SAX parser factory specified in this parsing context. If a factory is not explicitly specified, then a default factory instance is created and returned. The default factory instance is configured to be namespace-aware, not validating, and to use secure XML processing.
        Returns:
        SAX parser factory
        Since:
        Apache Tika 0.8
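
        As an illustration of the default configuration described above (namespace-aware, not validating, secure processing), here is a minimal sketch using only the JDK's JAXP classes. The class name SecureSaxFactorySketch is hypothetical, and this is not the library's own code; it only shows what such a factory setup could look like.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

// Hypothetical sketch class; not part of Tika.
final class SecureSaxFactorySketch {

    /** Builds a factory that is namespace-aware, non-validating, and asks for secure processing. */
    static SAXParserFactory newFactory() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        try {
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        } catch (ParserConfigurationException
                | SAXNotRecognizedException
                | SAXNotSupportedException e) {
            // Assumption: a parser that lacks the feature is still usable,
            // so this sketch carries on without it.
        }
        return factory;
    }
}
```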
      • getDocumentBuilderFactory

        public static DocumentBuilderFactory getDocumentBuilderFactory()
        Returns the DOM builder factory specified in this parsing context. If a factory is not explicitly specified, then a default factory instance is created and returned. The default factory instance is configured to be namespace-aware and to apply reasonable security features.
        Returns:
        DOM parser factory
        Since:
        Apache Tika 1.13
      • getDocumentBuilder

        public static DocumentBuilder getDocumentBuilder()
                                                  throws TikaException
        Returns the DOM builder specified in this parsing context. If a builder is not explicitly specified, then a builder instance is created and returned. The builder instance is configured to apply an IGNORING_SAX_ENTITY_RESOLVER, and it sets the ErrorHandler to null.
        Returns:
        DOM Builder
        Throws:
        TikaException
        Since:
        Apache Tika 1.13
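
        The builder setup described above (an entity resolver that ignores external entities, and a null ErrorHandler) can be sketched with plain JAXP as follows. The resolver here is a hypothetical stand-in that resolves every external entity to empty content; the actual IGNORING_SAX_ENTITY_RESOLVER may be implemented differently.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch class; not part of Tika.
final class IgnoringDomBuilderSketch {

    // Stand-in "ignoring" resolver: every external entity resolves to empty content,
    // so nothing is fetched from disk or the network.
    static final EntityResolver IGNORING_RESOLVER =
            (publicId, systemId) -> new InputSource(new StringReader(""));

    static DocumentBuilder newBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(IGNORING_RESOLVER);
        builder.setErrorHandler(null);
        return builder;
    }

    /** Parses the XML and returns the root element's node name; failures surface unchecked. */
    static String rootName(String xml) {
        try {
            Document doc = newBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getNodeName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

        With this resolver in place, a document that declares an external DTD is parsed without the DTD being retrieved.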
      • getXMLInputFactory

        public static XMLInputFactory getXMLInputFactory()
        Returns the StAX input factory specified in this parsing context. If a factory is not explicitly specified, then a default factory instance is created and returned. The default factory instance is configured to be namespace-aware and to apply reasonable security using the IGNORING_STAX_ENTITY_RESOLVER.
        Returns:
        StAX input factory
        Since:
        Apache Tika 1.13
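
        A StAX factory along the lines described above might be configured as in the following sketch, which uses only JDK classes. The resolver is a hypothetical stand-in that hands back an empty stream for every external entity; the actual IGNORING_STAX_ENTITY_RESOLVER may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLResolver;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch class; not part of Tika.
final class IgnoringStaxFactorySketch {

    // Stand-in "ignoring" resolver: external entities resolve to an empty stream.
    static final XMLResolver IGNORING_RESOLVER =
            (publicId, systemId, baseUri, namespace) -> new ByteArrayInputStream(new byte[0]);

    static XMLInputFactory newFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
        factory.setXMLResolver(IGNORING_RESOLVER);
        return factory;
    }

    /** Counts START_ELEMENT events as a quick check that the factory yields working readers. */
    static int countElements(String xml) {
        try {
            XMLStreamReader reader =
                    newFactory().createXMLStreamReader(new StringReader(xml));
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```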
      • getPoolSize

        public static int getPoolSize()
      • setPoolSize

        public static void setPoolSize(int poolSize)
                                throws TikaException
        Set the pool size for cached XML parsers. As a side effect, this locks the pool and rebuilds it from scratch with the most recent settings, such as MAX_ENTITY_EXPANSIONS.
        Parameters:
        poolSize - size of the pool of cached XML parsers
        Throws:
        TikaException
        Since:
        Apache Tika 1.19
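
        To make the "lock and rebuild" side effect concrete, here is a minimal sketch of a parser pool that is rebuilt under a write lock whenever its size is set. It assumes a read/write lock around a bounded queue; the class ParserPoolSketch and its methods are hypothetical, and the library's actual pooling strategy may differ.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

// Hypothetical sketch class; not part of Tika.
final class ParserPoolSketch {

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private BlockingQueue<SAXParser> parsers;

    ParserPoolSketch(int size) {
        setPoolSize(size);
    }

    /** Takes the write lock, discards the old pool, and fills a new one from scratch. */
    void setPoolSize(int size) {
        lock.writeLock().lock();
        try {
            BlockingQueue<SAXParser> fresh = new ArrayBlockingQueue<>(size);
            for (int i = 0; i < size; i++) {
                fresh.offer(newParser());
            }
            parsers = fresh;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Number of parsers currently free. */
    int available() {
        lock.readLock().lock();
        try {
            return parsers.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Borrows a parser, or returns null if none is currently free. */
    SAXParser acquire() {
        lock.readLock().lock();
        try {
            return parsers.poll();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Hands a parser back; it is dropped if the pool is already full. */
    void release(SAXParser parser) {
        lock.readLock().lock();
        try {
            parsers.offer(parser);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Every rebuild creates parsers with whatever settings are current at that moment.
    private static SAXParser newParser() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

        Because borrowers hold only the read lock, they can proceed concurrently, while a resize waits for them and then swaps in a fresh pool.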
      • getMaxEntityExpansions

        public static int getMaxEntityExpansions()
      • setMaxEntityExpansions

        public static void setMaxEntityExpansions(int maxEntityExpansions)
        Set the maximum number of entity expansions allowed in SAX/DOM/StAX parsing. NOTE: A value less than or equal to zero indicates no limit. This overrides the system property JAXP_ENTITY_EXPANSION_LIMIT_KEY and the DEFAULT_MAX_ENTITY_EXPANSIONS value.

        NOTE: This setting does not reach the pooled parsers on its own; the client must call setPoolSize(int) to rebuild the pool of SAX and DOM parsers with it.

        Parameters:
        maxEntityExpansions - maximum number of allowable entity expansions
        Since:
        Apache Tika 1.19
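
        The effect of an entity expansion limit can be seen with the JDK's built-in SAX parser alone. The sketch below assumes the JDK parser and its entityExpansionLimit property; other parser implementations may not recognize that property name, and the class EntityExpansionLimitSketch is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch class; not part of Tika.
final class EntityExpansionLimitSketch {

    // Property name understood by the JDK's built-in parser (an assumption about the runtime).
    static final String LIMIT_PROPERTY =
            "http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit";

    /** Returns true if the document parses under the given limit, false if the parser rejects it. */
    static boolean parsesWithin(int maxEntityExpansions, String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.setProperty(LIMIT_PROPERTY, maxEntityExpansions);
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Covers both an unrecognized property and a limit violation.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds a small document that references one internal entity the given number of times. */
    static String documentWithReferences(int references) {
        StringBuilder sb = new StringBuilder("<!DOCTYPE r [<!ENTITY e \"x\">]><r>");
        for (int i = 0; i < references; i++) {
            sb.append("&e;");
        }
        return sb.append("</r>").toString();
    }
}
```

        A document with no entity references parses under a small limit, while one with many more references than the limit is rejected.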
      • getAttrValue

        public static String getAttrValue(String localName,
                                          Attributes atts)
        Parameters:
        localName - local name of the attribute to look up
        atts - attributes to search
        Returns:
        attribute value with that local name or null if not found
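
        A lookup by local name, as described in the return value above, could be written like this. It is a sketch under the assumption that the first attribute with a matching local name wins and that the namespace URI is not considered; the class AttrValueSketch is hypothetical.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical sketch class; not part of Tika.
final class AttrValueSketch {

    /** Returns the value of the first attribute whose local name matches, or null if none does. */
    static String attrValue(String localName, Attributes atts) {
        for (int i = 0; i < atts.getLength(); i++) {
            if (localName.equals(atts.getLocalName(i))) {
                return atts.getValue(i);
            }
        }
        return null;
    }

    /** Builds a one-attribute list for demonstration purposes. */
    static Attributes sample() {
        AttributesImpl atts = new AttributesImpl();
        // Arguments: uri, localName, qName, type, value
        atts.addAttribute("urn:example", "href", "x:href", "CDATA", "a.html");
        return atts;
    }
}
```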
      • getDocumentBuilder

        public static DocumentBuilder getDocumentBuilder(ParseContext context)
                                                  throws TikaException
        Returns the DOM builder specified in this parsing context. If a builder is not explicitly specified, then a builder instance is created and returned. The builder instance is configured to apply an IGNORING_SAX_ENTITY_RESOLVER, and it sets the ErrorHandler to null. Consider using buildDOM(InputStream, ParseContext) instead for more efficient reuse of document builders.
        Returns:
        DOM Builder
        Throws:
        TikaException
      • getXMLInputFactory

        public static XMLInputFactory getXMLInputFactory(ParseContext context)
        Returns the StAX input factory specified in this parsing context. If a factory is not explicitly specified, then a default factory instance is created and returned. The default factory instance is configured to be namespace-aware and to apply reasonable security using the IGNORING_STAX_ENTITY_RESOLVER.
        Returns:
        StAX input factory
      • getTransformer

        public static Transformer getTransformer(ParseContext context)
                                          throws TikaException
        Returns the transformer specified in this parsing context.

        If a transformer is not explicitly specified, then a default transformer instance is created and returned. The default transformer instance is configured to use secure XML processing.

        Returns:
        Transformer
        Throws:
        TikaException - when the transformer cannot be created
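
        For reference, a default transformer with secure processing enabled might be created as in the sketch below, which uses only the JDK's javax.xml.transform classes. The class SecureTransformerSketch is hypothetical and does not reproduce the library's own code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch class; not part of Tika.
final class SecureTransformerSketch {

    /** Creates an identity transformer from a factory with secure processing switched on. */
    static Transformer newTransformer() throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        return factory.newTransformer();
    }

    /** Runs the identity transform over the given XML and returns the serialized result. */
    static String copy(String xml) {
        try {
            StringWriter out = new StringWriter();
            newTransformer().transform(
                    new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```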