Class WordMLParser
java.lang.Object
org.apache.tika.parser.microsoft.xml.AbstractXML2003Parser
org.apache.tika.parser.microsoft.xml.WordMLParser
All Implemented Interfaces:
Serializable, Parser
Parses WordML 2003 format Word files. These are single XML files
that predate OOXML.
See https://en.wikipedia.org/wiki/Microsoft_Office_XML_formats
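To illustrate what "single XML file" means here, the following is a minimal sketch that pulls plain text out of a WordML 2003 document using only the JDK's SAX parser. It is not Tika's implementation and does not use WordMLParser; the class name `WordMlTextSketch` and the method `extractText` are hypothetical names chosen for this example. It assumes the WordprocessingML 2003 namespace and that text lives in `w:t` elements inside `w:p` paragraphs.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not part of Apache Tika.
public class WordMlTextSketch {
    // Namespace used by Word 2003 XML (WordprocessingML 2003) documents.
    static final String WORDML_NS =
            "http://schemas.microsoft.com/office/word/2003/wordml";

    // Collects the character data of every w:t element and ends each w:p with a newline.
    static String extractText(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed to match on namespace URI + local name
        StringBuilder out = new StringBuilder();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    private boolean inText;

                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        if (WORDML_NS.equals(uri) && "t".equals(localName)) {
                            inText = true;
                        }
                    }

                    @Override
                    public void endElement(String uri, String localName, String qName) {
                        if (!WORDML_NS.equals(uri)) {
                            return;
                        }
                        if ("t".equals(localName)) {
                            inText = false;
                        } else if ("p".equals(localName)) {
                            out.append('\n'); // paragraph boundary
                        }
                    }

                    @Override
                    public void characters(char[] ch, int start, int length) {
                        if (inText) {
                            out.append(ch, start, length);
                        }
                    }
                });
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>"
                + "<w:wordDocument xmlns:w=\"" + WORDML_NS + "\"><w:body>"
                + "<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t> WordML</w:t></w:r></w:p>"
                + "</w:body></w:wordDocument>";
        System.out.print(extractText(xml));
    }
}
```

The whole document, including body text, sits in one XML stream, which is why a streaming SAX handler is enough for this sketch. A real parser would also need to handle metadata, tables, hyperlinks, and embedded binary data.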
Constructor Summary
Constructors
Constructor    Description
WordMLParser()
Method Summary
Modifier and Type    Method    Description
protected ContentHandler    getContentHandler(ContentHandler ch, Metadata metadata, ParseContext context)
Set<MediaType>    getSupportedTypes(ParseContext context)    Returns the set of media types supported by this parser when used with the given parse context.
void    setContentType(Metadata metadata)
Methods inherited from class org.apache.tika.parser.microsoft.xml.AbstractXML2003Parser
parse
Constructor Details
WordMLParser
public WordMLParser()
Method Details
getSupportedTypes
public Set<MediaType> getSupportedTypes(ParseContext context)
Description copied from interface: Parser
Returns the set of media types supported by this parser when used with the given parse context.
Parameters:
context - parse context
Returns:
immutable set of media types
getContentHandler
protected ContentHandler getContentHandler(ContentHandler ch, Metadata metadata, ParseContext context)
Overrides:
getContentHandler in class AbstractXML2003Parser
setContentType
public void setContentType(Metadata metadata)
Specified by:
setContentType in class AbstractXML2003Parser