org.apache.tika.parser.microsoft
Class OfficeParser
java.lang.Object
org.apache.tika.parser.AbstractParser
org.apache.tika.parser.microsoft.OfficeParser
- All Implemented Interfaces:
- Serializable, Parser
public class OfficeParser
- extends AbstractParser
Defines a Microsoft document content extractor.
- See Also:
- Serialized Form
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
OfficeParser
public OfficeParser()
getSupportedTypes
public Set<MediaType> getSupportedTypes(ParseContext context)
- Description copied from interface: Parser
- Returns the set of media types supported by this parser when used with the given parse context.
- Parameters:
context
- parse context
- Returns:
- immutable set of media types
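The contract above (an immutable set that callers query for membership) can be illustrated without Tika on the classpath. The following is only a sketch: the class name SupportedTypesSketch, the helper methods, and the two media type strings are hypothetical stand-ins, and real code would work with the Set<MediaType> returned by getSupportedTypes(ParseContext).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class SupportedTypesSketch {
    // Hypothetical stand-in for the parser's supported set. Plain strings are
    // used here instead of MediaType objects, and the entries are illustrative.
    static final Set<String> SUPPORTED = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList(
                    "application/msword",
                    "application/vnd.ms-excel")));

    // A caller would typically check membership before handing a stream to the parser.
    static boolean canParse(String mediaType) {
        return SUPPORTED.contains(mediaType);
    }

    // An immutable set rejects modification attempts at run time.
    static boolean rejectsModification() {
        try {
            SUPPORTED.add("text/plain");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```

Because the returned set is immutable, callers that need to add or remove types would copy it into their own collection first.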
parse
public void parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
throws IOException,
SAXException,
TikaException
- Extracts document properties and text from a Microsoft document input stream.
- Parameters:
stream
- the document stream (input)
handler
- handler for the XHTML SAX events (output)
metadata
- document metadata (input and output)
context
- parse context
- Throws:
IOException
- if the document stream could not be read
SAXException
- if the SAX events could not be processed
TikaException
- if the document could not be parsed
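Since parse reports the extracted content as XHTML SAX events, the handler argument decides what happens to the text. The following is a minimal sketch of a handler that collects character data, using only the SAX classes bundled with the JDK. The class name TextCollectingHandler, the newline-per-paragraph rule, and the demo method that feeds events by hand are assumptions made for illustration and are not part of Tika.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class TextCollectingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Accumulate every run of character data reported by the parser.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Assumption: end each XHTML paragraph with a newline.
        if ("p".equals(localName)) {
            text.append('\n');
        }
    }

    String getText() {
        return text.toString();
    }

    // Feeds a few hand-written events to the handler, standing in for a real parser run.
    static String demo() {
        TextCollectingHandler handler = new TextCollectingHandler();
        String xhtml = "http://www.w3.org/1999/xhtml";
        Attributes none = new AttributesImpl();
        char[] chars = "Hello".toCharArray();
        try {
            handler.startDocument();
            handler.startElement(xhtml, "p", "p", none);
            handler.characters(chars, 0, chars.length);
            handler.endElement(xhtml, "p", "p");
            handler.endDocument();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return handler.getText();
    }
}
```

With Tika on the classpath, an instance of such a handler could be passed as the handler argument, for example new OfficeParser().parse(stream, handler, new Metadata(), new ParseContext()), and the collected text read back with getText() afterwards.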
parse
protected void parse(org.apache.poi.poifs.filesystem.DirectoryNode root,
ParseContext context,
Metadata metadata,
XHTMLContentHandler xhtml)
throws IOException,
SAXException,
TikaException
- Throws:
IOException
SAXException
TikaException
Copyright © 2007-2011 The Apache Software Foundation. All Rights Reserved.