OfficeParser (Apache Tika 1.0 API)

org.apache.tika.parser.microsoft
Class OfficeParser

java.lang.Object
  org.apache.tika.parser.AbstractParser
      org.apache.tika.parser.microsoft.OfficeParser

All Implemented Interfaces: Serializable, Parser

public class OfficeParser
extends AbstractParser

Defines a Microsoft document content extractor.

See Also: Serialized Form

Nested Class Summary
`static class`	`OfficeParser.POIFSDocumentType`

Constructor Summary
`OfficeParser()`

Method Summary
`Set<MediaType>`	`getSupportedTypes(ParseContext context)` Returns the set of media types supported by this parser when used with the given parse context.
`protected void`	`parse(org.apache.poi.poifs.filesystem.DirectoryNode root, ParseContext context, Metadata metadata, XHTMLContentHandler xhtml)`
`void`	`parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)` Extracts properties and text from a Microsoft document input stream.

Methods inherited from class org.apache.tika.parser.AbstractParser
`parse`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

OfficeParser

public OfficeParser()

Method Detail

getSupportedTypes

public Set<MediaType> getSupportedTypes(ParseContext context)

Description copied from interface: Parser

Returns the set of media types supported by this parser when used with the given parse context.

Parameters: context - parse context
Returns: immutable set of media types

parse

public void parse(InputStream stream,
                  ContentHandler handler,
                  Metadata metadata,
                  ParseContext context)
           throws IOException,
                  SAXException,
                  TikaException

Extracts properties and text from a Microsoft document input stream.

Parameters:
  stream - the document stream (input)
  handler - handler for the XHTML SAX events (output)
  metadata - document metadata (input and output)
  context - parse context
Throws:
  IOException - if the document stream could not be read
  SAXException - if the SAX events could not be processed
  TikaException - if the document could not be parsed
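
Because the extracted content is delivered as XHTML SAX events, the caller decides what to do with it by supplying a ContentHandler. The following is a minimal sketch of such a handler, using only the JDK's SAX classes. The class name TextCollector is hypothetical and not part of Tika, and the sketch assumes that paragraphs arrive as p elements. An instance could presumably be passed as the handler argument of parse(InputStream, ContentHandler, Metadata, ParseContext). The main method only simulates a few events by hand so that the example runs without Tika on the classpath.

```java
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical helper (not part of Tika): collects the character data of
 * the SAX events it receives and ends each paragraph with a newline.
 */
final class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append only the slice of the buffer that the event refers to.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Assumption: paragraphs are reported as <p> elements.
        if ("p".equals(localName)) {
            text.append('\n');
        }
    }

    /** Returns all text collected so far. */
    String getText() {
        return text.toString();
    }

    public static void main(String[] args) {
        // Simulated events standing in for what a parser would emit.
        TextCollector collector = new TextCollector();
        String xhtml = "http://www.w3.org/1999/xhtml";
        for (String paragraph : new String[] {"First paragraph", "Second paragraph"}) {
            char[] chars = paragraph.toCharArray();
            collector.characters(chars, 0, chars.length);
            collector.endElement(xhtml, "p", "p");
        }
        System.out.print(collector.getText());
    }
}
```

Extending DefaultHandler keeps the sketch short, since every callback that is not overridden is a no-op. A real caller would also need to handle the IOException, SAXException and TikaException declared by parse.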

parse

protected void parse(org.apache.poi.poifs.filesystem.DirectoryNode root,
                     ParseContext context,
                     Metadata metadata,
                     XHTMLContentHandler xhtml)
              throws IOException,
                     SAXException,
                     TikaException

Throws: IOException, SAXException, TikaException
