Package org.apache.tika.parser.microsoft
Class OfficeParser
- java.lang.Object
  - org.apache.tika.parser.AbstractParser
    - org.apache.tika.parser.microsoft.AbstractOfficeParser
      - org.apache.tika.parser.microsoft.OfficeParser

- All Implemented Interfaces:
- Serializable, Parser
 
public class OfficeParser extends AbstractOfficeParser

Defines a Microsoft document content extractor.

- See Also:
- Serialized Form

Nested Class Summary
- static class OfficeParser.POIFSDocumentType

Constructor Summary
- OfficeParser()

Method Summary
- static void extractMacros(org.apache.poi.poifs.filesystem.POIFSFileSystem fs, ContentHandler xhtml, EmbeddedDocumentExtractor embeddedDocumentExtractor): Helper to extract macros from an NPOIFS/vbaProject.bin.
- Set<MediaType> getSupportedTypes(ParseContext context): Returns the set of media types supported by this parser when used with the given parse context.
- static org.apache.poi.poifs.filesystem.Entry getUCEntry(org.apache.poi.poifs.filesystem.DirectoryEntry root, String ucTarget): Looks for entry within root (non-recursive) that has an upper-cased name that equals ucTarget.
- void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context): Extracts properties and text from an MS Document input stream.
- protected void parse(org.apache.poi.poifs.filesystem.DirectoryNode root, ParseContext context, Metadata metadata, XHTMLContentHandler xhtml)

Methods inherited from class org.apache.tika.parser.microsoft.AbstractOfficeParser
configure, getByteArrayMaxOverride, getDateFormatOverride, isConcatenatePhoneticRuns, isExtractAllAlternativesFromMSG, isExtractMacros, isIncludeDeletedContent, isIncludeHeadersAndFooters, isIncludeMoveFromContent, isIncludeShapeBasedContent, isUseSAXDocxExtractor, isUseSAXPptxExtractor, setByteArrayMaxOverride, setConcatenatePhoneticRuns, setDateFormatOverride, setExtractAllAlternativesFromMSG, setExtractMacros, setIncludeDeletedContent, setIncludeHeadersAndFooters, setIncludeMoveFromContent, setIncludeShapeBasedContent, setUseSAXDocxExtractor, setUseSAXPptxExtractor

Methods inherited from class org.apache.tika.parser.AbstractParser
parse

Method Detail

extractMacros
public static void extractMacros(org.apache.poi.poifs.filesystem.POIFSFileSystem fs, ContentHandler xhtml, EmbeddedDocumentExtractor embeddedDocumentExtractor) throws IOException, SAXException

Helper to extract macros from an NPOIFS/vbaProject.bin.

As of POI-3.15-final, there are still some bugs in VBAMacroReader. For now, this method swallows NullPointerException and other runtime exceptions.
- Parameters:
- fs - NPOIFS to extract from
- xhtml - SAX writer
- embeddedDocumentExtractor - extractor for embedded documents
- Throws:
- IOException - if an I/O error occurs during extraction of the embedded document
- SAXException - if writing to xhtml fails
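
The exception handling described above can be illustrated with a small standalone sketch. It is not Tika's implementation: the `MacroSource` interface and the `readMacroLeniently` method are hypothetical names invented for this example, and returning null for a swallowed failure is an assumption. It only shows the general pattern in which unchecked exceptions from a buggy reader are swallowed while a checked IOException still reaches the caller.

```java
import java.io.IOException;

final class MacroExtractionSketch {

    /** Hypothetical stand-in for a macro reader that may fail at runtime. */
    interface MacroSource {
        String readMacro() throws IOException;
    }

    /**
     * Returns the macro text, or null when the reader throws an unchecked
     * exception. IOException is deliberately not caught, so it propagates.
     */
    static String readMacroLeniently(MacroSource source) throws IOException {
        try {
            return source.readMacro();
        } catch (RuntimeException e) {
            // Covers NullPointerException and other unchecked failures.
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        // A well-behaved reader: the macro text is returned unchanged.
        System.out.println(readMacroLeniently(() -> "Sub Hello()"));
        // A buggy reader: the NullPointerException is swallowed.
        System.out.println(readMacroLeniently(() -> {
            throw new NullPointerException("buggy reader");
        }));
    }
}
```

The trade-off of this pattern is that a caller cannot tell a document without macros from one whose macros could not be read, unless the swallowed exception is logged or recorded elsewhere.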
 
getSupportedTypes
public Set<MediaType> getSupportedTypes(ParseContext context)

Description copied from interface: Parser

Returns the set of media types supported by this parser when used with the given parse context.
- Parameters:
- context - parse context
- Returns:
- immutable set of media types
 
parse
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException

Extracts properties and text from an MS Document input stream.
- Parameters:
- stream - the document stream (input)
- handler - handler for the XHTML SAX events (output)
- metadata - document metadata (input and output)
- context - parse context
- Throws:
- IOException - if the document stream could not be read
- SAXException - if the SAX events could not be processed
- TikaException - if the document could not be parsed
 
parse
protected void parse(org.apache.poi.poifs.filesystem.DirectoryNode root, ParseContext context, Metadata metadata, XHTMLContentHandler xhtml) throws IOException, SAXException, TikaException
- Throws:
- IOException
- SAXException
- TikaException
 
getUCEntry
public static org.apache.poi.poifs.filesystem.Entry getUCEntry(org.apache.poi.poifs.filesystem.DirectoryEntry root, String ucTarget)

Looks for an entry within root (non-recursive) whose upper-cased name equals ucTarget.
- Parameters:
- root - the directory whose direct entries are searched
- ucTarget - the upper-cased entry name to match
- Returns:
- the entry whose upper-cased name equals ucTarget
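
The lookup rule above can be sketched without POI as a plain name match over a directory's direct children. This is an illustration only, not the Tika source: `findUpperCased` is a hypothetical helper, a list of strings stands in for the entries of a DirectoryEntry, and both the use of `Locale.ROOT` and the null result for a miss are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class UcEntryLookupSketch {

    /**
     * Returns the first name in childNames whose upper-cased form equals
     * ucTarget, or null if none matches. Only the given (direct) children
     * are checked; nothing is searched recursively.
     */
    static String findUpperCased(List<String> childNames, String ucTarget) {
        if (childNames == null) {
            return null;
        }
        for (String name : childNames) {
            // Upper-case the entry name only; the target is compared as given.
            if (name.toUpperCase(Locale.ROOT).equals(ucTarget)) {
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList("WordDocument", "Macros", "1Table");
        System.out.println(findUpperCased(children, "MACROS")); // prints "Macros"
        System.out.println(findUpperCased(children, "macros")); // prints "null"
    }
}
```

Because only the entry name is upper-cased, the target has to be passed already in upper case, which is what the parameter name ucTarget suggests.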
 
 