Class OOXMLTikaBodyPartHandler
java.lang.Object
org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- All Implemented Interfaces:
XWPFBodyContentsHandler
-
Constructor Summary
Constructors
Constructor  Description
OOXMLTikaBodyPartHandler(XHTMLContentHandler xhtml, Metadata metadata)
OOXMLTikaBodyPartHandler(XHTMLContentHandler xhtml, XWPFStylesShim styles, XWPFListManager listManager, OfficeParserConfig parserConfig)
OOXMLTikaBodyPartHandler(XHTMLContentHandler xhtml, XWPFStylesShim styles, XWPFListManager listManager, OfficeParserConfig parserConfig, Metadata metadata)
-
Method Summary
Modifier and Type  Method  Description
void commentReference
    Called when a comment reference is encountered in the document body.
void embeddedOLERef(String relId, String progId, String emfImageRId)
void embeddedPicRef(String picFileName, String picDescription)
void endBookmark(String id)
void endEditedSection()
void endnoteReference
void endParagraph
void endSDT()
void endTable()
void endTableCell
void endTableRow
void externalRef(String fieldType, String url)
    Called when an external reference URL is found in a field code.
void fieldCodeHyperlinkStart
    Called when a hyperlink is found via a field code (instrText HYPERLINK).
void footnoteReference
getEmbeddedPartMetadataMap
getEmittedCommentIds
    Returns the set of comment IDs that were inlined during parsing.
void hyperlinkEnd
void hyperlinkStart(String link)
boolean isIncludeDeletedText()
boolean isIncludeMoveFromText()
void linkedOLERef(String relId)
    Called when a linked (vs embedded) OLE object is found.
void run(RunProperties runProperties, String contents)
void setInlineBodyPartMap(org.apache.tika.parser.microsoft.ooxml.OOXMLInlineBodyPartMap inlinePartMap, ParseContext parseContext)
    Sets pre-parsed inline body part content (footnotes, endnotes, comments) so that references encountered during main document parsing can be resolved inline.
void startBookmark(String id, String name)
void startEditedSection(String editor, Date date, EditType editType)
void startParagraph(ParagraphProperties paragraphProperties)
void startSDT()
void startTable
void startTableCell
void startTableRow
-
Constructor Details
-
OOXMLTikaBodyPartHandler
-
OOXMLTikaBodyPartHandler
-
OOXMLTikaBodyPartHandler
public OOXMLTikaBodyPartHandler(XHTMLContentHandler xhtml, XWPFStylesShim styles, XWPFListManager listManager, OfficeParserConfig parserConfig)
-
OOXMLTikaBodyPartHandler
public OOXMLTikaBodyPartHandler(XHTMLContentHandler xhtml, XWPFStylesShim styles, XWPFListManager listManager, OfficeParserConfig parserConfig, Metadata metadata)
-
-
Method Details
-
setInlineBodyPartMap
public void setInlineBodyPartMap(org.apache.tika.parser.microsoft.ooxml.OOXMLInlineBodyPartMap inlinePartMap, ParseContext parseContext)
Sets pre-parsed inline body part content (footnotes, endnotes, comments) so that references encountered during main document parsing can be resolved inline.
-
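The idea behind setInlineBodyPartMap can be sketched without any Tika types. The following is a minimal illustration only: InlineCommentSketch, its plain Map of comment text, and the bracketed output format are all hypothetical stand-ins, not the actual OOXMLInlineBodyPartMap API or the XHTML that Tika emits. It shows a handler that receives pre-parsed comment text up front and writes it at the point where the reference occurs, recording which IDs it has used.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a body handler that inlines pre-parsed comments. */
final class InlineCommentSketch {
    // Pre-parsed comment text keyed by comment ID (illustrative, not the Tika map type).
    private final Map<String, String> commentsById;
    // IDs that have been written inline so far, in encounter order.
    private final Set<String> emittedIds = new LinkedHashSet<>();
    private final StringBuilder out = new StringBuilder();

    InlineCommentSketch(Map<String, String> commentsById) {
        this.commentsById = commentsById;
    }

    /** Appends ordinary body text. */
    void text(String s) {
        out.append(s);
    }

    /** Inlines the comment text if the ID was pre-parsed; unknown IDs are ignored. */
    void commentReference(String id) {
        String body = commentsById.get(id);
        if (body != null) {
            out.append("[comment: ").append(body).append("]");
            emittedIds.add(id);
        }
    }

    Set<String> emittedIds() {
        return emittedIds;
    }

    String output() {
        return out.toString();
    }

    /** Small walk-through: one known reference and one unknown reference. */
    static String demo() {
        Map<String, String> comments = new LinkedHashMap<>();
        comments.put("0", "check this figure");
        InlineCommentSketch handler = new InlineCommentSketch(comments);
        handler.text("Revenue grew 12%");
        handler.commentReference("0");
        handler.commentReference("7"); // not in the map, so nothing is written
        return handler.output();
    }
}
```

In this sketch the map must be supplied before body parsing starts, which mirrors why the real setter takes pre-parsed content.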
run
public void run(RunProperties runProperties, String contents) throws SAXException
- Specified by:
run in interface XWPFBodyContentsHandler
- Throws:
SAXException
-
hyperlinkStart
public void hyperlinkStart(String link) throws SAXException
- Specified by:
hyperlinkStart in interface XWPFBodyContentsHandler
- Parameters:
link - the link; can be null
- Throws:
SAXException
-
hyperlinkEnd
- Specified by:
hyperlinkEnd in interface XWPFBodyContentsHandler
- Throws:
SAXException
-
startParagraph
public void startParagraph(ParagraphProperties paragraphProperties) throws SAXException
- Specified by:
startParagraph in interface XWPFBodyContentsHandler
- Throws:
SAXException
-
endParagraph
- Specified by:
endParagraph in interface XWPFBodyContentsHandler
- Throws:
SAXException
-
getEmittedCommentIds
Returns the set of comment IDs that were inlined during parsing. Used by the decorator to skip these when dumping remaining comments.
-
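The way a caller might use the emitted-ID set from getEmittedCommentIds can be sketched as a simple filter. This is an illustration under stated assumptions: RemainingCommentsSketch and its remaining method are hypothetical names, and comments are modelled as a plain Map from ID to text rather than any Tika type. It keeps only the comments whose IDs were not already inlined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: selects the comments that were not inlined in the body. */
final class RemainingCommentsSketch {
    /**
     * Returns the text of every comment whose ID is absent from emittedIds,
     * preserving the iteration order of allComments.
     */
    static List<String> remaining(Map<String, String> allComments, Set<String> emittedIds) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : allComments.entrySet()) {
            if (!emittedIds.contains(entry.getKey())) {
                result.add(entry.getValue());
            }
        }
        return result;
    }
}
```

Filtering by ID means a comment appears once in the output: inline if it was referenced, otherwise in the trailing dump.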
startTable
- Specified by:
startTable in interface XWPFBodyContentsHandler
- Throws:
SAXException
-
endTable
public void endTable() throws SAXException
- Specified by:
endTable in interface XWPFBodyContentsHandler
- Throws:
SAXException
-
startTableRow
- Specified by:
startTableRow in interface XWPFBodyContentsHandler
- Throws:
SAXException
-
endTableRow
- Specified by:
endTableRow in interface XWPFBodyContentsHandler
- Throws:
SAXException
-
startTableCell
- Specified by:
startTableCell in interface XWPFBodyContentsHandler
- Throws:
SAXException
-
endTableCell
- Specified by:
endTableCell in interface XWPFBodyContentsHandler
- Throws:
SAXException
-
startSDT
public void startSDT() throws SAXException
- Specified by:
startSDT in interface XWPFBodyContentsHandler
- Throws:
SAXException
-
endSDT
public void endSDT()
- Specified by:
endSDT in interface XWPFBodyContentsHandler
-
startEditedSection
public void startEditedSection(String editor, Date date, EditType editType)
- Specified by:
startEditedSection in interface XWPFBodyContentsHandler
-
endEditedSection
public void endEditedSection()
- Specified by:
endEditedSection in interface XWPFBodyContentsHandler
-
isIncludeDeletedText
public boolean isIncludeDeletedText()
- Specified by:
isIncludeDeletedText in interface XWPFBodyContentsHandler
-
footnoteReference
- Specified by:
footnoteReference in interface XWPFBodyContentsHandler
- Throws:
SAXException
-
endnoteReference
- Specified by:
endnoteReference in interface XWPFBodyContentsHandler
- Throws:
SAXException
-
commentReference
Description copied from interface: XWPFBodyContentsHandler
Called when a comment reference is encountered in the document body.
- Specified by:
commentReference in interface XWPFBodyContentsHandler
- Parameters:
id - the comment ID
- Throws:
SAXException
-
isIncludeMoveFromText
public boolean isIncludeMoveFromText()
- Specified by:
isIncludeMoveFromText in interface XWPFBodyContentsHandler
-
embeddedOLERef
public void embeddedOLERef(String relId, String progId, String emfImageRId) throws SAXException
- Specified by:
embeddedOLERef in interface XWPFBodyContentsHandler
- Throws:
SAXException
-
getEmbeddedPartMetadataMap
-
linkedOLERef
public void linkedOLERef(String relId) throws SAXException
Description copied from interface: XWPFBodyContentsHandler
Called when a linked (vs embedded) OLE object is found. These reference external files and are a security concern.
- Specified by:
linkedOLERef in interface XWPFBodyContentsHandler
- Throws:
SAXException
-
embeddedPicRef
public void embeddedPicRef(String picFileName, String picDescription) throws SAXException
- Specified by:
embeddedPicRef in interface XWPFBodyContentsHandler
- Throws:
SAXException
-
fieldCodeHyperlinkStart
Description copied from interface: XWPFBodyContentsHandler
Called when a hyperlink is found via a field code (instrText HYPERLINK). Distinct from relationship-based hyperlinks for security tracking purposes.
- Specified by:
fieldCodeHyperlinkStart in interface XWPFBodyContentsHandler
- Parameters:
link - the link URL
- Throws:
SAXException
-
externalRef
public void externalRef(String fieldType, String url) throws SAXException
Description copied from interface: XWPFBodyContentsHandler
Called when an external reference URL is found in a field code. This includes INCLUDEPICTURE, INCLUDETEXT, IMPORT, LINK fields, and DrawingML/VML hyperlinks on shapes.
- Specified by:
externalRef in interface XWPFBodyContentsHandler
- Parameters:
fieldType - the type of field (e.g., "INCLUDEPICTURE", "hlinkHover", "vml-href")
url - the external URL
- Throws:
SAXException
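To make the (fieldType, url) pair passed to externalRef concrete, here is a rough sketch of how a field instruction string could be split into those two values. It is not Tika's parsing logic: FieldCodeSketch, ExternalRef, and parse are hypothetical names, the sketch covers only the four field keywords named above (not the DrawingML/VML cases), and it assumes the target is the first double-quoted argument, which real field codes do not guarantee.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Hypothetical, simplified field-instruction parser for illustration only. */
final class FieldCodeSketch {
    // Field keywords treated as external references in this sketch.
    private static final Set<String> EXTERNAL_FIELDS =
            Set.of("INCLUDEPICTURE", "INCLUDETEXT", "IMPORT", "LINK");

    /** Pair mirroring the (fieldType, url) arguments of the callback. */
    static final class ExternalRef {
        final String fieldType;
        final String url;

        ExternalRef(String fieldType, String url) {
            this.fieldType = fieldType;
            this.url = url;
        }
    }

    /**
     * Returns the field keyword and the first double-quoted argument, or empty
     * if the keyword is not an external field or no quoted argument is present.
     */
    static Optional<ExternalRef> parse(String instrText) {
        String trimmed = instrText.trim();
        int space = trimmed.indexOf(' ');
        if (space < 0) {
            return Optional.empty(); // keyword only, no arguments
        }
        String type = trimmed.substring(0, space).toUpperCase(Locale.ROOT);
        if (!EXTERNAL_FIELDS.contains(type)) {
            return Optional.empty();
        }
        int open = trimmed.indexOf('"', space);
        int close = open < 0 ? -1 : trimmed.indexOf('"', open + 1);
        if (close < 0) {
            return Optional.empty(); // unquoted or unterminated argument
        }
        return Optional.of(new ExternalRef(type, trimmed.substring(open + 1, close)));
    }
}
```

A HYPERLINK instruction deliberately yields empty here, since the interface reports field-code hyperlinks through fieldCodeHyperlinkStart rather than externalRef.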
-
startBookmark
public void startBookmark(String id, String name) throws SAXException
- Specified by:
startBookmark in interface XWPFBodyContentsHandler
- Throws:
SAXException
-
endBookmark
public void endBookmark(String id)
- Specified by:
endBookmark in interface XWPFBodyContentsHandler
-