Class RTFHtmlDecapsulator
java.lang.Object
org.apache.tika.parser.microsoft.rtf.jflex.RTFHtmlDecapsulator
Extracts the original HTML from an RTF document that contains encapsulated HTML (as indicated by the \fromhtml1 control word), using a JFlex-based tokenizer and a shared RTFState for font and code page tracking.
Embedded objects and pictures are extracted in the same pass via RTFEmbeddedHandler.
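
To illustrate what "indicated by the \fromhtml1 control word" means in practice, the following is a minimal, standalone sketch of how a caller might check whether RTF text declares encapsulated HTML. It is not Tika's implementation and does not use the JFlex tokenizer; the class name FromHtmlCheck and the method hasFromHtml are hypothetical names introduced only for this example. The sketch walks the text, skips control symbols such as an escaped backslash, and reports a match only for the control word fromhtml with the numeric parameter 1.

```java
// Hypothetical helper, not part of Apache Tika: a naive scan for the
// \fromhtml1 control word that marks RTF-encapsulated HTML.
public class FromHtmlCheck {

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /** Returns true if the RTF text contains the control word \fromhtml with parameter 1. */
    static boolean hasFromHtml(String rtf) {
        int n = rtf.length();
        int i = 0;
        while (i < n) {
            if (rtf.charAt(i) != '\\') {
                i++;
                continue;
            }
            int j = i + 1;
            if (j >= n) {
                break;
            }
            if (!isAsciiLetter(rtf.charAt(j))) {
                // Control symbol such as \\ or \{ : skip the backslash and the symbol.
                i = j + 1;
                continue;
            }
            // Control word: a run of ASCII letters...
            while (j < n && isAsciiLetter(rtf.charAt(j))) {
                j++;
            }
            String word = rtf.substring(i + 1, j);
            // ...followed by an optional signed numeric parameter.
            int k = j;
            if (k < n && rtf.charAt(k) == '-') {
                k++;
            }
            while (k < n && isAsciiDigit(rtf.charAt(k))) {
                k++;
            }
            String param = rtf.substring(j, k);
            if (word.equals("fromhtml") && param.equals("1")) {
                return true;
            }
            i = k;
        }
        return false;
    }

    public static void main(String[] args) {
        String sample = "{\\rtf1\\ansi\\fromhtml1 \\deff0 text}";
        System.out.println(hasFromHtml(sample));
    }
}
```

A check of this kind only decides whether decapsulation applies; recovering the HTML itself still requires tracking fonts and code pages, which is the role the class description assigns to RTFState.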
Constructor Summary
Constructors
RTFHtmlDecapsulator(ContentHandler handler, ParseContext context)
RTFHtmlDecapsulator(ContentHandler handler, ParseContext context, int maxBytesInKb)
Method Summary
Constructor Details
RTFHtmlDecapsulator
RTFHtmlDecapsulator
Method Details
extract
Throws:
IOException
SAXException
TikaException