Class RTFObjDataStreamParser
java.lang.Object
org.apache.tika.parser.microsoft.rtf.jflex.RTFObjDataStreamParser
All Implemented Interfaces:
Closeable, AutoCloseable
Parses OLE objdata from an RTF stream inline, byte by byte.
The OLE objdata structure is:
    [4 bytes version][4 bytes formatId]
    [4 bytes classNameLen][classNameLen bytes className]
    [4 bytes topicNameLen][topicNameLen bytes topicName]
    [4 bytes itemNameLen][itemNameLen bytes itemName]
    [4 bytes dataSz][dataSz bytes payload]
The small header fields are parsed byte-by-byte via a state machine. Once the header is complete and
dataSz is known, the payload
bytes stream directly to a temp file and are never buffered in memory.
On onComplete(Metadata, AtomicInteger), the payload is
interpreted based on className (Package, PBrush, POIFS, etc.)
and the extracted content is returned as a TikaInputStream whose
close will clean up all temp files via TemporaryResources.
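The header walk described above can be illustrated with a small, self-contained state machine. This is a minimal sketch, not Tika's implementation: the class name ObjDataHeaderSketch and all of its fields are hypothetical, 4-byte integers are assumed to be little-endian, names are assumed to be single-byte text with an optional trailing NUL, and payload bytes are only counted where a real parser would write them to a temp file.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Illustrative sketch only (hypothetical class, not part of Tika).
 * Consumes the objdata header one byte at a time.
 * Assumption: 4-byte integers are little-endian.
 */
final class ObjDataHeaderSketch {

    // Field order follows the layout listed in the class description.
    private enum Field {
        VERSION, FORMAT_ID,
        CLASS_NAME_LEN, CLASS_NAME,
        TOPIC_NAME_LEN, TOPIC_NAME,
        ITEM_NAME_LEN, ITEM_NAME,
        DATA_SZ, PAYLOAD
    }

    private Field field = Field.VERSION;
    private long intValue;     // accumulates the current 4-byte integer
    private int intBytesRead;  // bytes of the current integer seen so far
    private long remaining;    // bytes left in the current string
    private final ByteArrayOutputStream text = new ByteArrayOutputStream();

    long version, formatId, dataSz, payloadBytesSeen;
    String className, topicName, itemName;

    void onByte(int b) {
        switch (field) {
            case CLASS_NAME: case TOPIC_NAME: case ITEM_NAME:
                text.write(b & 0xFF);
                if (--remaining == 0) finishString();
                break;
            case PAYLOAD:
                // A real parser would write this byte to a temp file.
                payloadBytesSeen++;
                break;
            default:
                // All remaining states are 4-byte integer fields.
                intValue |= (long) (b & 0xFF) << (8 * intBytesRead);
                if (++intBytesRead == 4) finishInt();
        }
    }

    private void finishInt() {
        long v = intValue;
        intValue = 0;
        intBytesRead = 0;
        switch (field) {
            case VERSION: version = v; field = Field.FORMAT_ID; break;
            case FORMAT_ID: formatId = v; field = Field.CLASS_NAME_LEN; break;
            case CLASS_NAME_LEN: startString(v, Field.CLASS_NAME); break;
            case TOPIC_NAME_LEN: startString(v, Field.TOPIC_NAME); break;
            case ITEM_NAME_LEN: startString(v, Field.ITEM_NAME); break;
            case DATA_SZ: dataSz = v; field = Field.PAYLOAD; break;
            default: throw new IllegalStateException("not an integer field: " + field);
        }
    }

    private void startString(long len, Field stringField) {
        remaining = len;
        field = stringField;
        if (len == 0) finishString(); // zero-length strings have no body bytes
    }

    private void finishString() {
        // Assumption: single-byte text; anything from the first NUL on is dropped.
        String s = new String(text.toByteArray(), StandardCharsets.ISO_8859_1);
        text.reset();
        int nul = s.indexOf('\0');
        if (nul >= 0) s = s.substring(0, nul);
        switch (field) {
            case CLASS_NAME: className = s; field = Field.TOPIC_NAME_LEN; break;
            case TOPIC_NAME: topicName = s; field = Field.ITEM_NAME_LEN; break;
            case ITEM_NAME: itemName = s; field = Field.DATA_SZ; break;
            default: throw new IllegalStateException("not a string field: " + field);
        }
    }
}
```

Because every field is either a fixed 4-byte integer or a string whose length was just read, the sketch needs only one integer accumulator and one small text buffer, regardless of how large the payload is.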

Constructor Summary
Constructors
    RTFObjDataStreamParser(long maxBytes)

Method Summary
Modifier and Type    Method                                                               Description
void                 close()
void                 onByte(int b)                                                        Receive a single decoded byte from the objdata hex stream.
TikaInputStream      onComplete(Metadata metadata, AtomicInteger unknownFilenameCount)    Called when the objdata group closes.

Constructor Details

RTFObjDataStreamParser
public RTFObjDataStreamParser(long maxBytes)
Parameters:
    maxBytes - maximum payload bytes to accept (-1 for unlimited)

Method Details

onByte
public void onByte(int b) throws IOException, TikaException
Receive a single decoded byte from the objdata hex stream.
Throws:
    IOException
    TikaException
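
Since onByte expects bytes that have already been decoded from hex text, the caller has to pair up hex digits before handing them over. The following is a minimal sketch of that pairing step, assuming whitespace and other non-hex characters between digits are simply skipped. HexPairDecoderSketch and its onChar method are hypothetical names for illustration and are not part of Tika.

```java
import java.util.function.IntConsumer;

/**
 * Illustrative sketch only (hypothetical class, not part of Tika).
 * Turns a stream of hex characters into decoded bytes, two digits per byte.
 */
final class HexPairDecoderSketch {

    private int high = -1; // pending high nibble, or -1 if none is pending

    /** Feeds one character; passes a byte (0..255) to sink after every second hex digit. */
    void onChar(char c, IntConsumer sink) {
        int nibble = Character.digit(c, 16);
        if (nibble < 0) {
            return; // assumption: whitespace and other non-hex characters are ignored
        }
        if (high < 0) {
            high = nibble;
        } else {
            sink.accept((high << 4) | nibble);
            high = -1;
        }
    }
}
```

With a parser instance in hand, the sink could forward each decoded value to its onByte method; since onByte declares checked exceptions, the forwarding lambda would need to catch or wrap them.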

onComplete
public TikaInputStream onComplete(Metadata metadata, AtomicInteger unknownFilenameCount) throws IOException, TikaException
Called when the objdata group closes. Populates metadata and returns a TikaInputStream with the extracted embedded content, or null if the object couldn't be parsed.
The caller owns the returned TikaInputStream; closing it will clean up all temp files via TemporaryResources.
Throws:
    IOException
    TikaException

close
public void close() throws IOException
Specified by:
    close in interface AutoCloseable
Specified by:
    close in interface Closeable
Throws:
    IOException