Class RTFObjDataStreamParser

java.lang.Object
org.apache.tika.parser.microsoft.rtf.jflex.RTFObjDataStreamParser
All Implemented Interfaces:
Closeable, AutoCloseable

public class RTFObjDataStreamParser extends Object implements Closeable
Parses OLE objdata from an RTF stream inline, byte by byte.

The OLE objdata structure is:

   [4 bytes version][4 bytes formatId]
   [4 bytes classNameLen][classNameLen bytes className]
   [4 bytes topicNameLen][topicNameLen bytes topicName]
   [4 bytes itemNameLen][itemNameLen bytes itemName]
   [4 bytes dataSz][dataSz bytes payload]
 
The small header fields are parsed byte by byte by a state machine. Once the header is complete and dataSz is known, the payload bytes are streamed directly to a temp file and are never buffered in memory.
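
The layout above can be walked with a small state machine. What follows is a minimal sketch, not the Tika implementation: the class ObjDataHeaderSketch, its fields, the onByte method, and the MAX_NAME_LEN cap are all hypothetical names introduced for illustration. It assumes the 4-byte integers are little-endian and that name strings may carry a trailing NUL, as in the OLE 1.0 object header; neither detail is stated on this page. The payload is written to whatever OutputStream the caller supplies.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a byte-at-a-time objdata parser; not Tika's code. */
class ObjDataHeaderSketch {
    // One state per field, in the order given by the layout above.
    enum State { VERSION, FORMAT_ID, CLASS_LEN, CLASS_NAME, TOPIC_LEN, TOPIC_NAME,
                 ITEM_LEN, ITEM_NAME, DATA_SZ, PAYLOAD, DONE }

    static final long MAX_NAME_LEN = 4096; // assumed sanity cap on header strings

    private State state = State.VERSION;
    private long acc;        // accumulates the current 4-byte integer
    private int count;       // bytes of that integer consumed so far
    private long remaining;  // bytes left in the current string or payload
    private final ByteArrayOutputStream text = new ByteArrayOutputStream();
    private final OutputStream sink; // destination for payload bytes
    private final long maxBytes;     // -1 means unlimited, mirroring the constructor docs

    long version, formatId, dataSz;
    String className, topicName, itemName;

    ObjDataHeaderSketch(OutputStream sink, long maxBytes) {
        this.sink = sink;
        this.maxBytes = maxBytes;
    }

    /** Feed one decoded byte of objdata. */
    void onByte(int b) throws IOException {
        switch (state) {
            case PAYLOAD: // payload goes straight to the sink, never into a buffer
                sink.write(b);
                if (--remaining == 0) state = State.DONE;
                return;
            case DONE:    // ignore anything after the declared payload
                return;
            case CLASS_NAME: case TOPIC_NAME: case ITEM_NAME:
                text.write(b);
                if (--remaining == 0) finishString();
                return;
            default:      // all other states read a little-endian 4-byte integer
                acc |= (long) (b & 0xFF) << (8 * count);
                if (++count == 4) finishInt();
        }
    }

    private void finishInt() throws IOException {
        long v = acc;
        acc = 0;
        count = 0;
        switch (state) {
            case VERSION:   version = v;  state = State.FORMAT_ID; break;
            case FORMAT_ID: formatId = v; state = State.CLASS_LEN; break;
            case CLASS_LEN: case TOPIC_LEN: case ITEM_LEN:
                if (v > MAX_NAME_LEN) throw new IOException("name too long: " + v);
                remaining = v;
                state = next(state);
                if (v == 0) finishString(); // empty string: nothing to read
                break;
            case DATA_SZ:
                if (maxBytes >= 0 && v > maxBytes) {
                    throw new IOException("payload of " + v + " bytes exceeds limit");
                }
                dataSz = v;
                remaining = v;
                state = (v == 0) ? State.DONE : State.PAYLOAD;
                break;
            default:
                throw new IllegalStateException("not an integer field: " + state);
        }
    }

    private void finishString() {
        String s = new String(text.toByteArray(), StandardCharsets.US_ASCII);
        text.reset();
        int nul = s.indexOf('\0'); // drop the terminator and anything after it
        if (nul >= 0) s = s.substring(0, nul);
        switch (state) {
            case CLASS_NAME: className = s; break;
            case TOPIC_NAME: topicName = s; break;
            case ITEM_NAME:  itemName = s;  break;
            default: throw new IllegalStateException("not a string field: " + state);
        }
        state = next(state);
    }

    private static State next(State s) {
        return State.values()[s.ordinal() + 1];
    }
}
```

To approximate the temp-file behavior described above, a caller could pass Files.newOutputStream(Files.createTempFile("objdata", ".bin")) as the sink and call onByte for each byte decoded from the RTF hex text. Only the header integers and short name strings are held in memory; the payload size is checked against maxBytes before any payload byte is written.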

When onComplete(Metadata, AtomicInteger) is called, the payload is interpreted according to className (Package, PBrush, POIFS, etc.), and the extracted content is returned as a TikaInputStream. Closing that stream cleans up all temp files via TemporaryResources.

  • Constructor Details

    • RTFObjDataStreamParser

      public RTFObjDataStreamParser(long maxBytes)
      Parameters:
      maxBytes - maximum payload bytes to accept (-1 for unlimited)
  • Method Details