java.lang.Object
org.apache.tika.parser.microsoft.rtf.jflex.RTFState

public class RTFState extends Object
Shared RTF parsing state: group stack, font table, codepage tracking, and Unicode skip handling.

Both the HTML decapsulator and the full RTF parser use this class to manage the stateful parts of RTF processing.

Typical usage: feed every token to processToken(RTFToken) and query the current charset via getCurrentCharset().

  • Constructor Details

    • RTFState

      public RTFState()
  • Method Details

    • processToken

      public boolean processToken(RTFToken tok)
      Process a single token to update internal state.

      Handles group open/close, charset selectors (ansi, ansicpg, deff), font table parsing (fonttbl, f, fcharset), Unicode skip tracking (uc), and font changes (f in the body).

      Returns:
      true if the token was consumed by state management (caller should skip it), false if the caller should also process it
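
      The return-value contract can be illustrated with a small stand-in. This is only a sketch: TokenLoopSketch, its string tokens, and the set of "state" control words are hypothetical and are not Tika's RTFToken or RTFState API. It shows a caller skipping consumed tokens and handling the rest itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in; models only the "consumed vs. not consumed" contract. */
class TokenLoopSketch {
    // Control words this sketch treats as pure state updates (an arbitrary choice).
    private static final Set<String> STATE_WORDS =
            new HashSet<>(Arrays.asList("ansi", "ansicpg", "deff", "uc"));

    /** Stand-in for processToken: true means "consumed, caller should skip it". */
    static boolean processToken(String controlWord) {
        return STATE_WORDS.contains(controlWord);
    }

    /** Returns the tokens the caller still has to handle itself. */
    static List<String> run(List<String> tokens) {
        List<String> leftover = new ArrayList<>();
        for (String tok : tokens) {
            if (processToken(tok)) {
                continue; // state management consumed it
            }
            leftover.add(tok); // caller processes this token as well
        }
        return leftover;
    }
}
```

      With a real RTFState the loop would have the same shape: call processToken(tok) first and continue to the next token when it returns true.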
    • pushGroup

      public void pushGroup()
      Open a new group: push current state and create a child.
    • popGroup

      public void popGroup()
      Close the current group: pop and restore the parent state.
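
      The push/pop behaviour described for pushGroup() and popGroup() can be sketched with a plain stack. GroupSketch and GroupStackSketch are hypothetical names, not Tika classes, and the assumption that a child group starts as a copy of its parent follows general RTF group semantics rather than anything stated here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical per-group state; not Tika's RTFGroupState. */
class GroupSketch {
    int fontId = -1; // current font number, -1 when none has been selected
    int ucSkip = 1;  // \ucN value; the RTF default is 1

    GroupSketch copy() {
        GroupSketch child = new GroupSketch();
        child.fontId = fontId;
        child.ucSkip = ucSkip;
        return child;
    }
}

/** Hypothetical group stack mirroring the push/pop description. */
class GroupStackSketch {
    private final Deque<GroupSketch> parents = new ArrayDeque<>();
    private GroupSketch current = new GroupSketch();

    /** Open a group: save the current state and continue with a child copy. */
    void pushGroup() {
        parents.push(current);
        current = current.copy();
    }

    /** Close a group: discard the child and restore the saved parent. */
    void popGroup() {
        if (!parents.isEmpty()) { // tolerate an unbalanced closing brace
            current = parents.pop();
        }
    }

    GroupSketch current() {
        return current;
    }

    int depth() {
        return parents.size();
    }
}
```

      Changes made inside a group, such as a font switch, disappear when the group closes because the parent object is restored untouched.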
    • getCurrentCharset

      public Charset getCurrentCharset()
      Returns the charset that should be used to decode the current hex escape or text byte. Priority:
      1. Font-specific charset (from \fN → \fcharsetN)
      2. Global default font's charset (from \deffN)
      3. Global charset (from \ansicpgN or family selector)
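
      The priority order above can be sketched as a three-step fallback. CharsetPrioritySketch and its parameters are hypothetical; the sketch assumes fonts are identified by integer number and that a missing map entry means no charset is known for that font.

```java
import java.nio.charset.Charset;
import java.util.Map;

/** Hypothetical illustration of the lookup order; not the Tika implementation. */
class CharsetPrioritySketch {
    static Charset resolve(int currentFont,
                           int defaultFont,
                           Map<Integer, Charset> fontToCharset,
                           Charset globalCharset) {
        // 1. Charset declared for the font currently in effect
        Charset cs = fontToCharset.get(currentFont);
        if (cs != null) {
            return cs;
        }
        // 2. Charset declared for the document's default font
        cs = fontToCharset.get(defaultFont);
        if (cs != null) {
            return cs;
        }
        // 3. Document-wide charset
        return globalCharset;
    }
}
```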
    • getGlobalCharset

      public Charset getGlobalCharset()
      Returns the global charset (from \ansicpgN or the family selector).
    • getCurrentGroup

      public RTFGroupState getCurrentGroup()
      Returns the current group state.
    • isInHeader

      public boolean isInHeader()
      Returns true while parsing is still in the RTF header, before any body content.
    • getDepth

      public int getDepth()
      Returns the current group nesting depth.
    • getFontToCharset

      public Map<Integer,Charset> getFontToCharset()
      Returns the font-to-charset mapping table.
    • getAnsiSkip

      public int getAnsiSkip()
      Returns the number of ANSI characters that remain to be skipped after a Unicode escape.
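
      In RTF, \ucN sets how many fallback ANSI characters follow each \uN escape, and a reader that understands the Unicode value skips them. A minimal sketch of such a counter follows; AnsiSkipSketch and its method names are hypothetical and only model the bookkeeping, not how RTFState implements it.

```java
/** Hypothetical skip counter; not the Tika implementation. */
class AnsiSkipSketch {
    private int ucSkip = 1;   // \ucN value; the RTF default is 1
    private int ansiSkip = 0; // fallback characters still to be ignored

    /** Called for \ucN. */
    void onUc(int n) {
        ucSkip = Math.max(0, n);
    }

    /** Called for \uN: the next ucSkip ANSI characters are fallbacks. */
    void onUnicodeEscape() {
        ansiSkip = ucSkip;
    }

    /** Called for each ANSI character or hex escape; true means "drop it". */
    boolean shouldSkipAnsiChar() {
        if (ansiSkip > 0) {
            ansiSkip--;
            return true;
        }
        return false;
    }

    int getAnsiSkip() {
        return ansiSkip;
    }
}
```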
    • getLastClosedGroup

      public RTFGroupState getLastClosedGroup()
      Returns the state of the group that was closed by the most recent GROUP_CLOSE, that is, the child group's state as it was before being popped. Useful for checking flags such as objdata, pictDepth, sn, sv, sp, and object in order to trigger completion handlers.
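
      The reason for remembering the closed group can be shown with a small model: once a group is popped, its flags are no longer visible in the current state, so a completion handler has to read them from the saved child. LastClosedGroupSketch and its single objdata flag are hypothetical stand-ins, not Tika's RTFGroupState.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of "last closed group" tracking; not the Tika implementation. */
class LastClosedGroupSketch {
    static class Group {
        boolean objdata; // stand-in for one of the per-group flags
    }

    private final Deque<Group> parents = new ArrayDeque<>();
    private Group current = new Group();
    private Group lastClosed; // null until the first group has been closed

    void open() {
        parents.push(current);
        Group child = new Group();
        child.objdata = current.objdata; // child starts from the parent's flags
        current = child;
    }

    void close() {
        lastClosed = current; // remember the child before restoring the parent
        if (!parents.isEmpty()) {
            current = parents.pop();
        }
    }

    Group current() {
        return current;
    }

    Group lastClosed() {
        return lastClosed;
    }
}
```

      After close(), a caller in this model would test lastClosed().objdata to decide whether a just-finished group needs a completion step, while current() already reflects the parent.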