Class RTFState
java.lang.Object
org.apache.tika.parser.microsoft.rtf.jflex.RTFState
Shared RTF parsing state: group stack, font table, codepage tracking,
and unicode skip handling.
Both the HTML decapsulator and the full RTF parser use this class to manage the stateful parts of RTF processing.
Typical usage: feed every token to processToken(RTFToken)
and query the current charset via getCurrentCharset().
Constructor Summary
Constructors
- RTFState()
Method Summary
Modifier and Type / Method / Description
- int getAnsiSkip(): Returns the number of ANSI chars remaining to skip.
- getCurrentCharset(): Returns the charset that should be used to decode the current hex escape or text byte.
- getCurrentGroup(): Returns the current group state.
- int getDepth(): Returns the current group nesting depth.
- getFontToCharset(): Returns the font-to-charset mapping table.
- getGlobalCharset(): Returns the global charset (\ansicpgN).
- getLastClosedGroup(): Returns the group state that was just closed on the most recent GROUP_CLOSE.
- boolean isInHeader(): Returns true if we're still in the RTF header (before body content).
- void popGroup(): Close the current group: pop and restore the parent state.
- boolean processToken(RTFToken tok): Process a single token to update internal state.
- void pushGroup(): Open a new group: push current state and create a child.
Constructor Details
RTFState
public RTFState()
Method Details
processToken
public boolean processToken(RTFToken tok)
Process a single token to update internal state. This handles: group open/close, charset selectors (ansi, ansicpg, deff), font table parsing (fonttbl, f, fcharset), unicode skip tracking (uc), and font changes (f in body).
Returns:
true if the token was consumed by state management (caller should skip it), false if the caller should also process it
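The return contract of processToken can be illustrated with a minimal, self-contained sketch. The Tok, Kind, and MiniState types below are hypothetical stand-ins, not the Tika RTFToken or RTFState classes, and only two control words are handled. The one idea taken from the description above is that a token absorbed by state management returns true and the caller skips it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ProcessTokenSketch {
    // Hypothetical token kinds; the real RTFToken API is not shown in this document.
    enum Kind { CONTROL_WORD, TEXT }

    // Hypothetical token: a kind, a name (or text payload), and a numeric parameter.
    static final class Tok {
        final Kind kind;
        final String name;
        final int param;

        Tok(Kind kind, String name, int param) {
            this.kind = kind;
            this.name = name;
            this.param = param;
        }
    }

    // Hypothetical stand-in for the shared state object.
    static final class MiniState {
        int ansiCodePage = -1; // unknown until an ansicpg control word is seen
        int ucSkip = 1;        // RTF default for the uc control word

        // Returns true when the token only updated state and needs no further handling.
        boolean processToken(Tok tok) {
            if (tok.kind != Kind.CONTROL_WORD) {
                return false;
            }
            switch (tok.name) {
                case "ansicpg":
                    ansiCodePage = tok.param;
                    return true;
                case "uc":
                    ucSkip = tok.param;
                    return true;
                default:
                    return false;
            }
        }
    }

    // Feeds every token to the state first; collects the names of tokens that pass through.
    static List<String> run(List<Tok> tokens, MiniState state) {
        List<String> passedThrough = new ArrayList<>();
        for (Tok tok : tokens) {
            if (state.processToken(tok)) {
                continue; // consumed by state management
            }
            passedThrough.add(tok.name);
        }
        return passedThrough;
    }

    public static void main(String[] args) {
        MiniState state = new MiniState();
        List<Tok> tokens = Arrays.asList(
                new Tok(Kind.CONTROL_WORD, "ansicpg", 1252),
                new Tok(Kind.CONTROL_WORD, "b", 1),
                new Tok(Kind.TEXT, "Hello", 0));
        System.out.println(run(tokens, state)); // [b, Hello]
        System.out.println(state.ansiCodePage); // 1252
    }
}
```

In this sketch the caller never inspects charset selectors itself; it only sees the tokens the state object declined to consume.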
pushGroup
public void pushGroup()
Open a new group: push current state and create a child.
popGroup
public void popGroup()
Close the current group: pop and restore the parent state.
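The push/pop pairing can be sketched with a plain stack. This is an illustrative model only: the Group class and its fields are hypothetical, and it assumes the child group starts as a copy of its parent, which is how RTF group scoping normally behaves. It is not the RTFState implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class GroupStackSketch {
    // Hypothetical per-group state; the field names are illustrative only.
    static final class Group {
        int fontId;
        int ucSkip;

        Group(int fontId, int ucSkip) {
            this.fontId = fontId;
            this.ucSkip = ucSkip;
        }

        Group copy() {
            return new Group(fontId, ucSkip);
        }
    }

    private final Deque<Group> stack = new ArrayDeque<>();
    private Group current = new Group(-1, 1); // no font selected, default skip of 1

    // Group open: save the parent and continue with a child copy.
    void pushGroup() {
        stack.push(current);
        current = current.copy();
    }

    // Group close: discard the child and restore the saved parent.
    void popGroup() {
        if (!stack.isEmpty()) {
            current = stack.pop();
        }
    }

    Group current() {
        return current;
    }

    int depth() {
        return stack.size();
    }

    public static void main(String[] args) {
        GroupStackSketch s = new GroupStackSketch();
        s.pushGroup();          // opening brace
        s.current().fontId = 2; // font change inside the group
        s.pushGroup();          // nested group inherits font 2
        System.out.println(s.depth() + " " + s.current().fontId); // 2 2
        s.popGroup();
        s.popGroup();
        System.out.println(s.depth() + " " + s.current().fontId); // 0 -1
    }
}
```

Because changes are made to the child copy, anything set inside a group disappears when the group closes.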
getCurrentCharset
Returns the charset that should be used to decode the current hex escape or text byte. Priority (highest first):
- Font-specific charset (from \fN → \fcharsetN)
- Global default font's charset (from \deffN)
- Global charset (from \ansicpgN or family selector)
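The priority order above can be sketched as a simple fallback chain. The resolve method and its parameters are hypothetical and exist only for illustration; the charsets used in main are arbitrary placeholders chosen from StandardCharsets, not values Tika would necessarily produce.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class CharsetPrioritySketch {
    // Hypothetical resolver: tries the current font, then the default font, then the global charset.
    static Charset resolve(Map<Integer, Charset> fontToCharset,
                           int currentFont, int defaultFont, Charset global) {
        Charset cs = fontToCharset.get(currentFont);
        if (cs != null) {
            return cs; // 1. font-specific charset
        }
        cs = fontToCharset.get(defaultFont);
        if (cs != null) {
            return cs; // 2. default font's charset
        }
        return global; // 3. global charset
    }

    public static void main(String[] args) {
        Map<Integer, Charset> fonts = new HashMap<>();
        fonts.put(1, StandardCharsets.ISO_8859_1); // font 1 declares its own charset
        Charset global = StandardCharsets.US_ASCII;

        System.out.println(resolve(fonts, 1, 0, global)); // ISO-8859-1 (current font)
        System.out.println(resolve(fonts, 5, 1, global)); // ISO-8859-1 (default font)
        System.out.println(resolve(fonts, 5, 6, global)); // US-ASCII (global)
    }
}
```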
getGlobalCharset
Returns the global charset (\ansicpgN).
getCurrentGroup
Returns the current group state.
isInHeader
public boolean isInHeader()
Returns true if we're still in the RTF header (before body content).
getDepth
public int getDepth()
Returns the current group nesting depth.
getFontToCharset
Returns the font-to-charset mapping table.
getAnsiSkip
public int getAnsiSkip()
Returns the number of ANSI chars remaining to skip.
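For context, RTF's \ucN control word declares how many fallback ANSI characters follow each \uN escape, and a Unicode-aware reader skips them. The sketch below models that counter in isolation. All names in it are hypothetical, and it assumes the counter is reset to the current \uc value after every \uN; it is not taken from the RTFState source.

```java
final class UnicodeSkipSketch {
    private int ucSkip = 1;   // RTF default: one fallback char per Unicode escape
    private int ansiSkip = 0; // fallback chars still to be ignored
    private final StringBuilder out = new StringBuilder();

    // Models the uc control word: sets the fallback length for later escapes.
    void uc(int n) {
        ucSkip = n;
    }

    // Models a Unicode escape: emit the code unit, then arm the skip counter.
    void unicode(int codeUnit) {
        out.append((char) codeUnit); // the cast also maps negative 16-bit values
        ansiSkip = ucSkip;
    }

    // Models one ANSI char: ignored while the skip counter is positive.
    void ansiChar(char c) {
        if (ansiSkip > 0) {
            ansiSkip--;
            return;
        }
        out.append(c);
    }

    int getAnsiSkip() {
        return ansiSkip;
    }

    String text() {
        return out.toString();
    }

    public static void main(String[] args) {
        UnicodeSkipSketch s = new UnicodeSkipSketch();
        s.uc(2);
        s.unicode(0x20AC);                   // euro sign
        System.out.println(s.getAnsiSkip()); // 2
        s.ansiChar('E');                     // fallback, skipped
        s.ansiChar('U');                     // fallback, skipped
        s.ansiChar('!');                     // real text
        System.out.println(s.getAnsiSkip()); // 0
        System.out.println(s.text().equals("\u20AC!")); // true
    }
}
```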
getLastClosedGroup
Returns the group state that was just closed on the most recent GROUP_CLOSE. This is the child group's state before it was popped. Useful for checking flags like objdata, pictDepth, sn, sv, sp, object to trigger completion handlers.