Class RTFCharsetMaps

java.lang.Object
org.apache.tika.parser.microsoft.rtf.jflex.RTFCharsetMaps

public final class RTFCharsetMaps extends Object
Shared charset maps for RTF parsing. Maps RTF \fcharsetN and \ansicpgN values to Java Charset instances.

Extracted from the original TextExtractor so that both the JFlex-based parser and the decapsulator can reuse them.

  • Field Details

    • WINDOWS_1252

      public static final Charset WINDOWS_1252
    • FCHARSET_MAP

      public static final Map<Integer,Charset> FCHARSET_MAP
      Maps \fcharsetN values to Java charsets. The RTF font table uses these to declare per-font character encodings.
    • ANSICPG_MAP

      public static final Map<Integer,Charset> ANSICPG_MAP
      Maps \ansicpgN values to Java charsets. This is the global ANSI code page declared in the RTF header.
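As a rough illustration of how a per-font charset lookup of this kind might be used, the sketch below decodes one raw byte under two different \fcharsetN values. It does not use the real FCHARSET_MAP: `FCHARSET_SKETCH`, `decode`, and the class name are hypothetical stand-ins, and the two entries shown (0 for ANSI, 204 for Russian) are taken from the RTF specification, not from this class's actual contents. Falling back to windows-1252 for an unknown value is likewise an assumption.

```java
import java.nio.charset.Charset;
import java.util.Map;

final class FcharsetLookupSketch {
    // Hypothetical stand-in for FCHARSET_MAP; the real map's entries may differ.
    static final Map<Integer, Charset> FCHARSET_SKETCH = Map.of(
            0, Charset.forName("windows-1252"),    // \fcharset0: ANSI
            204, Charset.forName("windows-1251")); // \fcharset204: Russian

    // Decode raw bytes using the charset declared for the font,
    // assuming windows-1252 when the \fcharsetN value is unknown.
    static String decode(int fcharset, byte[] bytes) {
        Charset cs = FCHARSET_SKETCH.getOrDefault(fcharset, Charset.forName("windows-1252"));
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xC0};
        // The same byte decodes to different characters under each font charset.
        System.out.println(decode(204, raw)); // Cyrillic capital A (U+0410)
        System.out.println(decode(0, raw));   // Latin capital A with grave (U+00C0)
    }
}
```

The point of the example is that the same byte is ambiguous until the font's declared charset is known, which is why the map is keyed by the \fcharsetN number.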
  • Method Details

    • resolveCodePage

      public static Charset resolveCodePage(int cpNumber)
      Resolves an ANSI code page number to a Java Charset. Tries ANSICPG_MAP first, then falls back to the charset names windows-N and cpN, where N is cpNumber. Returns WINDOWS_1252 if nothing matches.
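The lookup order described for resolveCodePage can be sketched as follows. This is not the actual Tika implementation: `ANSICPG_SKETCH`, `resolve`, and the class name are hypothetical, the single map entry is illustrative, and the use of `Charset.isSupported` to probe the fallback names is an assumption about how such a chain could be written.

```java
import java.nio.charset.Charset;
import java.util.Map;

final class CodePageResolutionSketch {
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Hypothetical stand-in for ANSICPG_MAP; the real map's entries may differ.
    static final Map<Integer, Charset> ANSICPG_SKETCH = Map.of(1252, WINDOWS_1252);

    static Charset resolve(int cpNumber) {
        // Step 1: explicit mapping.
        Charset mapped = ANSICPG_SKETCH.get(cpNumber);
        if (mapped != null) {
            return mapped;
        }
        // Step 2: try the names windows-N, then cpN.
        for (String name : new String[] {"windows-" + cpNumber, "cp" + cpNumber}) {
            try {
                if (Charset.isSupported(name)) {
                    return Charset.forName(name);
                }
            } catch (IllegalArgumentException e) {
                // Syntactically illegal charset name; try the next candidate.
            }
        }
        // Step 3: nothing matched, so use the default.
        return WINDOWS_1252;
    }

    public static void main(String[] args) {
        System.out.println(resolve(1252));   // found in the map
        System.out.println(resolve(1251));   // resolved via the windows-N name
        System.out.println(resolve(999999)); // no match, falls back to windows-1252
    }
}
```

Returning a default instead of throwing means a document that declares an unknown code page still yields text, at the cost of possibly mis-decoding non-ASCII bytes.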