Class FieldCodeParser

java.lang.Object
org.apache.tika.parser.microsoft.ooxml.FieldCodeParser

public class FieldCodeParser extends Object
Parses OOXML field codes (instrText) to extract URLs from HYPERLINK, INCLUDEPICTURE, INCLUDETEXT, IMPORT, and LINK fields.

This class has no Tika dependencies and could be contributed to POI.

  • Method Details

    • parseHyperlinkFromInstrText

      public static String parseHyperlinkFromInstrText(String instrText)
      Parses the target URL from the instrText content of a HYPERLINK field code, for example: HYPERLINK "https://example.com"
      Parameters:
      instrText - the accumulated instrText content
      Returns:
      the URL if found, or null
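
The following is a minimal sketch of how a quoted HYPERLINK target could be pulled out of instrText with a regular expression. It is not the Tika implementation: the class name HyperlinkFieldSketch, the method name parseHyperlink, and the regular expression are hypothetical, and the sketch assumes the target is the first double-quoted argument directly after the HYPERLINK keyword.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; the real FieldCodeParser may behave differently.
final class HyperlinkFieldSketch {

    // Assumption: the keyword is followed by whitespace and a double-quoted target.
    private static final Pattern HYPERLINK =
            Pattern.compile("\\bHYPERLINK\\s+\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Returns the quoted target, or null when the text holds no such field.
    static String parseHyperlink(String instrText) {
        if (instrText == null) {
            return null;
        }
        Matcher m = HYPERLINK.matcher(instrText);
        return m.find() ? m.group(1) : null;
    }
}
```

With this sketch, the input HYPERLINK "https://example.com" yields https://example.com, while text without a HYPERLINK keyword yields null, mirroring the "URL if found, or null" contract described above.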
    • parseExternalRefFromInstrText

      public static String parseExternalRefFromInstrText(String instrText, StringBuilder fieldType)
      Parses URLs from instrText field codes that reference external resources. This includes INCLUDEPICTURE, INCLUDETEXT, IMPORT, and LINK fields.
      Parameters:
      instrText - the accumulated instrText content
      fieldType - output parameter that will contain the field type if one is found
      Returns:
      the URL if found, or null
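
The output-parameter pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the Tika implementation: the class name ExternalRefFieldSketch, the method name parseExternalRef, and the regular expression are hypothetical. The sketch assumes the reference is the first double-quoted argument, optionally preceded by one unquoted token (LINK fields typically carry an application class name before the file name), and that the caller's StringBuilder is cleared before the field type is written.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; the real FieldCodeParser may behave differently.
final class ExternalRefFieldSketch {

    // Group 1: field keyword. Group 2: first double-quoted argument.
    // The optional unquoted token covers forms such as: LINK Excel.Sheet.12 "book.xlsx"
    private static final Pattern EXTERNAL = Pattern.compile(
            "\\b(INCLUDEPICTURE|INCLUDETEXT|IMPORT|LINK)\\s+(?:[^\\s\"]+\\s+)?\"([^\"]+)\"",
            Pattern.CASE_INSENSITIVE);

    // Returns the quoted reference and writes the keyword into fieldType, or returns null.
    static String parseExternalRef(String instrText, StringBuilder fieldType) {
        if (instrText == null) {
            return null;
        }
        Matcher m = EXTERNAL.matcher(instrText);
        if (!m.find()) {
            return null;
        }
        if (fieldType != null) {
            fieldType.setLength(0); // assumption: replace, rather than append to, prior content
            fieldType.append(m.group(1).toUpperCase(Locale.ROOT));
        }
        return m.group(2);
    }
}
```

A caller would pass a fresh StringBuilder, check the return value for null, and only then read the field type from the builder. Because the pattern requires a word boundary before the keyword, a HYPERLINK field is not mistaken for a LINK field in this sketch.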