Interface PDF


public interface PDF
PDF properties collection.
Since:
Apache Tika 1.14
  • Field Details

    • PDF_PREFIX

      static final String PDF_PREFIX
    • PDFA_PREFIX

      static final String PDFA_PREFIX
    • PDFAID_PREFIX

      static final String PDFAID_PREFIX
    • EOF_OFFSETS

      static final Property EOF_OFFSETS
      Number of %%EOF markers as extracted by the StartXRefScanner; see that class for limitations. This includes the final %%EOF, which may or may not be at the literal end of the file. It does not include an %%EOF whose startxref is 0, as happens with the dummy %%EOF in a linearized PDF.
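To make the counting rule above concrete, here is a naive scanner that collects the byte offset of each %%EOF marker and skips any whose preceding startxref value is 0. This is a hypothetical sketch, not Tika's StartXRefScanner: the class and method names are invented for illustration, and it searches raw bytes, so it will also match marker text that happens to appear inside streams or strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the StartXRefScanner implementation.
final class EofOffsetSketch {
    private static final String EOF = "%%EOF";
    private static final String STARTXREF = "startxref";

    // Returns the byte offset of each "%%EOF" marker, skipping any whose
    // nearest preceding "startxref" value is 0 (e.g. a linearized dummy %%EOF).
    static List<Long> eofOffsets(byte[] pdf) {
        // ISO-8859-1 maps each byte to exactly one char, so string indices
        // are identical to byte offsets.
        String s = new String(pdf, StandardCharsets.ISO_8859_1);
        List<Long> offsets = new ArrayList<>();
        for (int idx = s.indexOf(EOF); idx >= 0; idx = s.indexOf(EOF, idx + EOF.length())) {
            int sx = s.lastIndexOf(STARTXREF, idx);
            if (sx < 0 || startxrefValue(s, sx + STARTXREF.length(), idx) != 0) {
                offsets.add((long) idx);
            }
        }
        return offsets;
    }

    // Parses the text between "startxref" and "%%EOF" as a long;
    // returns -1 when it is not a plain integer (so the marker is kept).
    private static long startxrefValue(String s, int from, int to) {
        try {
            return Long.parseLong(s.substring(from, to).trim());
        } catch (NumberFormatException e) {
            return -1;
        }
    }
}
```

For a file containing `startxref\n0\n%%EOF` followed later by `startxref\n123\n%%EOF`, only the second marker's offset is returned.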
    • PDF_DOC_INFO_PREFIX

      static final String PDF_DOC_INFO_PREFIX
      Prefix used for properties that record what was stored in the docinfo section (as opposed to XMP).
    • PDF_DOC_INFO_CUSTOM_PREFIX

      static final String PDF_DOC_INFO_CUSTOM_PREFIX
    • DOC_INFO_CREATED

      static final Property DOC_INFO_CREATED
    • DOC_INFO_CREATOR

      static final Property DOC_INFO_CREATOR
    • DOC_INFO_CREATOR_TOOL

      static final Property DOC_INFO_CREATOR_TOOL
    • DOC_INFO_MODIFICATION_DATE

      static final Property DOC_INFO_MODIFICATION_DATE
    • DOC_INFO_KEY_WORDS

      static final Property DOC_INFO_KEY_WORDS
    • DOC_INFO_PRODUCER

      static final Property DOC_INFO_PRODUCER
    • DOC_INFO_SUBJECT

      static final Property DOC_INFO_SUBJECT
    • DOC_INFO_TITLE

      static final Property DOC_INFO_TITLE
    • DOC_INFO_TRAPPED

      static final Property DOC_INFO_TRAPPED
    • PDF_VERSION

      static final Property PDF_VERSION
    • PDFA_VERSION

      static final Property PDFA_VERSION
    • PDF_EXTENSION_VERSION

      static final Property PDF_EXTENSION_VERSION
    • PDFAID_CONFORMANCE

      static final Property PDFAID_CONFORMANCE
    • PDFAID_PART

      static final Property PDFAID_PART
    • PDFUAID_PART

      static final Property PDFUAID_PART
    • PDFVT_VERSION

      static final Property PDFVT_VERSION
    • PDFVT_MODIFIED

      static final Property PDFVT_MODIFIED
    • PDFXID_VERSION

      static final Property PDFXID_VERSION
    • PDFX_VERSION

      static final Property PDFX_VERSION
    • PDFX_CONFORMANCE

      static final Property PDFX_CONFORMANCE
    • ILLUSTRATOR_TYPE

      static final Property ILLUSTRATOR_TYPE
    • IS_ENCRYPTED

      static final Property IS_ENCRYPTED
    • PRODUCER

      static final Property PRODUCER
    • ACTION_TRIGGER

      static final Property ACTION_TRIGGER
      This specifies where an action or destination would be found or triggered in the document: on document open, before close, etc. This is recorded on the embedded document (possibly only for JavaScript actions for now), not on the container PDF.
    • ACTION_TRIGGERS

      static final Property ACTION_TRIGGERS
      This is a list of all action or destination triggers contained within a given PDF.
    • ACTION_TYPES

      static final Property ACTION_TYPES
    • CHARACTERS_PER_PAGE

      static final Property CHARACTERS_PER_PAGE
    • UNMAPPED_UNICODE_CHARS_PER_PAGE

      static final Property UNMAPPED_UNICODE_CHARS_PER_PAGE
    • TOTAL_UNMAPPED_UNICODE_CHARS

      static final Property TOTAL_UNMAPPED_UNICODE_CHARS
    • OVERALL_PERCENTAGE_UNMAPPED_UNICODE_CHARS

      static final Property OVERALL_PERCENTAGE_UNMAPPED_UNICODE_CHARS
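The four unmapped-Unicode fields above are listed without descriptions. A plausible reading of their names is that the totals are sums of the per-page counts and that the overall value is total unmapped characters divided by total characters. The hypothetical helper below sketches that reading; the formula, the ratio scale (0 to 1 rather than 0 to 100), and the choice to return 0 for a document with no characters are all assumptions, not Tika's documented behavior.

```java
// Hypothetical helper; the aggregation formula is an assumption based on the field names.
final class UnmappedUnicodeStats {
    // Sums a per-page count (e.g. unmapped Unicode chars per page) into a document total.
    static long total(int[] perPage) {
        long sum = 0;
        for (int n : perPage) {
            sum += n;
        }
        return sum;
    }

    // Assumed overall value: total unmapped / total characters, as a ratio in [0, 1].
    // Returns 0.0 when the document has no characters at all.
    static double overallRatio(int[] charsPerPage, int[] unmappedPerPage) {
        if (charsPerPage.length != unmappedPerPage.length) {
            throw new IllegalArgumentException("per-page arrays must have the same length");
        }
        long chars = total(charsPerPage);
        return chars == 0 ? 0.0 : (double) total(unmappedPerPage) / chars;
    }
}
```

Weighting by total characters (rather than averaging per-page percentages) keeps a nearly empty page from dominating the document-level value.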
    • CONTAINS_DAMAGED_FONT

      static final Property CONTAINS_DAMAGED_FONT
      Contains at least one damaged font for at least one character.
    • CONTAINS_NON_EMBEDDED_FONT

      static final Property CONTAINS_NON_EMBEDDED_FONT
      Contains at least one font that is not embedded.
    • HAS_XFA

      static final Property HAS_XFA
      Has XFA (XML Forms Architecture) content.
    • HAS_XMP

      static final Property HAS_XMP
      Has XMP metadata, whether or not it is valid.
    • XMP_LOCATION

      static final Property XMP_LOCATION
      If XMP is extracted (e.g., by the XMLProfiler), this records where it came from: the document catalog, a specific page, or elsewhere.
    • HAS_ACROFORM_FIELDS

      static final Property HAS_ACROFORM_FIELDS
      Has at least one AcroForm field.
    • HAS_MARKED_CONTENT

      static final Property HAS_MARKED_CONTENT
    • HAS_COLLECTION

      static final Property HAS_COLLECTION
      Has a collection element in the root. If true, this is likely a PDF Portfolio.
    • EMBEDDED_FILE_DESCRIPTION

      static final Property EMBEDDED_FILE_DESCRIPTION
    • EMBEDDED_FILE_ANNOTATION_TYPE

      static final Property EMBEDDED_FILE_ANNOTATION_TYPE
      If the embedded file came from an annotation, the type of that annotation (when one was specified).
    • EMBEDDED_FILE_SUBTYPE

      static final Property EMBEDDED_FILE_SUBTYPE
      Literal string from PDEmbeddedFile#getSubtype(); this should be the MIME type that the PDF claims for the embedded file.
    • HAS_3D

      static final Property HAS_3D
      Whether the PDF has at least one annotation of type 3D.
    • ANNOTATION_TYPES

      static final Property ANNOTATION_TYPES
    • ANNOTATION_SUBTYPES

      static final Property ANNOTATION_SUBTYPES
    • NUM_3D_ANNOTATIONS

      static final Property NUM_3D_ANNOTATIONS
      Number of 3D annotations a PDF contains. This makes HAS_3D redundant.
    • ASSOCIATED_FILE_RELATIONSHIP

      static final Property ASSOCIATED_FILE_RELATIONSHIP
    • INCREMENTAL_UPDATE_NUMBER

      static final Property INCREMENTAL_UPDATE_NUMBER
      A zero-based number for incremental updates within a PDF: 0 is the first update, 1 is the second, etc. The final version of the PDF (i.e., the last update) does not have an incremental update number. This value is populated when the parse-incremental-updates feature is selected in the PDFParser.
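The numbering rule above can be sketched as a small function: given a revision's position in file order and the total number of revisions, it returns that revision's incremental update number, or nothing for the final version. The class and method names are hypothetical, and treating the oldest revision as update 0 is an interpretation of "0 is the first update", not a statement of how PDFParser assigns numbers internally.

```java
import java.util.OptionalInt;

// Hypothetical illustration of the numbering rule, not PDFParser code.
final class IncrementalUpdateNumbering {
    // revisionIndex: 0 for the oldest revision, revisionCount - 1 for the final version.
    // The final version is the PDF as a whole, so it carries no update number.
    static OptionalInt updateNumber(int revisionIndex, int revisionCount) {
        if (revisionIndex < 0 || revisionIndex >= revisionCount) {
            throw new IndexOutOfBoundsException("revision " + revisionIndex);
        }
        return revisionIndex == revisionCount - 1
                ? OptionalInt.empty()
                : OptionalInt.of(revisionIndex);
    }
}
```

For a PDF with three revisions, the first two are numbered 0 and 1 and the third (the final version) gets no number.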
    • PDF_INCREMENTAL_UPDATE_COUNT

      static final Property PDF_INCREMENTAL_UPDATE_COUNT
      Incremental updates as extracted by the StartXRefScanner. See that class for limitations.
    • OCR_PAGE_COUNT

      static final Property OCR_PAGE_COUNT
      This counts the number of pages that would have been OCR'd or were OCR'd depending on the OCR settings. If NO_OCR is selected, this will