Interface PDF


  • public interface PDF
    PDF properties collection.
    Since:
    Apache Tika 1.14
    • Field Detail

      • EOF_OFFSETS

        static final Property EOF_OFFSETS
        Number of %%EOF markers as extracted by the StartXRefScanner. See that class for limitations. This includes the final %%EOF, which may or may not be at the literal end of the file. This does not include an %%EOF whose startxref value is 0, as happens with the dummy %%EOF in a linearized PDF.
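
The exclusion rule described for EOF_OFFSETS can be illustrated with a minimal sketch. This is not the StartXRefScanner algorithm; the class name `EofMarkerSketch`, the method `eofOffsets`, and the choice to report the offset of the first `%` of each marker are all assumptions made for illustration. The sketch finds each `startxref <number> %%EOF` sequence in a byte buffer and skips those whose number is 0. The size of the returned list corresponds to the "number of %%EOF" the description mentions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Apache Tika.
final class EofMarkerSketch {

    // "startxref", whitespace, a decimal offset, whitespace, then the marker.
    private static final Pattern TRAILER =
            Pattern.compile("startxref\\s+(\\d+)\\s+%%EOF");

    // Returns the byte offset of each %%EOF whose startxref value is non-zero.
    static List<Integer> eofOffsets(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so string
        // indices equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        List<Integer> offsets = new ArrayList<>();
        Matcher m = TRAILER.matcher(text);
        while (m.find()) {
            // Skip the dummy trailer of a linearized PDF (startxref 0).
            if (m.group(1).matches("0+")) {
                continue;
            }
            // The marker is the last five characters of the match.
            offsets.add(m.end() - "%%EOF".length());
        }
        return offsets;
    }
}
```

A real scanner has to cope with malformed trailers and very large files, so treat this only as a reading aid for the rule above.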
      • PDF_DOC_INFO_PREFIX

        static final String PDF_DOC_INFO_PREFIX
        Prefix to be used for properties that record what was stored in the docinfo section (as opposed to XMP)
        See Also:
        Constant Field Values
      • DOC_INFO_CREATED

        static final Property DOC_INFO_CREATED
      • DOC_INFO_CREATOR

        static final Property DOC_INFO_CREATOR
      • DOC_INFO_CREATOR_TOOL

        static final Property DOC_INFO_CREATOR_TOOL
      • DOC_INFO_MODIFICATION_DATE

        static final Property DOC_INFO_MODIFICATION_DATE
      • DOC_INFO_KEY_WORDS

        static final Property DOC_INFO_KEY_WORDS
      • DOC_INFO_PRODUCER

        static final Property DOC_INFO_PRODUCER
      • DOC_INFO_SUBJECT

        static final Property DOC_INFO_SUBJECT
      • DOC_INFO_TITLE

        static final Property DOC_INFO_TITLE
      • DOC_INFO_TRAPPED

        static final Property DOC_INFO_TRAPPED
      • PDF_VERSION

        static final Property PDF_VERSION
      • PDFA_VERSION

        static final Property PDFA_VERSION
      • PDF_EXTENSION_VERSION

        static final Property PDF_EXTENSION_VERSION
      • PDFAID_CONFORMANCE

        static final Property PDFAID_CONFORMANCE
      • PDFAID_PART

        static final Property PDFAID_PART
      • PDFUAID_PART

        static final Property PDFUAID_PART
      • PDFVT_VERSION

        static final Property PDFVT_VERSION
      • PDFVT_MODIFIED

        static final Property PDFVT_MODIFIED
      • PDFXID_VERSION

        static final Property PDFXID_VERSION
      • PDFX_VERSION

        static final Property PDFX_VERSION
      • PDFX_CONFORMANCE

        static final Property PDFX_CONFORMANCE
      • ILLUSTRATOR_TYPE

        static final Property ILLUSTRATOR_TYPE
      • IS_ENCRYPTED

        static final Property IS_ENCRYPTED
      • PRODUCER

        static final Property PRODUCER
      • ACTION_TRIGGER

        static final Property ACTION_TRIGGER
        This specifies where an action or destination would be found or triggered in the document: on document open, before close, etc. This is recorded on the embedded document (JavaScript only for now?), not on the container PDF.
      • ACTION_TRIGGERS

        static final Property ACTION_TRIGGERS
        This is a list of all action or destination triggers contained within a given PDF.
      • ACTION_TYPES

        static final Property ACTION_TYPES
      • CHARACTERS_PER_PAGE

        static final Property CHARACTERS_PER_PAGE
      • UNMAPPED_UNICODE_CHARS_PER_PAGE

        static final Property UNMAPPED_UNICODE_CHARS_PER_PAGE
      • TOTAL_UNMAPPED_UNICODE_CHARS

        static final Property TOTAL_UNMAPPED_UNICODE_CHARS
      • OVERALL_PERCENTAGE_UNMAPPED_UNICODE_CHARS

        static final Property OVERALL_PERCENTAGE_UNMAPPED_UNICODE_CHARS
      • CONTAINS_DAMAGED_FONT

        static final Property CONTAINS_DAMAGED_FONT
        Contains at least one damaged font for at least one character
      • CONTAINS_NON_EMBEDDED_FONT

        static final Property CONTAINS_NON_EMBEDDED_FONT
        Contains at least one font that is not embedded
      • HAS_XFA

        static final Property HAS_XFA
        Has XFA
      • HAS_XMP

        static final Property HAS_XMP
        Has XMP, whether or not it is valid
      • XMP_LOCATION

        static final Property XMP_LOCATION
        If XMP is extracted by, e.g., the XMLProfiler, this records where it came from: the document catalog, a specific page, or elsewhere.
      • HAS_ACROFORM_FIELDS

        static final Property HAS_ACROFORM_FIELDS
        Has > 0 AcroForm fields
      • HAS_MARKED_CONTENT

        static final Property HAS_MARKED_CONTENT
      • HAS_COLLECTION

        static final Property HAS_COLLECTION
        Has a collection element in the root. If true, this is likely a PDF Portfolio.
      • EMBEDDED_FILE_DESCRIPTION

        static final Property EMBEDDED_FILE_DESCRIPTION
      • EMBEDDED_FILE_ANNOTATION_TYPE

        static final Property EMBEDDED_FILE_ANNOTATION_TYPE
        The annotation type, if the file came from an annotation and a type was present
      • EMBEDDED_FILE_SUBTYPE

        static final Property EMBEDDED_FILE_SUBTYPE
        Literal string from PDEmbeddedFile#getSubtype(). This should be what the PDF alleges is the embedded file's MIME type.
      • HAS_3D

        static final Property HAS_3D
        Whether the PDF has an annotation of type 3D
      • ANNOTATION_TYPES

        static final Property ANNOTATION_TYPES
      • ANNOTATION_SUBTYPES

        static final Property ANNOTATION_SUBTYPES
      • NUM_3D_ANNOTATIONS

        static final Property NUM_3D_ANNOTATIONS
        Number of 3D annotations a PDF contains. This makes HAS_3D redundant.
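
The redundancy noted above can be shown in a few lines: the boolean is recoverable from the count. This is a sketch under the assumption that the count reaches the caller as a metadata string value; the class `ThreeDFlagSketch` and the method `has3d` are hypothetical names, not Tika API.

```java
// Hypothetical helper, not part of Apache Tika.
final class ThreeDFlagSketch {

    // Derives the HAS_3D answer from the string value of NUM_3D_ANNOTATIONS.
    // A missing, blank, or non-numeric value is treated as "no 3D annotations".
    static boolean has3d(String num3dAnnotations) {
        if (num3dAnnotations == null || num3dAnnotations.trim().isEmpty()) {
            return false;
        }
        try {
            return Integer.parseInt(num3dAnnotations.trim()) > 0;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```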
      • ASSOCIATED_FILE_RELATIONSHIP

        static final Property ASSOCIATED_FILE_RELATIONSHIP
      • INCREMENTAL_UPDATE_NUMBER

        static final Property INCREMENTAL_UPDATE_NUMBER
        This is a zero-based number for incremental updates within a PDF: 0 is the first update, 1 is the second, etc. The final version of the PDF (e.g. the last update) does not have an incremental update number. This value is populated when the parse incremental updates feature is selected in the PDFParser.
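
The numbering scheme can be made concrete with a small sketch. It reflects one reading of the description above: every version of the file except the final one receives a zero-based number, and the final version receives none. The class `IncrementalUpdateNumbering` and the method `numbersFor` are hypothetical names for illustration and do not appear in Tika.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical helper, not part of Apache Tika.
final class IncrementalUpdateNumbering {

    // versionCount is the number of versions found in the file,
    // the final version included.
    static List<OptionalInt> numbersFor(int versionCount) {
        if (versionCount < 0) {
            throw new IllegalArgumentException("versionCount must be >= 0");
        }
        List<OptionalInt> numbers = new ArrayList<>();
        for (int i = 0; i < versionCount; i++) {
            boolean isFinalVersion = (i == versionCount - 1);
            // Earlier versions get 0, 1, 2, ...; the final version gets no number.
            numbers.add(isFinalVersion ? OptionalInt.empty() : OptionalInt.of(i));
        }
        return numbers;
    }
}
```

For a file with three versions, the first two would carry the numbers 0 and 1 and the last would carry no number.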
      • PDF_INCREMENTAL_UPDATE_COUNT

        static final Property PDF_INCREMENTAL_UPDATE_COUNT
        Incremental updates as extracted by the StartXRefScanner. See that class for limitations.
      • OCR_PAGE_COUNT

        static final Property OCR_PAGE_COUNT
        This counts the number of pages that would have been OCR'd or were OCR'd depending on the OCR settings. If NO_OCR is selected, this will