Package org.apache.tika.metadata
Interface PDF
public interface PDF
PDF properties collection.
- Since:
- Apache Tika 1.14
-
Field Summary
All fields are static final. The five prefix constants (PDF_PREFIX, PDFA_PREFIX, PDFAID_PREFIX, PDF_DOC_INFO_PREFIX, PDF_DOC_INFO_CUSTOM_PREFIX) are of type String; every other field is of type Property. Descriptions are given under Field Details.
-
Field Details
-
PDF_PREFIX
- See Also:
-
PDFA_PREFIX
- See Also:
-
PDFAID_PREFIX
- See Also:
-
EOF_OFFSETS
Number of %%EOF markers, as extracted by the StartXRefScanner. See that class for limitations. The count includes the final %%EOF, which may or may not be at the literal end of the file. It does not include an %%EOF whose startxref is 0, as happens with the dummy %%EOF in a linearized PDF.
-
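The counting rule described for EOF_OFFSETS can be illustrated with a small, self-contained sketch. This is not the StartXRefScanner and does not reproduce its behavior or limitations; the class name EofMarkerCounter, the method countEofMarkers, and the regular expression are all assumptions made for illustration. It counts each %%EOF that follows a startxref offset and skips those whose offset is 0.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Apache Tika.
final class EofMarkerCounter {

    // "startxref", then the byte offset of the cross-reference section, then %%EOF.
    private static final Pattern TRAILER =
            Pattern.compile("startxref\\s+(\\d+)\\s+%%EOF");

    /** Counts %%EOF markers whose preceding startxref offset is not zero. */
    static int countEofMarkers(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data decodes safely.
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Matcher matcher = TRAILER.matcher(text);
        int count = 0;
        while (matcher.find()) {
            // An offset made only of zeros marks the dummy %%EOF of a linearized PDF.
            boolean isZeroOffset = matcher.group(1).chars().allMatch(c -> c == '0');
            if (!isZeroOffset) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up trailer fragments: one dummy marker and two real ones.
        String sample = "startxref\n0\n%%EOF\n...\nstartxref\n116\n%%EOF\n...\nstartxref\n900\n%%EOF\n";
        System.out.println(countEofMarkers(sample.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

A real scanner has to cope with malformed trailers and very large files, which this sketch ignores.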
PDF_DOC_INFO_PREFIX
Prefix to be used for properties that record what was stored in the docinfo section (as opposed to XMP).
- See Also:
-
PDF_DOC_INFO_CUSTOM_PREFIX
- See Also:
-
DOC_INFO_CREATED
-
DOC_INFO_CREATOR
-
DOC_INFO_CREATOR_TOOL
-
DOC_INFO_MODIFICATION_DATE
-
DOC_INFO_KEY_WORDS
-
DOC_INFO_PRODUCER
-
DOC_INFO_SUBJECT
-
DOC_INFO_TITLE
-
DOC_INFO_TRAPPED
-
PDF_VERSION
-
PDFA_VERSION
-
PDF_EXTENSION_VERSION
-
PDFAID_CONFORMANCE
-
PDFAID_PART
-
PDFUAID_PART
-
PDFVT_VERSION
-
PDFVT_MODIFIED
-
PDFXID_VERSION
-
PDFX_VERSION
-
PDFX_CONFORMANCE
-
ILLUSTRATOR_TYPE
-
IS_ENCRYPTED
-
PRODUCER
-
ACTION_TRIGGER
This specifies where an action or destination would be found or triggered in the document: on document open, before close, etc. It is recorded on the embedded document (possibly JavaScript only for now), not on the container PDF.
-
ACTION_TRIGGERS
This is a list of all action or destination triggers contained within a given PDF.
-
ACTION_TYPES
-
CHARACTERS_PER_PAGE
-
UNMAPPED_UNICODE_CHARS_PER_PAGE
-
TOTAL_UNMAPPED_UNICODE_CHARS
-
OVERALL_PERCENTAGE_UNMAPPED_UNICODE_CHARS
-
CONTAINS_DAMAGED_FONT
Contains at least one damaged font for at least one character.
-
CONTAINS_NON_EMBEDDED_FONT
Contains at least one font that is not embedded.
-
HAS_XFA
Has XFA.
-
HAS_XMP
Has XMP, whether or not it is valid.
-
XMP_LOCATION
If XMP is extracted by, e.g., the XMLProfiler, this records where it came from: the document catalog, a specific page, or elsewhere.
-
HAS_ACROFORM_FIELDS
Has at least one AcroForm field.
-
HAS_MARKED_CONTENT
-
HAS_COLLECTION
Has a collection element in the root. If true, this is likely a PDF Portfolio.
-
EMBEDDED_FILE_DESCRIPTION
-
EMBEDDED_FILE_ANNOTATION_TYPE
The annotation type, if the file came from an annotation and a type was present.
-
EMBEDDED_FILE_SUBTYPE
Literal string from PDEmbeddedFile#getSubtype(); this should be what the PDF alleges is the embedded file's MIME type.
-
HAS_3D
Whether the PDF has an annotation of type 3D.
-
ANNOTATION_TYPES
-
ANNOTATION_SUBTYPES
-
NUM_3D_ANNOTATIONS
Number of 3D annotations a PDF contains. This makes HAS_3D redundant.
-
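The redundancy noted for NUM_3D_ANNOTATIONS means a caller can derive the boolean from the count. The sketch below shows this over a plain string map rather than Tika's Metadata class; the class ThreeDCheck, the method has3D, and the key name "num3DAnnotations" are hypothetical and chosen only for illustration.

```java
import java.util.Map;

// Hypothetical helper, not part of Apache Tika.
final class ThreeDCheck {

    // Assumed key name for illustration; the real property key may differ.
    static final String NUM_3D_KEY = "num3DAnnotations";

    /** Returns true when the recorded 3D annotation count is greater than zero. */
    static boolean has3D(Map<String, String> metadata) {
        String raw = metadata.get(NUM_3D_KEY);
        if (raw == null) {
            return false; // no count recorded
        }
        try {
            return Integer.parseInt(raw.trim()) > 0;
        } catch (NumberFormatException e) {
            return false; // treat an unparseable count as "no 3D annotations"
        }
    }
}
```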
ASSOCIATED_FILE_RELATIONSHIP
-
INCREMENTAL_UPDATE_NUMBER
This is a zero-based number for incremental updates within a PDF: 0 is the first update, 1 is the second, etc. The final version of the PDF (i.e. the last update) does not have an incremental update number. This value is populated when the parse incremental updates feature is selected in the PDFParser.
-
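The numbering scheme for INCREMENTAL_UPDATE_NUMBER can be sketched as follows. The sketch assumes that the versions recovered from a file are numbered in order, oldest first, and that only the final version is left unnumbered; the class IncrementalUpdateNumbering and the method numberVersions are hypothetical and do not reflect how the PDFParser is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical helper, not part of Apache Tika.
final class IncrementalUpdateNumbering {

    /**
     * Returns one entry per version, oldest first. Every version except the
     * last gets a zero-based update number; the final version gets none.
     */
    static List<OptionalInt> numberVersions(int versionCount) {
        if (versionCount < 0) {
            throw new IllegalArgumentException("versionCount must not be negative");
        }
        List<OptionalInt> numbers = new ArrayList<>();
        for (int i = 0; i < versionCount; i++) {
            boolean isFinalVersion = (i == versionCount - 1);
            numbers.add(isFinalVersion ? OptionalInt.empty() : OptionalInt.of(i));
        }
        return numbers;
    }
}
```

Under these assumptions, a file with three versions yields the numbers 0 and 1 for the two earlier versions and no number for the final one.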
PDF_INCREMENTAL_UPDATE_COUNT
Incremental updates as extracted by the StartXRefScanner. See that class for limitations.
-
OCR_PAGE_COUNT
This counts the number of pages that would have been OCR'd or were OCR'd depending on the OCR settings. If NO_OCR is selected, this will
-