Package org.apache.tika.metadata
Interface PDF
public interface PDF
PDF properties collection.
- Since:
- Apache Tika 1.14
-
Field Summary
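Modifier and Type | Field | Description
static final Property | ACTION_TRIGGER | This specifies where an action or destination would be found/triggered in the document: on document open, before close, etc.
static final Property | ACTION_TRIGGERS | This is a list of all action or destination triggers contained within a given PDF.
static final Property | ACTION_TYPES |
static final Property | ANNOTATION_SUBTYPES |
static final Property | ANNOTATION_TYPES |
static final Property | ASSOCIATED_FILE_RELATIONSHIP |
static final Property | CHARACTERS_PER_PAGE |
static final Property | CONTAINS_DAMAGED_FONT | Contains at least one damaged font for at least one character
static final Property | CONTAINS_NON_EMBEDDED_FONT | Contains at least one font that is not embedded
static final Property | DOC_INFO_CREATED |
static final Property | DOC_INFO_CREATOR |
static final Property | DOC_INFO_CREATOR_TOOL |
static final Property | DOC_INFO_KEY_WORDS |
static final Property | DOC_INFO_MODIFICATION_DATE |
static final Property | DOC_INFO_PRODUCER |
static final Property | DOC_INFO_SUBJECT |
static final Property | DOC_INFO_TITLE |
static final Property | DOC_INFO_TRAPPED |
static final Property | EMBEDDED_FILE_ANNOTATION_TYPE | If the file came from an annotation and there was a type
static final Property | EMBEDDED_FILE_DESCRIPTION |
static final Property | EMBEDDED_FILE_SUBTYPE | literal string from the PDEmbeddedFile#getSubtype(), should be what the PDF alleges is the embedded file's mime type
static final Property | EOF_OFFSETS | Number of %%EOF as extracted by the StartXRefScanner.
static final Property | HAS_3D | If the PDF has an annotation of type 3D
static final Property | HAS_ACROFORM_FIELDS | Has > 0 AcroForm fields
static final Property | HAS_COLLECTION | Has a collection element in the root.
static final Property | HAS_MARKED_CONTENT |
static final Property | HAS_XFA | Has XFA
static final Property | HAS_XMP | Has XMP, whether or not it is valid
static final Property | ILLUSTRATOR_TYPE |
static final Property | INCREMENTAL_UPDATE_NUMBER | This is a zero-based number for incremental updates within a PDF -- 0 is the first update, 1 is the second, etc.
static final Property | IS_ENCRYPTED |
static final Property | NUM_3D_ANNOTATIONS | Number of 3D annotations a PDF contains.
static final Property | OCR_PAGE_COUNT | This counts the number of pages that would have been OCR'd or were OCR'd depending on the OCR settings.
static final Property | OVERALL_PERCENTAGE_UNMAPPED_UNICODE_CHARS |
static final String | PDF_DOC_INFO_CUSTOM_PREFIX |
static final String | PDF_DOC_INFO_PREFIX | Prefix to be used for properties that record what was stored in the docinfo section (as opposed to XMP)
static final Property | PDF_EXTENSION_VERSION |
static final Property | PDF_INCREMENTAL_UPDATE_COUNT | Incremental updates as extracted by the StartXRefScanner.
static final String | PDF_PREFIX |
static final Property | PDF_VERSION |
static final String | PDFA_PREFIX |
static final Property | PDFA_VERSION |
static final Property | PDFAID_CONFORMANCE |
static final Property | PDFAID_PART |
static final String | PDFAID_PREFIX |
static final Property | PDFUAID_PART |
static final Property | PDFVT_MODIFIED |
static final Property | PDFVT_VERSION |
static final Property | PDFX_CONFORMANCE |
static final Property | PDFX_VERSION |
static final Property | PDFXID_VERSION |
static final Property | PRODUCER |
static final Property | TOTAL_UNMAPPED_UNICODE_CHARS |
static final Property | UNMAPPED_UNICODE_CHARS_PER_PAGE |
static final Property | XMP_LOCATION | If xmp is extracted by, e.g. the XMLProfiler, where did it come from?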
-
Field Details
-
PDF_PREFIX
-
PDFA_PREFIX
-
PDFAID_PREFIX
-
EOF_OFFSETS
Number of %%EOF markers as extracted by the StartXRefScanner. See that class for limitations. This count includes the final %%EOF, which may or may not be at the literal end of the file. It does not include a %%EOF whose startxref is 0, as happens with a dummy %%EOF in a linearized PDF.
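The counting rule above can be illustrated with a deliberately naive scan. This is a sketch, not Tika's StartXRefScanner. The class name NaiveEofScanner is hypothetical, and the sketch assumes that every %%EOF is immediately preceded by `startxref`, whitespace and a decimal offset. Real and damaged files do not always follow that layout.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified scanner; not Tika's StartXRefScanner. */
final class NaiveEofScanner {
    // "startxref", whitespace, a decimal offset (bounded so it fits in a long),
    // whitespace, then the "%%EOF" marker.
    private static final Pattern STARTXREF_THEN_EOF =
            Pattern.compile("startxref\\s+(\\d{1,18})\\s+%%EOF");

    private NaiveEofScanner() {}

    /**
     * Returns the byte offset of each %%EOF whose startxref value is non-zero.
     * The size of the returned list corresponds to the %%EOF count described above.
     */
    static List<Long> eofOffsets(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so string indices equal byte offsets.
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Matcher m = STARTXREF_THEN_EOF.matcher(text);
        List<Long> offsets = new ArrayList<>();
        while (m.find()) {
            long startxref = Long.parseLong(m.group(1));
            if (startxref == 0L) {
                continue; // dummy %%EOF, e.g. in a linearized PDF: not counted
            }
            offsets.add((long) (m.end() - "%%EOF".length()));
        }
        return offsets;
    }
}
```

In a file whose first section ends with `startxref 0 %%EOF` and whose last section points at a real cross-reference table, this sketch counts one marker. A production scanner also has to cope with comments, binary streams that happen to contain these bytes, and truncated files. The "limitations" mentioned above likely refer to cases like these.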
PDF_DOC_INFO_PREFIX
Prefix used for properties that record what was stored in the docinfo section (as opposed to XMP).
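One practical use of this prefix is to separate docinfo-derived values from other values in a flattened key/value view of the metadata. The sketch below uses only the standard library and is hypothetical: DocInfoFilter is not a Tika class, and the key strings in the usage note are assumed examples. With Tika on the classpath, pass the real constant PDF.PDF_DOC_INFO_PREFIX instead of a hard-coded string.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical helper; not part of Tika. */
final class DocInfoFilter {
    private DocInfoFilter() {}

    /** Returns the entries whose key starts with the given prefix, sorted by key. */
    static SortedMap<String, String> withPrefix(Map<String, String> flat, String prefix) {
        SortedMap<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> e : flat.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```

Consider a map that holds an assumed key "pdf:docinfo:title" and another key such as "dc:title". Calling withPrefix(map, "pdf:docinfo:") keeps only the first entry, which makes it easy to compare docinfo values against values reported elsewhere.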
-
PDF_DOC_INFO_CUSTOM_PREFIX
-
DOC_INFO_CREATED
-
DOC_INFO_CREATOR
-
DOC_INFO_CREATOR_TOOL
-
DOC_INFO_MODIFICATION_DATE
-
DOC_INFO_KEY_WORDS
-
DOC_INFO_PRODUCER
-
DOC_INFO_SUBJECT
-
DOC_INFO_TITLE
-
DOC_INFO_TRAPPED
-
PDF_VERSION
-
PDFA_VERSION
-
PDF_EXTENSION_VERSION
-
PDFAID_CONFORMANCE
-
PDFAID_PART
-
PDFUAID_PART
-
PDFVT_VERSION
-
PDFVT_MODIFIED
-
PDFXID_VERSION
-
PDFX_VERSION
-
PDFX_CONFORMANCE
-
ILLUSTRATOR_TYPE
-
IS_ENCRYPTED
-
PRODUCER
-
ACTION_TRIGGER
This specifies where an action or destination would be found or triggered in the document: on document open, before close, etc. This property is set on the embedded document (JavaScript only, for now?), not on the container PDF.
-
ACTION_TRIGGERS
This is a list of all action or destination triggers contained within a given PDF.
-
ACTION_TYPES
-
CHARACTERS_PER_PAGE
-
UNMAPPED_UNICODE_CHARS_PER_PAGE
-
TOTAL_UNMAPPED_UNICODE_CHARS
-
OVERALL_PERCENTAGE_UNMAPPED_UNICODE_CHARS
-
CONTAINS_DAMAGED_FONT
Contains at least one damaged font for at least one character.
-
CONTAINS_NON_EMBEDDED_FONT
Contains at least one font that is not embedded.
-
HAS_XFA
Has XFA.
-
HAS_XMP
Has XMP, whether or not it is valid.
-
XMP_LOCATION
Where the XMP came from, if it was extracted by, e.g., the XMLProfiler: the document catalog, a specific page, or elsewhere.
-
HAS_ACROFORM_FIELDS
Has one or more AcroForm fields.
-
HAS_MARKED_CONTENT
-
HAS_COLLECTION
Has a collection element in the root. If true, this is likely a PDF Portfolio.
-
EMBEDDED_FILE_DESCRIPTION
-
EMBEDDED_FILE_ANNOTATION_TYPE
The annotation type, if the file came from an annotation that had a type.
-
EMBEDDED_FILE_SUBTYPE
Literal string from PDEmbeddedFile#getSubtype(). This should be what the PDF alleges is the embedded file's MIME type.
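This value is only what the PDF alleges, so a consumer may want to compare it with an independently detected type before trusting it. The helper below is a hypothetical sketch and is not part of Tika. It normalizes both strings by dropping any parameters after ';', trimming whitespace and lower-casing, and then compares them. Obtaining the detected type (for example, by running a detector on the embedded bytes) is outside the sketch.

```java
import java.util.Locale;

/** Hypothetical helper; not part of Tika. */
final class AllegedTypeCheck {
    private AllegedTypeCheck() {}

    /** Lower-cases a media type and strips any parameters such as "; charset=...". */
    static String normalize(String mediaType) {
        int semicolon = mediaType.indexOf(';');
        String base = semicolon >= 0 ? mediaType.substring(0, semicolon) : mediaType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    /**
     * True only when an alleged subtype is present and names the same base type as the
     * detected one. A missing value is treated as "no agreement", not as a match.
     */
    static boolean agrees(String alleged, String detected) {
        if (alleged == null || detected == null) {
            return false;
        }
        return normalize(alleged).equals(normalize(detected));
    }
}
```

Treating a missing subtype as a disagreement is a conservative choice. A caller that only wants to flag explicit contradictions could instead return true when the alleged value is absent.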
HAS_3D
Whether the PDF has an annotation of type 3D.
-
ANNOTATION_TYPES
-
ANNOTATION_SUBTYPES
-
NUM_3D_ANNOTATIONS
Number of 3D annotations a PDF contains. This makes HAS_3D redundant.
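HAS_3D is redundant because it carries no information beyond whether this count is positive. The sketch below shows that relationship over a list of annotation subtype names. The class and method names are hypothetical, and the sketch assumes 3D annotations are identified by the subtype name "3D" (the PDF /3D annotation subtype).

```java
import java.util.List;

/** Hypothetical helper; not part of Tika. */
final class ThreeDAnnotations {
    private ThreeDAnnotations() {}

    /** Counts annotations whose subtype name is exactly "3D". */
    static int count(List<String> annotationSubtypes) {
        return (int) annotationSubtypes.stream().filter("3D"::equals).count();
    }

    /** A HAS_3D-style flag derived purely from the count. */
    static boolean has3D(List<String> annotationSubtypes) {
        return count(annotationSubtypes) > 0;
    }
}
```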
ASSOCIATED_FILE_RELATIONSHIP
-
INCREMENTAL_UPDATE_NUMBER
This is a zero-based number for incremental updates within a PDF: 0 is the first update, 1 is the second, etc. The final version of the PDF (i.e., the last update) does not have an incremental update number. This value is populated when the parse-incremental-updates feature is selected in the PDFParser.
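Read literally, the numbering rule gives every version except the final one a zero-based number. The sketch below encodes that reading for a file with a given number of versions. The class name is hypothetical, and the sketch assumes the versions are ordered oldest first, which the description implies but does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

/** Hypothetical illustration of the numbering rule; not Tika code. */
final class IncrementalUpdateNumbering {
    private IncrementalUpdateNumbering() {}

    /**
     * For versionCount versions ordered oldest first, returns 0, 1, ... for every version
     * except the last, and an empty value for the final version, which has no number.
     */
    static List<OptionalInt> numbers(int versionCount) {
        if (versionCount < 0) {
            throw new IllegalArgumentException("versionCount must be >= 0");
        }
        List<OptionalInt> result = new ArrayList<>(versionCount);
        for (int i = 0; i < versionCount; i++) {
            result.add(i < versionCount - 1 ? OptionalInt.of(i) : OptionalInt.empty());
        }
        return result;
    }
}
```

An OptionalInt rather than a sentinel such as -1 mirrors the description: the final version simply has no incremental update number.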
PDF_INCREMENTAL_UPDATE_COUNT
Number of incremental updates as extracted by the StartXRefScanner. See that class for limitations.
-
OCR_PAGE_COUNT
This counts the number of pages that would have been OCR'd or were OCR'd depending on the OCR settings. If NO_OCR is selected, this will
-