Package org.apache.tika.metadata
Interface PDF
-
public interface PDF
PDF properties collection.
- Since:
- Apache Tika 1.14
Field Summary
Fields
Modifier and Type | Field | Description
static Property | ACTION_TRIGGER | This specifies where an action or destination would be found/triggered in the document: on document open, before close, etc.
static Property | ACTION_TRIGGERS | This is a list of all action or destination triggers contained within a given PDF.
static Property | ACTION_TYPES |
static Property | ANNOTATION_SUBTYPES |
static Property | ANNOTATION_TYPES |
static Property | ASSOCIATED_FILE_RELATIONSHIP |
static Property | CHARACTERS_PER_PAGE |
static Property | CONTAINS_DAMAGED_FONT | Contains at least one damaged font for at least one character.
static Property | CONTAINS_NON_EMBEDDED_FONT | Contains at least one font that is not embedded.
static Property | DOC_INFO_CREATED |
static Property | DOC_INFO_CREATOR |
static Property | DOC_INFO_CREATOR_TOOL |
static Property | DOC_INFO_KEY_WORDS |
static Property | DOC_INFO_MODIFICATION_DATE |
static Property | DOC_INFO_PRODUCER |
static Property | DOC_INFO_SUBJECT |
static Property | DOC_INFO_TITLE |
static Property | DOC_INFO_TRAPPED |
static Property | EMBEDDED_FILE_ANNOTATION_TYPE | If the file came from an annotation that had a type, this records that annotation type.
static Property | EMBEDDED_FILE_DESCRIPTION |
static Property | EMBEDDED_FILE_SUBTYPE | Literal string from PDEmbeddedFile#getSubtype(); this should be what the PDF alleges is the embedded file's MIME type.
static Property | EOF_OFFSETS | Number of %%EOF markers as extracted by the StartXRefScanner.
static Property | HAS_3D | Whether the PDF has an annotation of type 3D.
static Property | HAS_ACROFORM_FIELDS | Has at least one AcroForm field.
static Property | HAS_COLLECTION | Has a collection element in the root.
static Property | HAS_MARKED_CONTENT |
static Property | HAS_XFA | Has XFA.
static Property | HAS_XMP | Has XMP, whether or not it is valid.
static Property | ILLUSTRATOR_TYPE |
static Property | INCREMENTAL_UPDATE_NUMBER | This is a zero-based number for incremental updates within a PDF: 0 is the first update, 1 is the second, etc.
static Property | IS_ENCRYPTED |
static Property | NUM_3D_ANNOTATIONS | Number of 3D annotations a PDF contains.
static Property | OCR_PAGE_COUNT | This counts the number of pages that would have been OCR'd or were OCR'd, depending on the OCR settings.
static Property | OVERALL_PERCENTAGE_UNMAPPED_UNICODE_CHARS |
static String | PDF_DOC_INFO_CUSTOM_PREFIX |
static String | PDF_DOC_INFO_PREFIX | Prefix to be used for properties that record what was stored in the docinfo section (as opposed to XMP).
static Property | PDF_EXTENSION_VERSION |
static Property | PDF_INCREMENTAL_UPDATE_COUNT | Incremental updates as extracted by the StartXRefScanner.
static String | PDF_PREFIX |
static Property | PDF_VERSION |
static String | PDFA_PREFIX |
static Property | PDFA_VERSION |
static Property | PDFAID_CONFORMANCE |
static Property | PDFAID_PART |
static String | PDFAID_PREFIX |
static Property | PDFUAID_PART |
static Property | PDFVT_MODIFIED |
static Property | PDFVT_VERSION |
static Property | PDFX_CONFORMANCE |
static Property | PDFX_VERSION |
static Property | PDFXID_VERSION |
static Property | PRODUCER |
static Property | TOTAL_UNMAPPED_UNICODE_CHARS |
static Property | UNMAPPED_UNICODE_CHARS_PER_PAGE |
static Property | XMP_LOCATION | If XMP is extracted by, e.g., the XMLProfiler, where did it come from?
Field Detail
-
PDF_PREFIX
static final String PDF_PREFIX
- See Also:
- Constant Field Values
-
PDFA_PREFIX
static final String PDFA_PREFIX
- See Also:
- Constant Field Values
-
PDFAID_PREFIX
static final String PDFAID_PREFIX
- See Also:
- Constant Field Values
-
EOF_OFFSETS
static final Property EOF_OFFSETS
Number of %%EOF markers as extracted by the StartXRefScanner. See that class for limitations. This includes the final %%EOF, which may or may not be at the literal end of the file. It does not include an %%EOF whose startxref is 0, as happens with a dummy %%EOF in a linearized PDF.
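To make the counting rule concrete, here is a minimal, hypothetical sketch in plain Java with no Tika dependency. It is not the StartXRefScanner and ignores the limitations that class documents; the class name EofOffsetSketch and the regex-based matching are assumptions for illustration. It only counts %%EOF markers that directly follow a startxref value, counts the final marker, and skips a marker whose startxref value is 0.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT Tika's StartXRefScanner.
final class EofOffsetSketch {
    private static final String EOF_MARKER = "%%EOF";
    // "startxref", whitespace, a decimal byte offset, optional whitespace, then "%%EOF".
    private static final Pattern TRAILER = Pattern.compile("startxref\\s+(\\d+)\\s*%%EOF");

    private EofOffsetSketch() {}

    /** Returns the byte offset at which each counted %%EOF marker starts. */
    static List<Long> findEofOffsets(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so string indices equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Matcher m = TRAILER.matcher(text);
        List<Long> offsets = new ArrayList<>();
        while (m.find()) {
            // startxref == 0 marks a dummy %%EOF (e.g. in a linearized PDF); skip it.
            // Checking for all zeros avoids overflow on absurdly long digit runs.
            if (m.group(1).chars().allMatch(c -> c == '0')) {
                continue;
            }
            offsets.add((long) (m.end() - EOF_MARKER.length()));
        }
        return offsets;
    }
}
```

Whether Tika records the start or the end of each marker is not stated on this page; the sketch records the start.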
-
PDF_DOC_INFO_PREFIX
static final String PDF_DOC_INFO_PREFIX
Prefix to be used for properties that record what was stored in the docinfo section (as opposed to XMP).
- See Also:
- Constant Field Values
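A minimal sketch of how the docinfo prefix can separate docinfo-derived values from XMP-derived ones in a flat key/value map. The literal prefix strings below ("pdf:docinfo:" and "pdf:docinfo:custom:") are assumptions, because this page defers the actual values to Constant Field Values; real code should reference PDF.PDF_DOC_INFO_PREFIX and PDF.PDF_DOC_INFO_CUSTOM_PREFIX instead. DocInfoPartition is a hypothetical name.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: filter a flat metadata map down to docinfo-derived keys.
final class DocInfoPartition {
    // ASSUMED values for illustration only; use the PDF interface constants in real code.
    static final String DOC_INFO_PREFIX = "pdf:docinfo:";
    static final String DOC_INFO_CUSTOM_PREFIX = "pdf:docinfo:custom:";

    private DocInfoPartition() {}

    /** Keeps only docinfo keys; custom docinfo keys are kept only if includeCustom is true. */
    static Map<String, String> docInfoOnly(Map<String, String> metadata, boolean includeCustom) {
        Map<String, String> out = new TreeMap<>(); // sorted for stable output
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            String key = e.getKey();
            if (!key.startsWith(DOC_INFO_PREFIX)) {
                continue; // not from docinfo (e.g. XMP-derived)
            }
            if (!includeCustom && key.startsWith(DOC_INFO_CUSTOM_PREFIX)) {
                continue;
            }
            out.put(key, e.getValue());
        }
        return out;
    }
}
```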
-
PDF_DOC_INFO_CUSTOM_PREFIX
static final String PDF_DOC_INFO_CUSTOM_PREFIX
- See Also:
- Constant Field Values
-
DOC_INFO_CREATED
static final Property DOC_INFO_CREATED
-
DOC_INFO_CREATOR
static final Property DOC_INFO_CREATOR
-
DOC_INFO_CREATOR_TOOL
static final Property DOC_INFO_CREATOR_TOOL
-
DOC_INFO_MODIFICATION_DATE
static final Property DOC_INFO_MODIFICATION_DATE
-
DOC_INFO_KEY_WORDS
static final Property DOC_INFO_KEY_WORDS
-
DOC_INFO_PRODUCER
static final Property DOC_INFO_PRODUCER
-
DOC_INFO_SUBJECT
static final Property DOC_INFO_SUBJECT
-
DOC_INFO_TITLE
static final Property DOC_INFO_TITLE
-
DOC_INFO_TRAPPED
static final Property DOC_INFO_TRAPPED
-
PDF_VERSION
static final Property PDF_VERSION
-
PDFA_VERSION
static final Property PDFA_VERSION
-
PDF_EXTENSION_VERSION
static final Property PDF_EXTENSION_VERSION
-
PDFAID_CONFORMANCE
static final Property PDFAID_CONFORMANCE
-
PDFAID_PART
static final Property PDFAID_PART
-
PDFUAID_PART
static final Property PDFUAID_PART
-
PDFVT_VERSION
static final Property PDFVT_VERSION
-
PDFVT_MODIFIED
static final Property PDFVT_MODIFIED
-
PDFXID_VERSION
static final Property PDFXID_VERSION
-
PDFX_VERSION
static final Property PDFX_VERSION
-
PDFX_CONFORMANCE
static final Property PDFX_CONFORMANCE
-
ILLUSTRATOR_TYPE
static final Property ILLUSTRATOR_TYPE
-
IS_ENCRYPTED
static final Property IS_ENCRYPTED
-
PRODUCER
static final Property PRODUCER
-
ACTION_TRIGGER
static final Property ACTION_TRIGGER
This specifies where an action or destination would be found/triggered in the document: on document open, before close, etc. This is recorded on the embedded document (possibly only for JavaScript, for now), not on the container PDF.
-
ACTION_TRIGGERS
static final Property ACTION_TRIGGERS
This is a list of all action or destination triggers contained within a given PDF.
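The relationship between ACTION_TRIGGER (one value on each embedded document) and ACTION_TRIGGERS (all triggers in the container PDF) can be sketched as a roll-up. This is a hypothetical illustration, not Tika's code: the class name, the trigger strings used in testing, and the choice to drop duplicates while keeping first-seen order are all assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: roll per-embedded-document triggers up to the container.
final class TriggerRollup {
    private TriggerRollup() {}

    /** Collects non-empty trigger values; duplicates are dropped (an assumption). */
    static List<String> collectTriggers(List<String> perEmbeddedTriggers) {
        Set<String> seen = new LinkedHashSet<>(); // keeps first-seen order
        for (String trigger : perEmbeddedTriggers) {
            if (trigger != null && !trigger.isEmpty()) {
                seen.add(trigger);
            }
        }
        return new ArrayList<>(seen);
    }
}
```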
-
ACTION_TYPES
static final Property ACTION_TYPES
-
CHARACTERS_PER_PAGE
static final Property CHARACTERS_PER_PAGE
-
UNMAPPED_UNICODE_CHARS_PER_PAGE
static final Property UNMAPPED_UNICODE_CHARS_PER_PAGE
-
TOTAL_UNMAPPED_UNICODE_CHARS
static final Property TOTAL_UNMAPPED_UNICODE_CHARS
-
OVERALL_PERCENTAGE_UNMAPPED_UNICODE_CHARS
static final Property OVERALL_PERCENTAGE_UNMAPPED_UNICODE_CHARS
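The four character-count properties above have no descriptions on this page. Judging only from their names, they appear to relate as per-page counts, a document total, and an overall ratio; the sketch below assumes exactly that, computing the overall value as total unmapped characters divided by total characters and returning it as a fraction in [0, 1]. Tika's actual formula and scale (fraction versus 0 to 100) are not stated here, and UnmappedStats is a hypothetical name.

```java
// Hypothetical sketch: derive totals and an overall ratio from per-page counts.
final class UnmappedStats {
    private UnmappedStats() {}

    /** Sums a per-page count array. */
    static int total(int[] perPage) {
        int sum = 0;
        for (int n : perPage) {
            sum += n;
        }
        return sum;
    }

    /** ASSUMED definition: total unmapped / total characters, as a fraction in [0, 1]. */
    static double overallFraction(int[] charsPerPage, int[] unmappedPerPage) {
        if (charsPerPage.length != unmappedPerPage.length) {
            throw new IllegalArgumentException("per-page arrays must have equal length");
        }
        int chars = total(charsPerPage);
        // Avoid dividing by zero for documents with no extracted characters.
        return chars == 0 ? 0.0 : (double) total(unmappedPerPage) / chars;
    }
}
```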
-
CONTAINS_DAMAGED_FONT
static final Property CONTAINS_DAMAGED_FONT
Contains at least one damaged font for at least one character.
-
CONTAINS_NON_EMBEDDED_FONT
static final Property CONTAINS_NON_EMBEDDED_FONT
Contains at least one font that is not embedded.
-
HAS_XFA
static final Property HAS_XFA
Has XFA (XML Forms Architecture) form data.
-
HAS_XMP
static final Property HAS_XMP
Has XMP metadata, whether or not it is valid.
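A minimal sketch of the distinction drawn above: the presence of an XMP stream and its validity are separate questions, so a malformed packet still counts as present. Here well-formed XML stands in for "valid", which is a simplification (real XMP validation checks more than well-formedness), and XmpPresence is a hypothetical name; only JDK XML APIs are used.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: track "has XMP" separately from "XMP parses".
final class XmpPresence {
    private XmpPresence() {}

    /** Presence only: any non-empty XMP stream counts, valid or not. */
    static boolean hasXmp(byte[] xmpStream) {
        return xmpStream != null && xmpStream.length > 0;
    }

    /** Simplified validity check: is the stream well-formed XML? */
    static boolean isWellFormedXml(byte[] xmpStream) {
        if (!hasXmp(xmpStream)) {
            return false;
        }
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPEs so untrusted input cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xmpStream));
            return true;
        } catch (Exception e) {
            // Malformed or rejected input: still "has XMP", just not valid.
            return false;
        }
    }
}
```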
-
XMP_LOCATION
static final Property XMP_LOCATION
If XMP is extracted by, e.g., the XMLProfiler, this records where it came from: the document catalog, a specific page, or some other location.
-
HAS_ACROFORM_FIELDS
static final Property HAS_ACROFORM_FIELDS
Has at least one AcroForm field.
-
HAS_MARKED_CONTENT
static final Property HAS_MARKED_CONTENT
-
HAS_COLLECTION
static final Property HAS_COLLECTION
Has a collection element in the root. If true, this is likely a PDF Portfolio.
-
EMBEDDED_FILE_DESCRIPTION
static final Property EMBEDDED_FILE_DESCRIPTION
-
EMBEDDED_FILE_ANNOTATION_TYPE
static final Property EMBEDDED_FILE_ANNOTATION_TYPE
If the file came from an annotation that had a type, this records that annotation type.
-
EMBEDDED_FILE_SUBTYPE
static final Property EMBEDDED_FILE_SUBTYPE
Literal string from PDEmbeddedFile#getSubtype(). This should be what the PDF alleges is the embedded file's MIME type.
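Because the subtype is only what the PDF alleges, a consumer may want to cross-check it against the file's bytes. The sketch below is hypothetical and is not how Tika detects types: it decodes PDF name escapes such as #2F (the slash in a MIME type) in case the stored string still carries them, lower-cases the result, and compares it with a two-entry magic-byte sniffer (PDF and PNG). EmbeddedSubtypeCheck is an assumed name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical sketch: treat the alleged subtype as a hint and cross-check it.
final class EmbeddedSubtypeCheck {
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PNG_MAGIC = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

    private EmbeddedSubtypeCheck() {}

    /** Decodes PDF name escapes such as "#2F" and lower-cases the result. */
    static String normalise(String rawSubtype) {
        StringBuilder sb = new StringBuilder(rawSubtype.length());
        for (int i = 0; i < rawSubtype.length(); i++) {
            char c = rawSubtype.charAt(i);
            if (c == '#' && i + 2 < rawSubtype.length()) {
                int hi = Character.digit(rawSubtype.charAt(i + 1), 16);
                int lo = Character.digit(rawSubtype.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    sb.append((char) (hi * 16 + lo));
                    i += 2; // skip the two hex digits just consumed
                    continue;
                }
            }
            sb.append(c);
        }
        return sb.toString().toLowerCase(Locale.ROOT);
    }

    /** Tiny magic-byte sniffer; returns null when the type is not recognised. */
    static String sniff(byte[] head) {
        if (startsWith(head, PDF_MAGIC)) return "application/pdf";
        if (startsWith(head, PNG_MAGIC)) return "image/png";
        return null;
    }

    /** True when the bytes are unrecognised or agree with the alleged subtype. */
    static boolean consistent(String rawSubtype, byte[] head) {
        String sniffed = sniff(head);
        return sniffed == null || sniffed.equals(normalise(rawSubtype));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```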
-
HAS_3D
static final Property HAS_3D
Whether the PDF has an annotation of type 3D.
-
ANNOTATION_TYPES
static final Property ANNOTATION_TYPES
-
ANNOTATION_SUBTYPES
static final Property ANNOTATION_SUBTYPES
-
NUM_3D_ANNOTATIONS
static final Property NUM_3D_ANNOTATIONS
Number of 3D annotations a PDF contains. This makes HAS_3D redundant.
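Why the count makes the flag redundant fits in a few lines: the boolean is just "count > 0". This hypothetical sketch counts annotations whose subtype name is 3D (the PDF annotation subtype /3D) in a list of subtype strings; the class name and input shape are assumptions.

```java
import java.util.List;

// Hypothetical sketch: the 3D flag is derivable from the 3D annotation count.
final class ThreeDAnnotations {
    private ThreeDAnnotations() {}

    /** Counts annotations whose subtype name is "3D". */
    static int count3d(List<String> annotationSubtypes) {
        int n = 0;
        for (String subtype : annotationSubtypes) {
            if ("3D".equals(subtype)) {
                n++;
            }
        }
        return n;
    }

    /** Equivalent of a "has 3D" flag, derived from the count. */
    static boolean has3d(List<String> annotationSubtypes) {
        return count3d(annotationSubtypes) > 0;
    }
}
```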
-
ASSOCIATED_FILE_RELATIONSHIP
static final Property ASSOCIATED_FILE_RELATIONSHIP
-
INCREMENTAL_UPDATE_NUMBER
static final Property INCREMENTAL_UPDATE_NUMBER
This is a zero-based number for incremental updates within a PDF: 0 is the first update, 1 is the second, etc. The final version of the PDF (i.e., the last update) does not have an incremental update number. This value is populated when the parse-incremental-updates feature is selected in the PDFParser.
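A minimal sketch of the numbering rule described above, assuming revisions are listed in file order and that every revision except the final one receives a zero-based number. IncrementalUpdateNumbering is a hypothetical name, and OptionalInt.empty() models "no incremental update number" for the final version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical sketch: assign zero-based update numbers to all but the final revision.
final class IncrementalUpdateNumbering {
    private IncrementalUpdateNumbering() {}

    static List<OptionalInt> number(int revisionCount) {
        if (revisionCount < 0) {
            throw new IllegalArgumentException("revisionCount must be >= 0");
        }
        List<OptionalInt> numbers = new ArrayList<>(revisionCount);
        for (int i = 0; i < revisionCount; i++) {
            boolean isFinal = i == revisionCount - 1;
            // The final version carries no incremental update number.
            numbers.add(isFinal ? OptionalInt.empty() : OptionalInt.of(i));
        }
        return numbers;
    }
}
```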
-
PDF_INCREMENTAL_UPDATE_COUNT
static final Property PDF_INCREMENTAL_UPDATE_COUNT
Incremental updates as extracted by the StartXRefScanner. See that class for limitations.
-
OCR_PAGE_COUNT
static final Property OCR_PAGE_COUNT
This counts the number of pages that would have been OCR'd or were OCR'd depending on the OCR settings. If NO_OCR is selected, this will
-
-