Package org.apache.tika.metadata
Interface PDF
public interface PDF
PDF properties collection.
- Since:
- Apache Tika 1.14
Field Summary
Fields
Modifier and Type | Field | Description
static Property | ACTION_TRIGGER | This specifies where an action or destination would be found/triggered in the document: on document open, before close, etc.
static Property | ACTION_TRIGGERS | This is a list of all action or destination triggers contained within a given PDF.
static Property | ACTION_TYPES |
static Property | ANNOTATION_SUBTYPES |
static Property | ANNOTATION_TYPES |
static Property | ASSOCIATED_FILE_RELATIONSHIP |
static Property | CHARACTERS_PER_PAGE |
static Property | CONTAINS_DAMAGED_FONT | Contains at least one damaged font for at least one character
static Property | CONTAINS_NON_EMBEDDED_FONT | Contains at least one font that is not embedded
static Property | DOC_INFO_CREATED |
static Property | DOC_INFO_CREATOR |
static Property | DOC_INFO_CREATOR_TOOL |
static Property | DOC_INFO_KEY_WORDS |
static Property | DOC_INFO_MODIFICATION_DATE |
static Property | DOC_INFO_PRODUCER |
static Property | DOC_INFO_SUBJECT |
static Property | DOC_INFO_TITLE |
static Property | DOC_INFO_TRAPPED |
static Property | EMBEDDED_FILE_ANNOTATION_TYPE | If the file came from an annotation and there was a type
static Property | EMBEDDED_FILE_DESCRIPTION |
static Property | EMBEDDED_FILE_SUBTYPE | Literal string from PDEmbeddedFile#getSubtype(); should be what the PDF alleges is the embedded file's MIME type
static Property | EOF_OFFSETS | Number of %%EOF as extracted by the StartXRefScanner.
static Property | HAS_3D | If the PDF has an annotation of type 3D
static Property | HAS_ACROFORM_FIELDS | Has > 0 AcroForm fields
static Property | HAS_COLLECTION | Has a collection element in the root.
static Property | HAS_MARKED_CONTENT |
static Property | HAS_XFA | Has XFA
static Property | HAS_XMP | Has XMP, whether or not it is valid
static Property | ILLUSTRATOR_TYPE |
static Property | INCREMENTAL_UPDATE_NUMBER | This is a zero-based number for incremental updates within a PDF: 0 is the first update, 1 is the second, etc.
static Property | IS_ENCRYPTED |
static Property | NUM_3D_ANNOTATIONS | Number of 3D annotations a PDF contains.
static Property | OCR_PAGE_COUNT | This counts the number of pages that would have been OCR'd or were OCR'd depending on the OCR settings.
static Property | OVERALL_PERCENTAGE_UNMAPPED_UNICODE_CHARS |
static String | PDF_DOC_INFO_CUSTOM_PREFIX |
static String | PDF_DOC_INFO_PREFIX | Prefix to be used for properties that record what was stored in the docinfo section (as opposed to XMP)
static Property | PDF_EXTENSION_VERSION |
static Property | PDF_INCREMENTAL_UPDATE_COUNT | Incremental updates as extracted by the StartXRefScanner.
static String | PDF_PREFIX |
static Property | PDF_VERSION |
static String | PDFA_PREFIX |
static Property | PDFA_VERSION |
static Property | PDFAID_CONFORMANCE |
static Property | PDFAID_PART |
static String | PDFAID_PREFIX |
static Property | PDFUAID_PART |
static Property | PDFVT_MODIFIED |
static Property | PDFVT_VERSION |
static Property | PDFX_CONFORMANCE |
static Property | PDFX_VERSION |
static Property | PDFXID_VERSION |
static Property | PRODUCER |
static Property | TOTAL_UNMAPPED_UNICODE_CHARS |
static Property | UNMAPPED_UNICODE_CHARS_PER_PAGE |
static Property | XMP_LOCATION | If xmp is extracted by, e.g. the XMLProfiler, where did it come from?
Field Detail
-
PDF_PREFIX
static final String PDF_PREFIX
- See Also:
- Constant Field Values
-
PDFA_PREFIX
static final String PDFA_PREFIX
- See Also:
- Constant Field Values
-
PDFAID_PREFIX
static final String PDFAID_PREFIX
- See Also:
- Constant Field Values
-
EOF_OFFSETS
static final Property EOF_OFFSETS
Number of %%EOF markers, as extracted by the StartXRefScanner (see that class for limitations). This count includes the final %%EOF, which may or may not be at the literal end of the file. It does not include a %%EOF whose startxref is 0, as would happen with a dummy %%EOF in a linearized PDF.
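The counting rule above can be illustrated with a deliberately naive byte scan. This is a sketch under stated assumptions, not how StartXRefScanner works: the class and method names are hypothetical, it treats the last `startxref` keyword between the previous marker and the current `%%EOF` as the one governing that marker, and it ignores complications such as `%%EOF` bytes inside compressed streams.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Naive illustration only; this is NOT Tika's StartXRefScanner.
final class EofOffsetSketch {
    private static final byte[] EOF = "%%EOF".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] STARTXREF = "startxref".getBytes(StandardCharsets.US_ASCII);

    /** Byte offsets of counted %%EOF markers; markers preceded by "startxref 0" are skipped. */
    static List<Integer> eofOffsets(byte[] pdf) {
        List<Integer> offsets = new ArrayList<>();
        int floor = 0; // only look for "startxref" after the previous %%EOF
        for (int i = indexOf(pdf, EOF, 0); i >= 0; i = indexOf(pdf, EOF, i + EOF.length)) {
            int sx = lastIndexOf(pdf, STARTXREF, floor, i);
            floor = i + EOF.length;
            if (sx >= 0 && readLong(pdf, sx + STARTXREF.length, i) == 0L) {
                continue; // dummy %%EOF (startxref 0), e.g. in a linearized PDF
            }
            offsets.add(i);
        }
        return offsets;
    }

    private static int indexOf(byte[] hay, byte[] needle, int from) {
        for (int i = from; i <= hay.length - needle.length; i++) {
            if (matchesAt(hay, needle, i)) return i;
        }
        return -1;
    }

    /** Last match that starts at or after floor and ends at or before end. */
    private static int lastIndexOf(byte[] hay, byte[] needle, int floor, int end) {
        for (int j = end - needle.length; j >= floor; j--) {
            if (matchesAt(hay, needle, j)) return j;
        }
        return -1;
    }

    private static boolean matchesAt(byte[] hay, byte[] needle, int at) {
        for (int k = 0; k < needle.length; k++) {
            if (hay[at + k] != needle[k]) return false;
        }
        return true;
    }

    /** Parses the integer after optional whitespace; returns -1 if there is none. */
    private static long readLong(byte[] buf, int from, int to) {
        int p = from;
        while (p < to && Character.isWhitespace(buf[p])) p++;
        long value = -1;
        while (p < to && buf[p] >= '0' && buf[p] <= '9') {
            value = (value < 0 ? 0 : value * 10) + (buf[p] - '0');
            p++;
        }
        return value;
    }

    public static void main(String[] args) {
        String demo = "startxref\n0\n%%EOF\nstartxref\n9\n%%EOF\n";
        System.out.println(eofOffsets(demo.getBytes(StandardCharsets.US_ASCII))); // prints [30]
    }
}
```

In the demo, the first marker is skipped because its `startxref` value is 0, so only the second marker's offset is reported.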
-
PDF_DOC_INFO_PREFIX
static final String PDF_DOC_INFO_PREFIX
Prefix for properties that record what was stored in the document information (docinfo) dictionary, as opposed to XMP.
- See Also:
- Constant Field Values
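Because docinfo values and XMP values are recorded under different keys, a consumer may want to sort metadata keys by where they came from. The sketch below is hypothetical: the class, enum, and method names are invented, and the prefix strings are assumed example values (real code should read PDF.PDF_DOC_INFO_PREFIX and PDF.PDF_DOC_INFO_CUSTOM_PREFIX). It also assumes, as in those example values, that the custom prefix extends the docinfo prefix, which is why the custom check runs first.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

final class DocInfoKeySketch {
    // ASSUMPTION: illustrative values only. In real code, use PDF.PDF_DOC_INFO_PREFIX
    // and PDF.PDF_DOC_INFO_CUSTOM_PREFIX rather than hard-coded strings.
    static final String DOC_INFO_PREFIX = "pdf:docinfo:";
    static final String DOC_INFO_CUSTOM_PREFIX = "pdf:docinfo:custom:";

    enum Source { DOC_INFO, DOC_INFO_CUSTOM, OTHER }

    static Source classify(String key) {
        // The custom prefix extends the docinfo prefix here, so it must be tested first.
        if (key.startsWith(DOC_INFO_CUSTOM_PREFIX)) return Source.DOC_INFO_CUSTOM;
        if (key.startsWith(DOC_INFO_PREFIX)) return Source.DOC_INFO;
        return Source.OTHER;
    }

    /** Groups keys by source; EnumMap keeps the groups in enum declaration order. */
    static Map<Source, List<String>> group(Iterable<String> keys) {
        Map<Source, List<String>> groups = new EnumMap<>(Source.class);
        for (String key : keys) {
            groups.computeIfAbsent(classify(key), s -> new ArrayList<>()).add(key);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical metadata keys, for illustration only.
        List<String> keys = List.of("pdf:docinfo:title", "pdf:docinfo:custom:Department", "dc:title");
        System.out.println(group(keys));
    }
}
```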
-
PDF_DOC_INFO_CUSTOM_PREFIX
static final String PDF_DOC_INFO_CUSTOM_PREFIX
- See Also:
- Constant Field Values
-
DOC_INFO_CREATED
static final Property DOC_INFO_CREATED
-
DOC_INFO_CREATOR
static final Property DOC_INFO_CREATOR
-
DOC_INFO_CREATOR_TOOL
static final Property DOC_INFO_CREATOR_TOOL
-
DOC_INFO_MODIFICATION_DATE
static final Property DOC_INFO_MODIFICATION_DATE
-
DOC_INFO_KEY_WORDS
static final Property DOC_INFO_KEY_WORDS
-
DOC_INFO_PRODUCER
static final Property DOC_INFO_PRODUCER
-
DOC_INFO_SUBJECT
static final Property DOC_INFO_SUBJECT
-
DOC_INFO_TITLE
static final Property DOC_INFO_TITLE
-
DOC_INFO_TRAPPED
static final Property DOC_INFO_TRAPPED
-
PDF_VERSION
static final Property PDF_VERSION
-
PDFA_VERSION
static final Property PDFA_VERSION
-
PDF_EXTENSION_VERSION
static final Property PDF_EXTENSION_VERSION
-
PDFAID_CONFORMANCE
static final Property PDFAID_CONFORMANCE
-
PDFAID_PART
static final Property PDFAID_PART
-
PDFUAID_PART
static final Property PDFUAID_PART
-
PDFVT_VERSION
static final Property PDFVT_VERSION
-
PDFVT_MODIFIED
static final Property PDFVT_MODIFIED
-
PDFXID_VERSION
static final Property PDFXID_VERSION
-
PDFX_VERSION
static final Property PDFX_VERSION
-
PDFX_CONFORMANCE
static final Property PDFX_CONFORMANCE
-
ILLUSTRATOR_TYPE
static final Property ILLUSTRATOR_TYPE
-
IS_ENCRYPTED
static final Property IS_ENCRYPTED
-
PRODUCER
static final Property PRODUCER
-
ACTION_TRIGGER
static final Property ACTION_TRIGGER
Specifies where in the document an action or destination would be found or triggered: on document open, before close, etc. This property is set on the embedded document (possibly only for JavaScript for now), not on the container PDF.
-
ACTION_TRIGGERS
static final Property ACTION_TRIGGERS
This is a list of all action or destination triggers contained within a given PDF.
-
ACTION_TYPES
static final Property ACTION_TYPES
-
CHARACTERS_PER_PAGE
static final Property CHARACTERS_PER_PAGE
-
UNMAPPED_UNICODE_CHARS_PER_PAGE
static final Property UNMAPPED_UNICODE_CHARS_PER_PAGE
-
TOTAL_UNMAPPED_UNICODE_CHARS
static final Property TOTAL_UNMAPPED_UNICODE_CHARS
-
OVERALL_PERCENTAGE_UNMAPPED_UNICODE_CHARS
static final Property OVERALL_PERCENTAGE_UNMAPPED_UNICODE_CHARS
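CHARACTERS_PER_PAGE and the three unmapped-character entries above carry no description here. Their names suggest that the per-page values line up page by page, that the total is the sum of the per-page unmapped counts, and that the overall figure is total unmapped characters divided by total characters. The sketch below encodes only those assumptions; it is not taken from Tika, the class and method names are hypothetical, and whether Tika stores the overall value as a 0-1 fraction or a 0-100 percentage is not stated in this section.

```java
final class UnmappedCharsSketch {
    /** Sums per-page counts (assumed relationship between *_PER_PAGE and TOTAL_* values). */
    static long sum(int[] perPage) {
        long total = 0;
        for (int n : perPage) total += n;
        return total;
    }

    /**
     * Unmapped characters as a fraction of all characters.
     * Multiply by 100 if a percentage is wanted; returns 0.0 for a PDF with no characters.
     */
    static double overallUnmappedFraction(int[] charsPerPage, int[] unmappedPerPage) {
        if (charsPerPage.length != unmappedPerPage.length) {
            throw new IllegalArgumentException("per-page arrays must have one entry per page");
        }
        long chars = sum(charsPerPage);
        return chars == 0 ? 0.0 : (double) sum(unmappedPerPage) / chars;
    }

    public static void main(String[] args) {
        int[] chars = {100, 50, 50};
        int[] unmapped = {10, 0, 40};
        System.out.println(sum(unmapped));                            // prints 50
        System.out.println(overallUnmappedFraction(chars, unmapped)); // prints 0.25
    }
}
```

Dividing the totals, rather than averaging per-page ratios, weights each page by how many characters it has, so a nearly empty page with one unmapped character does not dominate the result.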
-
CONTAINS_DAMAGED_FONT
static final Property CONTAINS_DAMAGED_FONT
The PDF contains at least one damaged font for at least one character.
-
CONTAINS_NON_EMBEDDED_FONT
static final Property CONTAINS_NON_EMBEDDED_FONT
The PDF contains at least one font that is not embedded.
-
HAS_XFA
static final Property HAS_XFA
Whether the PDF contains XFA (XML Forms Architecture) form data.
-
HAS_XMP
static final Property HAS_XMP
Whether the PDF contains XMP metadata, regardless of whether that XMP is valid.
-
XMP_LOCATION
static final Property XMP_LOCATION
If XMP is extracted (e.g., by the XMLProfiler), where it came from: the document catalog, a specific page, or elsewhere.
-
HAS_ACROFORM_FIELDS
static final Property HAS_ACROFORM_FIELDS
Whether the PDF has at least one AcroForm field.
-
HAS_MARKED_CONTENT
static final Property HAS_MARKED_CONTENT
-
HAS_COLLECTION
static final Property HAS_COLLECTION
Whether the document catalog (root) has a Collection element. If true, this is likely a PDF Portfolio.
-
EMBEDDED_FILE_DESCRIPTION
static final Property EMBEDDED_FILE_DESCRIPTION
-
EMBEDDED_FILE_ANNOTATION_TYPE
static final Property EMBEDDED_FILE_ANNOTATION_TYPE
The annotation type, if the embedded file came from an annotation that had a type.
-
EMBEDDED_FILE_SUBTYPE
static final Property EMBEDDED_FILE_SUBTYPE
Literal string from PDEmbeddedFile#getSubtype(). This should be what the PDF alleges is the embedded file's MIME type.
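Because the subtype is only what the PDF alleges, a consumer may want to compare it against a type detected from the embedded bytes themselves. The sketch below assumes the detected type comes from some content-based detector that is not shown; the class and method names are hypothetical. It normalizes case and drops MIME parameters before comparing, and treats a missing or malformed value as "no opinion" rather than a mismatch.

```java
import java.util.Locale;
import java.util.Optional;

final class AllegedMimeSketch {
    /** Lower-cases a MIME type and drops parameters such as "; charset=UTF-8". */
    static Optional<String> normalize(String mime) {
        if (mime == null) return Optional.empty();
        int semi = mime.indexOf(';');
        String base = (semi >= 0 ? mime.substring(0, semi) : mime).trim().toLowerCase(Locale.ROOT);
        int slash = base.indexOf('/');
        // Anything without a "type/subtype" shape is treated as missing.
        return (slash > 0 && slash < base.length() - 1) ? Optional.of(base) : Optional.empty();
    }

    /** True only if both types are well-formed and still differ after normalization. */
    static boolean disagrees(String alleged, String detected) {
        Optional<String> a = normalize(alleged);
        Optional<String> d = normalize(detected);
        return a.isPresent() && d.isPresent() && !a.get().equals(d.get());
    }

    public static void main(String[] args) {
        System.out.println(disagrees("Application/PDF", "application/pdf")); // prints false
        System.out.println(disagrees("text/plain", "application/pdf"));      // prints true
    }
}
```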
-
HAS_3D
static final Property HAS_3D
Whether the PDF has an annotation of type 3D.
-
ANNOTATION_TYPES
static final Property ANNOTATION_TYPES
-
ANNOTATION_SUBTYPES
static final Property ANNOTATION_SUBTYPES
-
NUM_3D_ANNOTATIONS
static final Property NUM_3D_ANNOTATIONS
Number of 3D annotations a PDF contains. This makes HAS_3D redundant.
-
ASSOCIATED_FILE_RELATIONSHIP
static final Property ASSOCIATED_FILE_RELATIONSHIP
-
INCREMENTAL_UPDATE_NUMBER
static final Property INCREMENTAL_UPDATE_NUMBER
This is a zero-based number for incremental updates within a PDF: 0 is the first update, 1 is the second, and so on. The final version of the PDF (i.e., the last update) does not have an incremental update number. This value is populated when the parse-incremental-updates feature is selected in the PDFParser.
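One way to picture this numbering, under assumptions not stated in this section: if each earlier version of the file is the byte prefix ending right after one of its `%%EOF` markers, the earlier versions can be numbered from 0 in file order while the complete file (the final version) gets no number. The sketch below is hypothetical (class, field, and method names and the prefix model are all assumptions) and is not the PDFParser's implementation; the offsets could come from a scan like the one described under EOF_OFFSETS.

```java
import java.util.ArrayList;
import java.util.List;

final class IncrementalUpdateSketch {
    /** Hypothetical holder: an earlier revision is the file prefix [0, endExclusive). */
    static final class Revision {
        final int updateNumber;
        final int endExclusive;

        Revision(int updateNumber, int endExclusive) {
            this.updateNumber = updateNumber;
            this.endExclusive = endExclusive;
        }

        @Override
        public String toString() {
            return "update " + updateNumber + " -> bytes [0, " + endExclusive + ")";
        }
    }

    /**
     * Numbers every revision except the last one, starting from 0.
     * ASSUMPTION: each earlier revision ends right after its %%EOF marker,
     * and the last marker closes the final version, which gets no number.
     */
    static List<Revision> earlierRevisions(List<Integer> eofOffsets) {
        final int eofLength = "%%EOF".length();
        List<Revision> revisions = new ArrayList<>();
        for (int i = 0; i < eofOffsets.size() - 1; i++) {
            revisions.add(new Revision(i, eofOffsets.get(i) + eofLength));
        }
        return revisions;
    }

    public static void main(String[] args) {
        // Hypothetical %%EOF offsets: three markers, so three versions of the document.
        for (Revision r : earlierRevisions(List.of(100, 250, 400))) {
            System.out.println(r);
        }
    }
}
```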
-
PDF_INCREMENTAL_UPDATE_COUNT
static final Property PDF_INCREMENTAL_UPDATE_COUNT
Number of incremental updates, as extracted by the StartXRefScanner. See that class for limitations.
-
OCR_PAGE_COUNT
static final Property OCR_PAGE_COUNT
This counts the number of pages that would have been OCR'd or were OCR'd depending on the OCR settings. If NO_OCR is selected, this will