Metadata Changes in Tika 4.x
This document details the metadata key changes in Apache Tika 4.x.
Overview
Tika 4.x prefixes all "user generated" metadata keys, that is, keys whose names come from the parsed document rather than from Tika itself. This is a security-focused change: without a prefix, user-controlled data could overwrite values that are already present in the Metadata object. The prefixes also make the origin of each key clear.
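The overwrite problem can be illustrated with a minimal sketch. It uses a plain `java.util.Map` as a stand-in for the Metadata object and is not Tika's implementation; the class `PrefixDemo`, the method `merge`, and the sample values are hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PrefixDemo {

    // Copies document-supplied entries into a copy of the parser-set entries,
    // prepending `prefix` to each document-supplied key.
    // An empty prefix models the unprefixed behavior.
    // NOTE: a plain Map is an assumption standing in for Tika's Metadata object.
    static Map<String, String> merge(Map<String, String> parserSet,
                                     Map<String, String> fromDocument,
                                     String prefix) {
        Map<String, String> metadata = new LinkedHashMap<>(parserSet);
        for (Map.Entry<String, String> e : fromDocument.entrySet()) {
            metadata.put(prefix + e.getKey(), e.getValue());
        }
        return metadata;
    }

    public static void main(String[] args) {
        // Value set by the parser itself.
        Map<String, String> parserSet = Map.of("Content-Type", "text/html");
        // Key and value controlled by the document author.
        Map<String, String> fromDocument = Map.of("Content-Type", "application/x-spoofed");

        // Unprefixed: the document-supplied value replaces the parser's value.
        System.out.println(merge(parserSet, fromDocument, "").get("Content-Type"));

        // Prefixed: both values survive under distinct keys.
        Map<String, String> prefixed = merge(parserSet, fromDocument, "html:");
        System.out.println(prefixed.get("Content-Type"));
        System.out.println(prefixed.get("html:Content-Type"));
    }
}
```

With an empty prefix the lookup returns the document-supplied value; with the `html:` prefix the parser's `text/html` stays under `Content-Type` and the document-supplied value lands under `html:Content-Type`.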
Metadata Key Changes
| Category | Change | Details |
|---|---|---|
| HTML custom metadata | Prefixed with `html:` | Custom metadata from HTML documents now uses the `html:` prefix |
| MAPI metadata | Prefix changed to | Microsoft MAPI properties now use the |
| Resource name | Renamed | `resourceName` is now `X-TIKA:resourceName` |
| Unrecognized image metadata | Prefixed with | Unrecognized image metadata keys now use the |
| Office metadata | Prefix changed | Changed from |
Migration Steps
When upgrading to Tika 4.x, you will need to update any code that references metadata keys directly:
HTML Metadata
// Before (3.x)
String value = metadata.get("custom-key");
// After (4.x)
String value = metadata.get("html:custom-key");
MAPI Metadata
// Before (3.x)
String value = metadata.get("mapi:some-property");
// After (4.x): the prefix is still mapi:, but verify each specific key you rely on
String value = metadata.get("mapi:some-property");
Resource Name
// Before (3.x)
String name = metadata.get("resourceName");
// After (4.x)
String name = metadata.get("X-TIKA:resourceName");
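Code that must read metadata produced by both 3.x and 4.x during a transition (for example, records stored before the upgrade) could look up the new key first and fall back to the old one. The following is a minimal sketch under that assumption; `MetadataKeyFallback` and `getWithFallback` are hypothetical names, and the lookup is passed as a `Function` so the example runs without Tika on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class MetadataKeyFallback {

    // Returns the value stored under the 4.x key, or the value stored under
    // the 3.x key when the 4.x key is absent. Returns null if neither exists.
    // `lookup` is any String-to-String getter; a Map stands in for it below.
    static String getWithFallback(Function<String, String> lookup,
                                  String newKey, String oldKey) {
        String value = lookup.apply(newKey);
        return value != null ? value : lookup.apply(oldKey);
    }

    public static void main(String[] args) {
        // Metadata as a 3.x parse would have produced it.
        Map<String, String> from3x = new HashMap<>();
        from3x.put("resourceName", "report.pdf");

        // Metadata as a 4.x parse produces it.
        Map<String, String> from4x = new HashMap<>();
        from4x.put("X-TIKA:resourceName", "report.pdf");

        // Both lookups resolve to the same value.
        System.out.println(getWithFallback(from3x::get, "X-TIKA:resourceName", "resourceName"));
        System.out.println(getWithFallback(from4x::get, "X-TIKA:resourceName", "resourceName"));
    }
}
```

With a real Metadata object the call would presumably be `getWithFallback(metadata::get, "X-TIKA:resourceName", "resourceName")`. A fallback like this suits Tika-owned keys such as the resource name; applying it to HTML custom keys would read unprefixed, user-controlled keys again and undo the protection the prefix provides.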