Apache Tika 2.2.0
The most notable changes in Tika 2.2.0 over the previous release are:
- Add support for OneNote files downloaded from O365 (TIKA-3446).
- Fix logic bug in PipesServer that prevented concatenation of content from attachments (TIKA-3609).
- Improve extraction of embedded files from MSOffice files created by non-Microsoft tools (TIKA-3526).
- Add back the ability to ignore load errors in TikaConfig (TIKA-3575).
- Make SecureContentHandler and other parameters configurable in AutoDetectParser programmatically and via tika-config.xml (TIKA-3594).
- Fix default logging in tika-app in batch mode (TIKA-3589).
- Fix bug that prevented specifying a config with the long --config= option in tika-app in batch mode (TIKA-3589).
- Fix thread starvation after numerous restarts in PipesClient (TIKA-3588).
- Fix race condition when starting multiple forked servers on multiple ports (TIKA-3586).
- Allow a per-task timeout to be configured via headers for tika-server's legacy endpoints /tika and /rmeta. Note that this timeout must not be greater than taskTimeoutMillis (TIKA-3582).
- Add metadata item for whether or not a PDF has a collection/is a Portfolio PDF (TIKA-3579).
- Add detection of ESRI Layer files (TIKA-3570).
- Add detection of JPEG XL, MARC, ICC profile, and NES ROM file types (TIKA-3562 and TIKA-3563).
- Remove duplicate "subject" metadata keys that were intended for backwards compatibility with 1.x only (TIKA-3564).
- Fix Open Office mime types to be subclasses of application/zip and no longer require OPCPackageDetector-last ordering of zip detectors (TIKA-3556).
- Improve robustness and features of the httpfetcher (TIKA-3543).
- Add optional fetch ranges to FetchEmitTuple to allow range fetching from, e.g., HTTP or S3 (TIKA-3542).
The following people have contributed to Tika 2.2.0 by submitting or commenting on the issues resolved in this release:
- Abha
- Andreas Hubold
- August Valera
- César Soto Valero
- dataminer.accolade
- David Brosius
- Laura Delmaestro
- Lewis John McGibbney
- Luís Filipe Nassif
- Robin Schimpf
- Sebastian Nagel
- Tim Allison
See https://s.apache.org/0pfp7 for more details on these contributions.