Apache Tika 1.25

The most notable changes in Tika 1.25 over the previous release are:

  • Fix inconsistent license in xmpcore (TIKA-3204).
  • General upgrades including some dependencies with recently found security vulnerabilities (TIKA-3119).
  • Add detection and a parser for flat ODF files (TIKA-3159).
  • Add extraction of macros from ODF files (TIKA-3161).
  • Add MIME detection for hprof and hprof text files (TIKA-3144).
  • Add TextSignature and TextProfileSignature to tika-eval (TIKA-3145 and TIKA-3146).
  • Create a metadata filter to trigger tika-eval stats after parsing (TIKA-3140).
  • Add a configurable metadata filter for the RecursiveParserWrapper (TIKA-3137).
  • Add status endpoint to tika-server (TIKA-3129).
  • Remove whitelist/blacklist terminology (TIKA-3120).
  • Add detection for parquet files (TIKA-3115).
  • Add detection and parsing for bplist (TIKA-3104).
  • Enable metadata value filtering for the RecursiveParserWrapper (TIKA-3137).
  • Add a basic parser for plist files based on com.googlecode.plist:dd-plist (TIKA-3104).
  • Read hyperlinked images from ODT files (TIKA-3156).
  • Update GrobidRESTParser to use the new API location (TIKA-3191).
  • Add FileProfiler to tika-eval (TIKA-3216).
  • Improve handling of zip files that contain STORED entries with a data descriptor (TIKA-3196).
  • Add parsers for XLZ, IDML and MIF (TIKA-2976, TIKA-3188 and TIKA-3189).
  • Add the beginnings of a format-aware fuzzing module (TIKA-3083).
  • Add a wrapper for the Linux 'file' command for MIME detection (TIKA-3215).

The following people have contributed to Tika 1.25 by submitting or commenting on the issues resolved in this release:

  • Abhishek Chauhan
  • Akash
  • Bob Paulin
  • Carina Antunes
  • Christian Seipel
  • Clark Perkins
  • Daniel
  • Daniel Smyda
  • Darren Cooper
  • Dave Meikle
  • David Avendasora
  • Ip Smile
  • Isabelle Giguere
  • Jesper Håsteen
  • Nav
  • Nicholas DiPiazza
  • Parth
  • Peter Lee
  • Robert Kaulbach
  • Shayne Grant
  • Tim Allison
  • Trevor Bentley
  • wiwi

See https://s.apache.org/ipubr for more details on these contributions.