Apache Tika 1.26

The most notable changes in Tika 1.26 over the previous release are:

  • Fix thread safety bug in OpenOffice parser (TIKA-3334).
  • The "writeLimit" header for the /rmeta endpoint in tika-server now applies to the combined characters written for a container document and all of its embedded documents; it no longer applies separately to each container or embedded document (TIKA-3325).
  • Extract more embedded files in PDFs by recursively processing the embedded file tree (TIKA-3332).
  • Allow case-insensitive headers for configuring the PDFParser and the TesseractOCRParser in tika-server via Subhajit Das (TIKA-3320).
  • Improve detection and parsing of XPS files (TIKA-3316).
  • General dependency upgrades (TIKA-3244).
  • Major optimization in ForkParser (TIKA-3237).
  • Fix parsing of emails attached to other emails in PST files (TIKA-3004).
  • The MP3 parser now outputs the xmpDM:duration metadata in seconds, not milliseconds, consistent with the other audio and video parsers (TIKA-3318).
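
As an illustration of the "writeLimit" change (TIKA-3325), the following is a minimal client-side sketch of how a request to the /rmeta endpoint might be built. The helper names (rmeta_request_parts, fetch_rmeta) are hypothetical, and the host, port 9998, and JSON "Accept" header are assumptions about a locally running tika-server rather than anything stated in these notes.

```python
import http.client
import json


def rmeta_request_parts(write_limit):
    """Return (method, path, headers) for a /rmeta call.

    Hypothetical helper: as of 1.26 the writeLimit value caps the combined
    characters written for the container document plus its embedded documents.
    """
    if write_limit < 0:
        raise ValueError("write_limit must be non-negative")
    headers = {
        "writeLimit": str(write_limit),   # header name taken from the release notes
        "Accept": "application/json",     # assumption: ask for JSON output
    }
    return "PUT", "/rmeta", headers


def fetch_rmeta(document, write_limit, host="localhost", port=9998):
    """Send the document bytes to an assumed local tika-server and parse the JSON reply."""
    method, path, headers = rmeta_request_parts(write_limit)
    conn = http.client.HTTPConnection(host, port, timeout=30)
    try:
        conn.request(method, path, body=document, headers=headers)
        response = conn.getresponse()
        return json.loads(response.read().decode("utf-8"))
    finally:
        conn.close()
```

With a server running, a call such as fetch_rmeta(open("report.pdf", "rb").read(), 100000) would request metadata and text for the file, where "report.pdf" is a placeholder name; clients that previously relied on a per-document limit may need to raise the value for files with many attachments.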

The following people have contributed to Tika 1.26 by submitting or commenting on the issues resolved in this release:

  • Andrew Pavlin
  • Bertrand Caron
  • Julien Massiera
  • Nick Burch
  • Nick Harmer
  • Peter Kronenberg
  • Ross Johnson
  • Subhajit Das
  • Tilman Hausherr
  • Tim Allison

See https://s.apache.org/yjp3v for more details on these contributions.