Apache Tika 1.24

The most notable changes in Tika 1.24 over the previous release are:

  • Enable optional extraction of structural tags in PDFs (alpha-grade) (TIKA-3026).
  • Tika app's --extract mode now outputs to STDOUT (TIKA-3035).
  • Add an optional Preflight parser for PDFs (TIKA-3055).
  • Improve detection of some zip-based formats (TIKA-3057).
  • Upgrade metadata-extractor to 2.13.0 (TIKA-2952).
  • Upgrade POI to 4.1.2 (TIKA-3047).
  • Extract XMP from PSD files (TIKA-3050).
  • Add XMLProfiler as an optional parser to profile XFA and XMP in PDFs (TIKA-3045).
  • Extract inline images that rely on the DCT filter from PDFs (TIKA-3041).
  • Upgrade PDFBox to 2.0.19 (TIKA-3033).
  • Fix bug in ASM parser configuration (TIKA-2992).
  • Upgrade java-libpst to 0.9.3 (TIKA-2546).

The following people have contributed to Tika 1.24 by submitting or commenting on the issues resolved in this release:

  • Aman Mishra
  • Arvind Jain
  • Carina Antunes
  • Clark Perkins
  • David Eric Pugh
  • David Pilato
  • Don
  • Jan Vlug
  • Jorge Spinsanti
  • Luís Filipe Nassif
  • Markus Mandalka
  • Michael Moritz
  • MRIT64
  • Nick Burch
  • Richard Jones
  • Soren Daugaard
  • Steve
  • Syed Osama Anwer
  • Tilman Hausherr
  • Tim Allison
  • Zoltan Farago

See https://s.apache.org/xa01p for more details on these contributions.