Apache Tika 4.0.0-alpha-1

The most notable changes in Tika 4.0.0-alpha-1 over the previous release are:

Breaking Changes

  • Move from xml to json based configuration (TIKA-4544 and many others).
  • tika-pipes implementation modules are now organized by resource (e.g., tika-pipes-solr) rather than by task (e.g., tika-pipes-fetcher-solr) (TIKA-4543). Note that the file-system pipes components have been moved out of tika-pipes-core into their own pf4j module, tika-pipes-file-system.
  • tika-pipes implementation modules are now pf4j plugins (TIKA-4519).
  • tika-pipes core classes have been moved to a new module: tika-pipes-core, and the FileSystem pipes components have moved (TIKA-4334).
  • The previous MetadataFilter has been removed, and MetadataListFilter has been renamed MetadataFilter (TIKA-4546).
  • Removed several modules, including tika-batch (TIKA-4333), the snaps deployment (TIKA-4502), dotnet (TIKA-4332), the advanced media module (TIKA-4500), the tika-dl module (TIKA-4499), and the tika-fuzzing module (TIKA-4506).
  • Headers are no longer injected into the body/content of MSG files (TIKA-4345). Please open a ticket if you need this behavior across email formats.
  • API changes in the EmbeddedStreamTranslator (TIKA-4518).
  • Removed DigestingParser (TIKA-4607).
  • tika-parsers-standard-package is now a pom, not a jar. Users must add <type>pom</type> to the dependency in Maven, or the @pom suffix in Gradle (TIKA-4712).
  • Removed legacy ExternalParser; external parsers now require explicit JSON configuration (TIKA-4707).
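Because tika-parsers-standard-package is now a pom (TIKA-4712), build files that depended on it as a jar need updating. The fragment below is a minimal build.gradle.kts sketch, assuming the Gradle Kotlin DSL and the org.apache.tika group ID. The `isTransitive = true` line reflects general Gradle behavior for "artifact only" (`@ext`) notation rather than anything stated in these notes, so treat it as an assumption and check it against the Tika documentation. In Maven, the equivalent change is adding <type>pom</type> to the existing dependency element.

```kotlin
// build.gradle.kts (sketch): depend on the parser bundle, which is now a POM, not a JAR.
dependencies {
    // The "@pom" suffix asks Gradle for the POM artifact instead of a JAR.
    implementation("org.apache.tika:tika-parsers-standard-package:4.0.0-alpha-1@pom") {
        // Assumption: Gradle's artifact-only ("@ext") notation turns off transitive
        // resolution by default; re-enable it so the parsers declared in the POM are pulled in.
        isTransitive = true
    }
}
```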

Other Changes

  • Fix concurrency bug in TikaToXMP (TIKA-4393).

The following people have contributed to Tika 4.0.0-alpha-1 by submitting or commenting on the issues resolved in this release:

  • Aashish Tudu
  • Alexander Veit
  • Chengxin Xu
  • Claude Warren
  • David Frizelle
  • Eric Schoen
  • Francesco
  • Ghiles OUAREZKI
  • Grigorii Ioffe
  • Iachimoe
  • james
  • Justin Deoliveira
  • Klara Mazurak
  • Laura Delmaestro
  • Leszek Sliwko
  • Lewis John McGibbney
  • Manish S N
  • Matt Dutton
  • Nino Skopac
  • Olivier Ceulemans
  • Peter Hoogendijk
  • Pleeplop
  • Ruairidh Williamson
  • Sandeep Kulkarni
  • Sebastian Nagel
  • Stephen H
  • Steven Huypens
  • Subbu
  • Tiancheng Dai
  • Tilman Hausherr
  • Tim Allison
  • Tim Barrett
  • Tom Brisland
  • Valery Yatsynovich
  • V. S.

See https://s.apache.org/6ctu5 for more details on these contributions.