Apache Tika 2.7.0

The most notable changes in Tika 2.7.0 over the previous release are:

  • Add SVG detection for SVG files that lack the XML header (TIKA-3308).
  • Migrate to a live fork of Universal Charset Detector (TIKA-3213).
  • Improve handling of text-based attachments inside .eml files (TIKA-3959).
  • Add tika-parser-nlp-package to release artifacts (TIKA-3958).
  • Remove need for <params/> element in classes that extend ConfigBase (TIKA-3946).
  • Add X-TIKA:embedded_id_path to ensure unique embedded file paths (TIKA-3942).
  • Fix bug that prevented digests when the fallback/EmptyParser was called (TIKA-3939).
  • Remove log4j 1.2.x (and slf4j-log4j12, which now redirects to slf4j-reload4j) from all modules (TIKA-3935).
  • Upgrade mime4j to 0.8.9 (TIKA-3950).
  • Refactor date parsing for emails (TIKA-3957).
  • Upgrade to Bouncy Castle 1.71 and jdk18on jars (TIKA-3933).
  • Add a JDBCPipesReporter (TIKA-3931).
  • Add multivalued field strategy option in jdbc-emitter (TIKA-3930). Default is now 'concatenate' with ', ' as the delimiter.

The following people have contributed to Tika 2.7.0 by submitting or commenting on the issues resolved in this release:

  • Anant Dahiya
  • Anas Hammani
  • Gregory Lepore
  • Joseph Goh
  • Julien Massiera
  • Konstantin Gribov
  • Tilman Hausherr
  • Tim Allison
  • Valery Yatsynovich
  • Yury Kats

See https://s.apache.org/ys2y0 for more details on these contributions.