Apache Tika 1.10

The most notable changes in Tika 1.10 over the previous release are:

  • Tika Config XML can now be used to create composite detectors and to exclude detectors that DefaultDetector would otherwise have used. This brings detector configuration in line with what is already supported for parsers (TIKA-1702).
  • Reverted to the legacy sort order of parsers, which was mistakenly reversed in Tika 1.9 (TIKA-1689).
  • Upgraded to POI 3.13-beta1 (TIKA-1667).
  • Upgraded to PDFBox 1.8.10 (TIKA-1588).
  • MimeTypes now tries to find a registered type with and without parameters (TIKA-1692).
  • Added more robust error handling for encoding detection of .MSG files (TIKA-1238).
  • Fixed a bug in Tika's use of the Jackcess parser that prevented reading of v97 Access files (TIKA-1681).
  • Upgraded xerial.org's sqlite-jdbc to 3.8.10.1. NOTE: as of Tika 1.9, this jar is "provided". Make sure to upgrade your provided jar! (TIKA-1687).
  • Added header/footer extraction for XLS files (via Aeham Abushwashi) (TIKA-1400).
  • Dropped the source file name from the embedded file path in RecursiveParserWrapper's "X-TIKA:embedded_resource_path" (TIKA-1673).
  • Upgraded to Java 7 (TIKA-1536).
  • Emails that are not standards-compliant are now correctly detected as message/rfc822 (TIKA-1602).
  • Added parser for MS Access files via Jackcess. Many thanks to Health Market Science, Brian O'Neill and James Ahlborn for relicensing Jackcess to Apache v2! (TIKA-1601).
  • GDALParser now correctly sets "nitf" as a supported MediaType (TIKA-1664).
  • Added DigestingParser to calculate digest hashes and record them in metadata. Integrated with tika-app and tika-server (TIKA-1663).
  • Fixed ZipContainerDetector to detect all IPA files (TIKA-1659).
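
To illustrate the MimeTypes change listed above (TIKA-1692), here is a minimal sketch of a lookup that tries a type with its parameters first and then without them. It uses only the Java standard library and is not Tika's implementation: the class MimeLookupSketch, its registry contents, and the helper names normalize, baseType and lookup are all hypothetical, and the exact order and normalization Tika applies may differ.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a registry of known media types.
// This is an illustration only, not the Tika MimeTypes class.
public class MimeLookupSketch {
    private static final Map<String, String> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("text/plain", "Plain text");
        REGISTRY.put("application/x-example", "Example format");
        REGISTRY.put("application/x-example; version=2", "Example format, version 2");
    }

    // Lower-cases the base type and rejoins parameters as "; name=value".
    public static String normalize(String type) {
        String[] parts = type.trim().split(";");
        StringBuilder sb = new StringBuilder(parts[0].trim().toLowerCase(Locale.ROOT));
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (!param.isEmpty()) {
                sb.append("; ").append(param);
            }
        }
        return sb.toString();
    }

    // Returns the type with every parameter stripped.
    public static String baseType(String type) {
        int semi = type.indexOf(';');
        String base = semi < 0 ? type : type.substring(0, semi);
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // Tries the full type (with parameters) first, then the bare type.
    // Returns null when neither form is registered.
    public static String lookup(String type) {
        String hit = REGISTRY.get(normalize(type));
        if (hit == null) {
            hit = REGISTRY.get(baseType(type));
        }
        return hit;
    }

    public static void main(String[] args) {
        // Parameterised form is not registered, so the bare type matches.
        System.out.println(lookup("text/plain; charset=UTF-8"));        // Plain text
        // Parameterised form is registered and wins over the bare type.
        System.out.println(lookup("application/x-example; version=2")); // Example format, version 2
    }
}
```

Under these assumptions, a caller that previously received no match for a type carrying an unregistered parameter, such as a charset, would now get the entry registered for the bare type.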

The following people have contributed to Tika 1.10 by submitting or commenting on the issues resolved in this release:

  • Aashish Chaudhary
  • Adam Estrada
  • Albert L.
  • Alessandro De Angelis
  • Andrew Jackson
  • Ann Burgess
  • Bin Hawking
  • Bob Paulin
  • Chris A. Mattmann
  • Chris Wilson
  • Daniel Bonniot de Ruisselet
  • David Warren
  • Filip Bednárik
  • Giuseppe Totaro
  • Jeremy B. Merrill
  • Johannes Mockenhaupt
  • Joseph North
  • Ken Krugler
  • Lewis John McGibbney
  • Markus Jelsma
  • Michael McCandless
  • Namrata Malarout
  • Nick Burch
  • Niels
  • Paul Ramirez
  • Paul Tunison
  • Rami Shomali
  • Ray Gauss II
  • Sergey Beryozkin
  • Tim Allison
  • Tyler Palsulich
  • jefferyyuan

See http://s.apache.org/EQ2 for more details on these contributions.