Apache Tika 0.7

The most notable changes in Tika 0.7 over the previous release are:

  • MP3 file parsing was improved, including Channel and SampleRate extraction and ID3v2 support (TIKA-368, TIKA-372). MIME type detection for the MIDI audio format was also improved. (TIKA-199)
  • Tika no longer relies on X11 for its RTF parsing functionality. (TIKA-386)
  • A thread-safety bug in the AutoDetectParser was discovered and fixed. (TIKA-374)
  • PDFBox was upgraded to version 1.0.0. The new PDFBox version improves PDF parsing performance and fixes a number of text extraction issues. (TIKA-380)

The following people have contributed to Tika 0.7 by submitting or commenting on the issues resolved in this release:

  • Adam Rauch
  • Benson Margulies
  • Brett S.
  • Chris A. Mattmann
  • Daan de Wit
  • Dave Meikle
  • Durville
  • Ingo Renner
  • Jukka Zitting
  • Ken Krugler
  • Kenny Neal
  • Markus Goldbach
  • Maxim Valyanskiy
  • Nick Burch
  • Sami Siren
  • Uwe Schindler

See http://tinyurl.com/yklopby for more details on these contributions.