Apache Tika 3.3.1

The most notable changes in Tika 3.3.1 over the previous release are:

  • Dependency upgrades (TIKA-4695).
  • Use the IANA-registered text/markdown as the primary media type for Markdown, with text/x-web-markdown and text/x-markdown kept as aliases (TIKA-4724).
  • Fix several potential resource/file-handle leaks across the OOXML and ODF parsers, the HTTP fetcher, the gRPC server, and ForkClient (TIKA-4704).
  • Omit the parent directories of the parent gzip file from the resourceName of a nested tarball, and fix a typo in the .gz extension (TIKA-4705).

The following people have contributed to Tika 3.3.1 by submitting or commenting on the issues resolved in this release:

  • Chengxin Xu
  • Alexander Veit
  • Lewis John McGibbney
  • Tilman Hausherr
  • Tim Allison

See https://s.apache.org/yg0ef for more details on these contributions.