Apache Tika 3.0.0-BETA
The most notable changes in Tika 3.0.0 over the previous release are:
BREAKING CHANGES
- Require Java 11 (TIKA-4128).
- The boilerpipe handler has been moved to the tika-handler-boiler-pipe package (TIKA-4138).
- HTML parsing has migrated from TagSoup to JSoup. If you have a custom configuration for the HTMLParser, change it to target org.apache.tika.parser.html.JSoupParser (TIKA-1599).
- Removed xerces2 as a dependency (TIKA-4135).
- Tika will look for "custom-mimetypes.xml" directly on the classpath, NOT under "/org/apache/tika/mime/" (TIKA-4147).
Other Changes/Updates
- Upgrade to PDFBox 3.0.1 (TIKA-3347).
- Deprecated AbstractParser for removal in 4.x (TIKA-4132).
- Fix bug in DateUtils that stripped timezone information from incoming Calendar objects (TIKA-4126).
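The custom-mimetypes.xml change (TIKA-4147) may require existing deployments to move that file. The class below is a hypothetical migration helper and is not part of Tika. It is a minimal sketch that uses only the JDK's ClassLoader.getResource to report whether the file is visible at the classpath root, only at the old org/apache/tika/mime/ location, or not at all. Note that ClassLoader resource names take no leading slash.

```java
import java.net.URL;

/**
 * Hypothetical migration helper (not part of Tika). Assumes, per TIKA-4147,
 * that Tika 3.x reads custom-mimetypes.xml only from the classpath root.
 */
public class CustomMimeTypesCheck {

    // Location read by Tika 3.x: directly on the classpath root.
    static final String NEW_LOCATION = "custom-mimetypes.xml";
    // Pre-3.0 location (ClassLoader names have no leading slash).
    static final String OLD_LOCATION = "org/apache/tika/mime/custom-mimetypes.xml";

    /** Summarises the result given which locations resolved to a resource. */
    static String describe(boolean atRoot, boolean atOldPath) {
        if (atRoot && atOldPath) {
            return "OK: found at classpath root; the copy under org/apache/tika/mime/ is no longer read";
        }
        if (atRoot) {
            return "OK: found at classpath root";
        }
        if (atOldPath) {
            return "MOVE: only found under org/apache/tika/mime/";
        }
        return "NONE: no custom-mimetypes.xml found";
    }

    /** Looks up both locations with the given class loader. */
    static String check(ClassLoader loader) {
        URL atRoot = loader.getResource(NEW_LOCATION);
        URL atOldPath = loader.getResource(OLD_LOCATION);
        return describe(atRoot != null, atOldPath != null);
    }

    public static void main(String[] args) {
        // Use the same class loader your application would use.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        System.out.println(check(loader));
    }
}
```

Run it with the same classpath as your application. A MOVE result suggests relocating the file to the root of a classpath entry, for example the top of a Maven resources directory.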
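TIKA-4126 concerns timezone information being stripped from incoming Calendar objects. The sketch below is not Tika's DateUtils code. It is an illustration, using only java.time, of one way such information can be lost: formatting a Calendar in its own zone keeps the offset, while normalizing it to UTC first keeps the instant but discards the original offset. The class and method names are hypothetical.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

/** Hypothetical illustration; not Tika's DateUtils implementation. */
public class CalendarZoneDemo {

    /** Formats the Calendar in its own time zone, so the offset survives. */
    static String formatPreservingZone(Calendar cal) {
        ZonedDateTime zdt = cal.toInstant().atZone(cal.getTimeZone().toZoneId());
        return zdt.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    /** Normalizes to UTC first: same instant, but the original offset is gone. */
    static String formatAsUtc(Calendar cal) {
        return cal.toInstant()
                .atOffset(ZoneOffset.UTC)
                .format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    public static void main(String[] args) {
        // 10:15 on 2023-11-01 in a +05:30 zone (no DST in Asia/Kolkata).
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("Asia/Kolkata"));
        cal.clear(); // zero out fields, including milliseconds; the zone is kept
        cal.set(2023, Calendar.NOVEMBER, 1, 10, 15, 0);

        System.out.println(formatPreservingZone(cal)); // 2023-11-01T10:15:00+05:30
        System.out.println(formatAsUtc(cal));          // 2023-11-01T04:45:00Z
    }
}
```

Both strings denote the same instant. Only the first still tells a consumer that the source value was recorded in a +05:30 zone.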
The following people have contributed to Tika 3.0.0-BETA by submitting or commenting on the issues resolved in this release:
- Cassandra Xia
- Desmond David
- Florent Valdelievre
- Kenneth William Krugler
- Maxim Solodovnik
- NW Brad
- RaahulUmapathy
- Sandeep Kulkarni
- Thorsten Heit
- Tilman Hausherr
- Tim Allison
See https://s.apache.org/15jlf for more details on these contributions.