Apache Tika 1.12
The most notable changes in Tika 1.12 over the previous release are:
- Support for iFrames and element link extraction is provided in the link Content Handler (TIKA-1835).
- Slide notes are now linked to the slide XHTML in the PPT output (TIKA-1840).
- JSON tests in Tika server were updated to remove impossible casts (Github-73).
- Fix bug in GeoTopicParser where NER is reused instead of instantiated with each request (TIKA-1834).
- 5.1 && Downgrade Rome dependency to 0.9 to avoid nasty NPE (TIKA-1820, TIKA-1516).
- The NamedEntityParser was enhanced to generate text content in addition to metadata (TIKA-1815, TIKA-1816).
- A significant speed-up was made to the GeoTopicParser by using the new REST server capabilities from Lucene GeoGazetteer (TIKA-1803).
- A parser to compute motion properties in videos, e.g., Histogram of Oriented Gradients and Histogram of Optical Flows using the Pooled Time Series algorithm, was added (TIKA-1798).
- Provide NamedEntityParser which exposes Named Entity Recognition from OpenNLP and Stanford NER providers (TIKA-1787, Github-61, Github-62).
- Allow XHTMLContentHandler to pass attributes of HTML elements, via Markus Jelsma (TIKA-1782).
- Fix regression with spacing in PPT via Andreas Beeker (TIKA-1777).
- Tika Facade parse methods for Path and File added which take a Metadata object, to mirror the existing InputStream one (Github-60).
The following people have contributed to Tika 1.12 by submitting or commenting on the issues resolved in this release:
- Bob Paulin
- Chris A. Mattmann
- Ken Krugler
- Lewis John McGibbney
- Madhav Sharan
- Markus Jelsma
- Nick Burch
- Roberto Benedetti
- Thamme Gowda N
- Tim Allison
- Vjeran Marcinko
- Yueheng He
See https://s.apache.org/wDlx for more details on these contributions.