Security

The following is an incomplete list of known and fixed Common Vulnerabilities and Exposures (CVEs) and other vulnerabilities in Apache Tika or its dependencies. Please help us fill this in with more details.

CVE or Vulnerability | Description | Reporter | Affected Versions
CVE-2023-42503 | commons-compress uncontrolled resource consumption vulnerability while parsing tar files | ??? | ???-2.9.0
CVE-2022-33879 | Regex DoS in StandardsExtractingContentHandler; incomplete fix for CVE-2022-30973/CVE-2022-30126 and a new regex DoS | Tony Torralba, Jaroslav Lobačevski and Tim Allison | ???-2.4.0 and ???-1.28.3
CVE-2022-30973 | Regex DoS in StandardsExtractingContentHandler; missed fix in 1.28.2 | Cathy Hu, SUSE Software Solutions Germany GmbH | ???-1.28.2
CVE-2022-25169 | BPGParser Memory Usage DoS | ??? | ???-2.3.0 and ???-1.28.1
CVE-2022-30126 | Regex DoS in StandardsExtractingContentHandler | CodeQL team members Tony Torralba and Joseph Farebrother | ???-2.3.0 and ???-1.28.1
CVE-2021-44832 | Remote Code Execution via JDBC Appender in log4j2 | ??? | 2.0.0-BETA-2.2.1
CVE-2021-44228 | Critical Remote Code Execution in log4j2 | ??? | 2.0.0-BETA-2.1.0
CVE-2021-31812 | Infinite loop when loading a crafted PDF in PDFBox before 2.0.24 | Chaoyuan Peng | ?-1.26
CVE-2021-31811 | OutOfMemoryException when loading a crafted PDF in PDFBox before 2.0.24 | Chaoyuan Peng | ?-1.26
CVE-2021-28657 | Infinite loop in the MP3Parser | Khaled Nassar | ?-1.25
CVE-2021-27906 | Out of memory error while loading a file in PDFBox before 2.0.23 | Fabian Meumertzheim | ?-1.25
CVE-2021-27807 | Infinite loop while loading a file in PDFBox before 2.0.23 | Fabian Meumertzheim | ?-1.25
CVE-2020-9489 | System.exit vulnerability in Tika's OneNote Parser; out of memory errors and/or infinite loops in Tika's ICNSParser, MP3Parser, MP4Parser, SAS7BDATParser, OneNoteParser and ImageParser | Tim Allison | 1.0-1.24
CVE-2020-1950 | Excessive memory usage (DoS) vulnerability in Apache Tika's PSDParser | Pierre Ernst | 1.0-1.23
CVE-2020-1951 | Infinite Loop (DoS) vulnerability in Apache Tika's PSDParser | Tim Allison | 1.0-1.23
CVE-2019-10094 | StackOverflow from Crafted Package/Compressed Files in Apache Tika's RecursiveParserWrapper | Tim Allison; files contributed by Matthew Barber and Erling Ellingsen | 1.7-1.21
CVE-2019-10093 | Denial of Service in Apache Tika's 2003ml and 2006ml Parsers | Tim Allison | 1.19-1.21
CVE-2019-10088 | OOM from a crafted Zip File in Apache Tika's RecursiveParserWrapper | RunningSnail | 1.7-1.21
PDFBOX-4550 | OOM from corrupt ToUnicode stream in PDFs | Tilman Hausherr | ?-1.21
CVE-2019-0228 | XML External Entity (XXE) in xfdf loading in PDFBox (regular Tika parsing would likely not be vulnerable) | Kurt Boberg | ?-1.20
CVE-2018-20346 | (Provided) SQLite before 3.25.3 allows remote attackers to execute arbitrary code | Pat Cashman (notified Tika team) | ?-1.20
CVE-2018-17197 | Infinite Loop in Tika's SQLite3Parser | Tim Allison | 1.8-1.19.1
CVE-2018-11796 | XML Entity Expansion in Tika's SAXParsers after reset() | Slava Gorelik | ?-1.19
CVE-2018-11797 | Very long loop parsing page tree in PDFBox | Shawn Rasheed and Jens Dietrich | ?-1.19
CVE-2018-11771 | Infinite Loop in Commons-Compress ZipArchiveInputStream | Tobias Ospelt | ?-1.18
CVE-2018-8017 | Infinite Loop in IptcAnpaParser | Rohan Padhye and Tobias Ospelt | 1.2-1.18
CVE-2018-8036 | Infinite Loop leading to OOM in PDFBox's AFMParser | Tobias Ospelt | ?-1.18
CVE-2018-12418 | Infinite Loop in junrar | Tobias Ospelt | ?-1.18
CVE-2018-11761 | XML Entity Expansion Vulnerability | Renfei (Brian) Wang | 0.1-1.18
CVE-2018-11762 | Rare Zip Slip Vulnerability in tika-app | Tim Allison | 0.9-1.18
RIFFReader | Infinite Loop in AudioParser in Java 8 and 9 | Sergey Bylokhov and Tobias Ospelt | ?-1.18
TIKA-2446 | OOM detecting OPCPackage files with corrupt ZIP | Thorsten Schäfer | ?-1.18
PDFBOX-4014 | Infinite loop in JBig2 (versions less than 3.0.0) | Hanno Böck | (if user supplied) ?-1.17
CVE-2018-1339 | Infinite loop in ChmParser | Tobias Ospelt | ?-1.17
CVE-2018-1338 | Infinite loop in BPGParser | Tobias Ospelt | ?-1.17
CVE-2018-1335 | Command Execution in tika-server | Tim Allison | ?-1.17
CVE-2017-12626 | Apache POI - Infinite loops in WMF, EMF, MSG and macros; OOMs in DOC, PPT and XLS | Tim Allison, Luís Filipe Nassif and Jerome Lacoste | ?-1.17
CVE-2018-1324 and COMPRESS-432 | Commons Compress - Infinite loop in ZipFile | Luís Filipe Nassif and Anton Abashkin | ?-1.17
CVE-2018-7489 and TIKA-2634 | Jackson - Deserialization vulnerability | Richard Cyganiak (notified Tika team) | ?-1.17
PDFBOX-3919 | Apache PDFBox - Infinite loop | Hanno Böck and Andreas Bogk | ?-1.16
TIKA-2115 | Apache POI - OOM parsing OLE object | Thomas Galla | ?-1.15
COMPRESS-382 | Commons Compress - OOM detecting corrupt LZMA | Luís Filipe Nassif | ?-1.15
COMPRESS-386 and TIKA-1631 | Commons Compress - OOM detecting corrupt x-compress | Pavel Micka | ?-1.15
TIKA-2045 and TIKA-3442 | Apache PDFBox - OOM in font caching | Egbert | ?-1.13
TIKA-1866 and TIKA-954 | Apache POI - OOM in DOCX and PPTX because of bug in Piccolo parser | Rob Tulloh and Shawn Johnson | ?-1.13
TIKA-2040 | GC-Overload and OOM in CHMParser | Luís Filipe Nassif | ?-1.13
CVE-2016-6809 | jmatio - Deserialization Vulnerability in MATLAB parser | Pierre Ernst | 1.6-1.13
CVE-2016-4434 | XXE Vulnerability in several parsers | Arthur Khashaev, Seulgi Kim, Mesut Timur (and Tim Allison while remediating initial issue reported by Arthur et al.) | 0.10-1.12
CVE-2016-2175 | XML External Entity (XXE) in PDFBox | ??? | ?-1.12
CVE-2015-3271 | Remote Access to host files via tika-server | Tim Allison | 1.9?-1.10
PDFBOX-2811 | Apache PDFBox - Infinite Loop | Andreas Lehmkühler | ?-1.10
PDFBOX-2200 | Apache PDFBox - Slowly building memory leak because of static caching of fonts | Matthew Buckett | ?-1.6
TIKA-1471 | Apache PDFBox - OOM with corrupt PDF | Alan Burlison | ?-1.6
TIKA-788 | Infinite Loop in DWG | Stas Shaposhnikov | ?-1.4?
TIKA-1132 | Apache POI - Nearly Infinite Loop in XLS | Ryan Krueger | ?-1.4
TIKA-1179 | Infinite Loop in corrupt MP3 | Marius Dumitru Florea | ?-1.4
TIKA-866 | OOM reading Tika config file | Stephan Mühlstrasser | ?-1.1

Third-party vulnerabilities that may or may not be triggerable via regular use of Apache Tika.

CVE or Vulnerability | Description | Reporter | Affected Versions
CVE-2018-10237 | Unbounded memory allocation in Google Guava | Pat Cashman (notified Tika team) | ?-1.20
CVE-2018-19362 | FasterXML jackson-databind may allow attackers to have unspecified impact from polymorphic deserialization | Pat Cashman (notified Tika team) | ?-1.20

Acronyms and Terms

  • Command Execution -- A malicious client could execute arbitrary commands on the command line of the host running tika-server.
  • Deserialization Vulnerability -- See OWASP's Deserialization Cheat Sheet. A malicious actor could run arbitrary code on your computer.
  • OOM -- Out of Memory Error -- Parsers may allocate more memory than is available. This can sometimes be caused by parsers not performing sanity checks before allocation. See, for example: TIKA-1631
  • XXE -- XML External Entity Processing -- A malicious client could access data on your system.
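
Two of the terms above describe mitigations that are easy to show in code. The following is a minimal Java sketch, not Tika's actual implementation: the class name ParserGuards, the methods readRecord and hardenedFactory, and the MAX_RECORD_BYTES cap are all hypothetical names chosen for illustration. The first method shows the kind of sanity check before allocation whose absence can lead to an OOM; the second configures a standard JAXP DocumentBuilderFactory so that documents carrying a DOCTYPE (the usual XXE vector) are rejected. It assumes the JDK's built-in XML parser, which recognizes the disallow-doctype-decl feature.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration only; not part of Apache Tika.
class ParserGuards {

    // Assumed cap for illustration; a real parser would pick a format-specific limit.
    static final int MAX_RECORD_BYTES = 10 * 1024 * 1024;

    // OOM guard: validate a length taken from untrusted input BEFORE allocating.
    static byte[] readRecord(InputStream in, int declaredLength, int maxBytes)
            throws IOException {
        if (declaredLength < 0 || declaredLength > maxBytes) {
            throw new IOException(
                    "Declared length " + declaredLength + " is outside 0.." + maxBytes);
        }
        byte[] buf = new byte[declaredLength];
        // readFully throws EOFException if the stream is shorter than declared.
        new DataInputStream(in).readFully(buf);
        return buf;
    }

    // XXE guard: refuse DOCTYPE declarations so no external entity can be defined.
    static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf;
    }

    public static void main(String[] args) throws Exception {
        byte[] record = readRecord(
                new ByteArrayInputStream(new byte[] {1, 2, 3}), 3, MAX_RECORD_BYTES);
        System.out.println("read " + record.length + " bytes");

        // A document that tries to pull a local file in through an external entity.
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE r [<!ENTITY x SYSTEM \"file:///etc/passwd\">]><r>&x;</r>";
        try {
            hardenedFactory().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            System.out.println("parsed (unexpected for this input)");
        } catch (SAXException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejecting DOCTYPE declarations outright is the strictest option and may be too strict for formats that legitimately use DTDs; in that case disabling external general and parameter entities individually is a common alternative.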