Security
The following is an incomplete list of known and fixed Common Vulnerabilities and Exposures (CVEs) and other vulnerabilities in Apache Tika or its dependencies. Please help us fill this in with more details.
CVE or Vulnerability | Description | Reporter | Affected Versions |
CVE-2023-42503 | commons-compress uncontrolled resource consumption vulnerability while parsing tar files | ??? | ???-2.9.0 |
CVE-2022-33879 | Regex DoS in StandardsExtractingContentHandler; incomplete fix for CVE-2022-30973/CVE-2022-30126 and a new one | Tony Torralba, Jaroslav Lobačevski and Tim Allison | ???-2.4.0 and ???-1.28.3 |
CVE-2022-30973 | Regex DoS in StandardsExtractingContentHandler; missed fix in 1.28.2 | Cathy Hu, SUSE Software Solutions Germany GmbH | ???-1.28.2 |
CVE-2022-25169 | BPGParser Memory Usage DoS | ??? | ???-2.3.0 and ???-1.28.1 |
CVE-2022-30126 | Regex DoS in StandardsExtractingContentHandler | CodeQL team members Tony Torralba and Joseph Farebrother | ???-2.3.0 and ???-1.28.1 |
CVE-2021-44832 | Remote Code Execution via JDBC Appender in log4j2 | ??? | 2.0.0-BETA-2.2.1 |
CVE-2021-44228 | Critical Remote Code Execution in log4j2 | ??? | 2.0.0-BETA-2.1.0 |
CVE-2021-31812 | Infinite loop when loading a crafted PDF in PDFBox before 2.0.24 | Chaoyuan Peng | ?-1.26 |
CVE-2021-31811 | OutOfMemoryException when loading a crafted PDF in PDFBox before 2.0.24 | Chaoyuan Peng | ?-1.26 |
CVE-2021-28657 | Infinite loop in the MP3Parser. | Khaled Nassar | ?-1.25 |
CVE-2021-27906 | Out of memory error while loading a file in PDFBox before 2.0.23. | Fabian Meumertzheim | ?-1.25 |
CVE-2021-27807 | Infinite loop while loading a file in PDFBox before 2.0.23. | Fabian Meumertzheim | ?-1.25 |
CVE-2020-9489 | System.exit vulnerability in Tika's OneNote Parser; out of memory errors and/or infinite loops in Tika's ICNSParser, MP3Parser, MP4Parser, SAS7BDATParser, OneNoteParser and ImageParser. | Tim Allison | 1.0-1.24 |
CVE-2020-1950 | Excessive memory usage (DoS) vulnerability in Apache Tika's PSDParser | Pierre Ernst | 1.0-1.23 |
CVE-2020-1951 | Infinite Loop (DoS) vulnerability in Apache Tika's PSDParser | Tim Allison | 1.0-1.23 |
CVE-2019-10094 | StackOverflow from Crafted Package/Compressed Files in Apache Tika's RecursiveParserWrapper | Tim Allison; files contributed by Matthew Barber and Erling Ellingsen | 1.7-1.21 |
CVE-2019-10093 | Denial of Service in Apache Tika's 2003ml and 2006ml Parsers | Tim Allison | 1.19-1.21 |
CVE-2019-10088 | OOM from a crafted Zip File in Apache Tika's RecursiveParserWrapper | RunningSnail | 1.7-1.21 |
PDFBOX-4550 | OOM from corrupt ToUnicode stream in PDFs | Tilman Hausherr | ?-1.21 |
CVE-2019-0228 | XML External Entity (XXE) in xfdf loading in PDFBox (regular Tika parsing would likely not be vulnerable) | Kurt Boberg | ?-1.20 |
CVE-2018-20346 | (Provided) SQLite before 3.25.3 allows remote attackers to execute arbitrary code | Pat Cashman (notified Tika team) | ?-1.20 |
CVE-2018-17197 | Infinite Loop in Tika's SQLite3Parser | Tim Allison | 1.8-1.19.1 |
CVE-2018-11796 | XML Entity Expansion in Tika's SAXParsers after reset() | Slava Gorelik | ?-1.19 |
CVE-2018-11797 | Very long loop parsing page tree in PDFBox | Shawn Rasheed and Jens Dietrich | ?-1.19 |
CVE-2018-11771 | Infinite Loop in Commons-Compress ZipArchiveInputStream | Tobias Ospelt | ?-1.18 |
CVE-2018-8017 | Infinite Loop in IptcAnpaParser | Rohan Padhye and Tobias Ospelt | 1.2-1.18 |
CVE-2018-8036 | Infinite Loop leading to OOM in PDFBox's AFMParser | Tobias Ospelt | ?-1.18 |
CVE-2018-12418 | Infinite Loop in junrar | Tobias Ospelt | ?-1.18 |
CVE-2018-11761 | XML Entity Expansion Vulnerability | Renfei (Brian) Wang | 0.1-1.18 |
CVE-2018-11762 | Rare Zip Slip Vulnerability in tika-app | Tim Allison | 0.9-1.18 |
RIFFReader | Infinite Loop in AudioParser in Java 8 and 9 | Sergey Bylokhov and Tobias Ospelt | ?-1.18 |
TIKA-2446 | OOM detecting OPCPackage files with corrupt ZIP | Thorsten Schäfer | ?-1.18 |
PDFBOX-4014 | Infinite loop in JBig2 (versions less than 3.0.0) | Hanno Böck | (if user supplied) ?-1.17 |
CVE-2018-1339 | Infinite loop in ChmParser | Tobias Ospelt | ?-1.17 |
CVE-2018-1338 | Infinite loop in BPGParser | Tobias Ospelt | ?-1.17 |
CVE-2018-1335 | Command Execution in tika-server | Tim Allison | ?-1.17 |
CVE-2017-12626 | Apache POI - Infinite loops in WMF, EMF, MSG and macros; OOMs in DOC, PPT and XLS | Tim Allison, Luís Filipe Nassif and Jerome Lacoste | ?-1.17 |
CVE-2018-1324 and COMPRESS-432 | Commons Compress - Infinite loop in ZipFile | Luís Filipe Nassif and Anton Abashkin | ?-1.17 |
CVE-2018-7489 and TIKA-2634 | Jackson - Deserialization vulnerability | Richard Cyganiak (notified Tika team) | ?-1.17 |
PDFBOX-3919 | Apache PDFBox - Infinite loop | Hanno Böck and Andreas Bogk | ?-1.16 |
TIKA-2115 | Apache POI - OOM parsing OLE object | Thomas Galla | ?-1.15 |
COMPRESS-382 | Commons Compress - OOM detecting corrupt LZMA | Luís Filipe Nassif | ?-1.15 |
COMPRESS-386 and TIKA-1631 | Commons Compress - OOM detecting corrupt x-compress | Pavel Micka | ?-1.15 |
TIKA-2045 and TIKA-3442 | Apache PDFBox - OOM in font caching | Egbert | ?-1.13 |
TIKA-1866 and TIKA-954 | Apache POI - OOM in DOCX and PPTX because of bug in Piccolo parser | Rob Tulloh and Shawn Johnson | ?-1.13 |
TIKA-2040 | GC-Overload and OOM in CHMParser | Luís Filipe Nassif | ?-1.13 |
CVE-2016-6809 | jmatio - Deserialization Vulnerability in MATLAB parser | Pierre Ernst | 1.6-1.13 |
CVE-2016-4434 | XXE Vulnerability in several parsers | Arthur Khashaev, Seulgi Kim, Mesut Timur (and Tim Allison while remediating initial issue reported by Arthur et al.) | 0.10-1.12 |
CVE-2016-2175 | XML External Entity (XXE) in PDFBox | ??? | ?-1.12 |
CVE-2015-3271 | Remote Access to host files via tika-server | Tim Allison | 1.9?-1.10 |
PDFBOX-2811 | Apache PDFBox - Infinite Loop | Andreas Lehmkühler | ?-1.10 |
PDFBOX-2200 | Apache PDFBox - Slowly building memory leak because of static caching of fonts | Matthew Buckett | ?-1.6 |
TIKA-1471 | Apache PDFBox - OOM with corrupt PDF | Alan Burlison | ?-1.6 |
TIKA-788 | Infinite Loop in DWG | Stas Shaposhnikov | ?-1.4? |
TIKA-1132 | Apache POI - Nearly Infinite Loop in XLS | Ryan Krueger | ?-1.4 |
TIKA-1179 | Infinite Loop in corrupt MP3 | Marius Dumitru Florea | ?-1.4 |
TIKA-866 | OOM reading Tika config file | Stephan Mühlstrasser | ?-1.1 |
Third-party vulnerabilities that may or may not be triggerable via regular use of Apache Tika.
CVE or Vulnerability | Description | Reporter | Affected Versions |
CVE-2018-10237 | Unbounded memory allocation in Google Guava | Pat Cashman (notified Tika team) | ?-1.20 |
CVE-2018-19362 | FasterXML jackson-databind may allow attackers to have unspecified impact from polymorphic deserialization | Pat Cashman (notified Tika team) | ?-1.20 |
Acronyms and Terms
- Command Execution -- A malicious client could execute arbitrary commands on tika-server's command line.
- Deserialization Vulnerability -- A malicious actor could run arbitrary code on your computer. See OWASP's Deserialization Cheat Sheet.
- OOM -- Out of Memory Error -- Parsers may allocate more memory than is available. This can sometimes be caused by parsers not performing sanity checks before allocation. See, for example: TIKA-1631
- XXE -- XML External Entity Processing -- A malicious client could access data on your system.
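Two of the terms above, OOM and XXE, lend themselves to a short illustration. The following is a minimal sketch, not Tika's actual code: the class name `ParserGuards`, the method names, the length-prefixed record format and the 1 MiB cap are all hypothetical and chosen only for the example. It shows a sanity check on a declared length before allocating, and a JAXP `DocumentBuilderFactory` configured to refuse DOCTYPE declarations so that external entities cannot be declared.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration; not part of Apache Tika.
final class ParserGuards {

    // Assumed cap for this sketch; a real parser would choose a format-specific limit.
    static final int MAX_RECORD_BYTES = 1 << 20;

    // OOM guard: validate the length declared in the file before allocating a buffer.
    static byte[] readRecord(DataInputStream in) throws IOException {
        int declared = in.readInt();
        if (declared < 0 || declared > MAX_RECORD_BYTES) {
            throw new IOException("Implausible record length: " + declared);
        }
        byte[] buf = new byte[declared];
        in.readFully(buf);
        return buf;
    }

    // XXE guard: a DOM builder that rejects any document containing a DOCTYPE.
    static DocumentBuilder newHardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    public static void main(String[] args) throws Exception {
        // A 4-byte header claiming a ~2 GiB payload that is not actually present.
        byte[] hostile = {0x7F, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        try {
            readRecord(new DataInputStream(new ByteArrayInputStream(hostile)));
        } catch (IOException e) {
            System.out.println("Rejected record: " + e.getMessage());
        }

        // A document that tries to pull a local file in through an external entity.
        String xxe = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE r [<!ENTITY x SYSTEM \"file:///etc/passwd\">]>"
                + "<r>&x;</r>";
        try {
            newHardenedBuilder().parse(new InputSource(new StringReader(xxe)));
        } catch (SAXException e) {
            System.out.println("Rejected XML: " + e.getMessage());
        }
    }
}
```

Without the length check, the first input would make the reader attempt a roughly 2 GiB allocation on the strength of four attacker-controlled bytes. Rejecting DOCTYPE declarations outright is the bluntest XXE defence; applications that must accept DTDs would need a narrower configuration.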