During a recent customer engagement, we encountered an interesting situation. The customer had raised concerns about a Java XXE (XML External Entity) vulnerability that had left their developers puzzled: their Static Application Security Testing (SAST) scans consistently flagged it as a potential vulnerability. Given the complexity of the issue, our security consultancy team challenged me to dig into it and give the customer a clear and comprehensive explanation. For a detailed, case-driven exploration of XXE vulnerabilities and their remediation in Java, refer to my specialized GitHub repository here.
This article marks the beginning of our efforts to demystify Java XXE vulnerabilities, with the goal of equipping developers and security teams with the knowledge to effectively address them.
In this walkthrough, we deliberately use the vulnerable Java repository (https://github.com/edu-secmachine/javulna) as our testing ground. It is an ideal platform for showing how a critical Java library vulnerability can be exploited. Our exploration has three parts: first, a static analysis scan with Semgrep to identify the vulnerability; second, a hands-on demonstration that runs a Java file from the terminal; and third, remediation of the vulnerable code.
Semgrep is available as a Docker image, which makes scanning a repository straightforward. The command below starts the scan from the repository's root directory:
$ docker run --rm -v "${PWD}:/src" returntocorp/semgrep semgrep --config p/owasp-top-ten --json -o /src/semgrepscan-results.json
In this command, we set the configuration to p/owasp-top-ten, which focuses the scan on the OWASP Top Ten vulnerability categories, and write the results as JSON to semgrepscan-results.json in the mounted directory.
The XXE vulnerability is listed in the results, along with additional information such as a likelihood of “LOW”.
To deepen our understanding, let's examine the Semgrep scan results captured in the image below:
In Semgrep rule metadata, “likelihood” describes how likely it is that an attacker can exploit the detected issue; the separate “confidence” field describes how likely a finding is to be a true positive rather than a false alarm. The screenshot above shows detailed information about the detected vulnerabilities, including the XXE vulnerability we're particularly interested in. Its likelihood is marked as “LOW”, which invites further inquiry.
The image below shows the code, the compiled class file, and its execution in the terminal.
As the terminal output shows, the exploit successfully targets the vulnerability by reading the /etc/passwd file. This confirms that the code can indeed compromise the system: sensitive data is captured and printed directly in the terminal.
Another alarming facet of this vulnerability is its potential for Out of Band XML External Entity (OOB XXE) attacks. In this form of attack, the parser is made to send requests across the network to a server the attacker controls, which lays the groundwork for more extensive exploits.
In our next experiment, we modify the initial code to read an external XML file. The image below illustrates this adaptation, with the XML payload pointing at a common “webhook” website.
Using an external XML file makes the exploit more versatile and broadens the attack vector, which makes it even more important to address this vulnerability.
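To make the out-of-band behavior concrete, here is a minimal sketch rather than the code from the screenshot. It assumes a throwaway HTTP listener on 127.0.0.1 as a stand-in for the webhook site; the class name OobXxeSketch, the /ping path, and the method name are hypothetical. It counts how many requests a default-configured parser sends while parsing the payload.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayInputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.parsers.DocumentBuilderFactory;

class OobXxeSketch {

    // Starts a local listener (stand-in for the webhook site), parses a payload
    // whose external entity points at it, and returns the number of requests seen.
    static int countCallbacks() {
        AtomicInteger hits = new AtomicInteger();
        HttpServer server = null;
        try {
            // Port 0 lets the OS pick a free port.
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/ping", exchange -> {
                hits.incrementAndGet();
                byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            server.start();

            int port = server.getAddress().getPort();
            String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"http://127.0.0.1:" + port + "/ping\" >]>"
                + "<foo>&xxe;</foo>";

            // Default factory: no XXE hardening applied.
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return hits.get();
        } catch (Exception e) {
            throw new IllegalStateException("demo failed", e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("callbacks received: " + countCallbacks());
    }
}
```

A non-zero count means that parsing alone was enough to trigger an outbound request, which is the signal an attacker watches for on the webhook side.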
Let's dissect the XML snippet responsible for the XXE attack and explore how each segment contributes to the compromise. The code snippet, presented below, employs a well-crafted XML structure to trick the XML parser.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<foo>&xxe;</foo>
<?xml version="1.0" encoding="UTF-8"?>
XML declaration: States the XML version and the character encoding of the document.
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
DOCTYPE: The DOCTYPE declaration specifies that the document type is "foo". It sets up a framework for XML validation and for defining the entities used in the document, such as “xxe” in this case.
[<!ENTITY xxe SYSTEM "file:///etc/passwd" >]: This is the crux of the XXE attack.
<!ENTITY xxe: Declares an entity named xxe.
SYSTEM "file:///etc/passwd": This entity fetches a system file using the file:// protocol, targeting the /etc/passwd file commonly found on Unix-based systems.
<foo>&xxe;</foo>: The &xxe; reference in the document body makes the parser substitute the entity's content, here the contents of the file.
When it processes this document, an inadequately secured XML parser retrieves the /etc/passwd file and leaks sensitive user account information.
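The mechanics can be reproduced safely with a small sketch. It assumes a default-configured DocumentBuilderFactory and, instead of /etc/passwd, a temporary file holding a marker string, so nothing sensitive is read. The class and method names are hypothetical; only the payload shape is taken from the snippet above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XxeLeakSketch {

    // Builds the payload from the article, pointed at an arbitrary file URI.
    static String payloadFor(String fileUri) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"" + fileUri + "\" >]>"
            + "<foo>&xxe;</foo>";
    }

    // Parses xmlData with an unhardened factory and returns the root element's text.
    static String parseWithDefaults(String xmlData) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xmlData.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // Writes a marker to a temp file and returns what the parser hands back.
    static String leakTempFile(String marker) {
        try {
            Path secret = Files.createTempFile("xxe-demo", ".txt");
            try {
                Files.write(secret, marker.getBytes(StandardCharsets.UTF_8));
                return parseWithDefaults(payloadFor(secret.toUri().toString()));
            } finally {
                Files.deleteIfExists(secret);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The file content comes back as the text of <foo>.
        System.out.println(leakTempFile("secret-marker"));
    }
}
```

Swapping the temporary file's URI for file:///etc/passwd gives the payload discussed above; the parser treats both the same way.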
With this understanding, it becomes clear why a securely configured XML parser is not optional but essential for protecting applications against XXE vulnerabilities. These configurations are the remediations shown in the remediation part of this article.
If the XML parser isn't configured to guard against XML External Entity (XXE) attacks, it becomes an unwitting accomplice in a security breach. When it processes the malicious XML, the parser can read sensitive system files such as /etc/passwd, which typically holds user account details, and an attacker can exploit this to harvest that data.
In essence, the xmlData string acts as a trojan horse, using a carefully crafted Document Type Definition (DTD) to deceive the XML parser into reading the /etc/passwd file. This exploit is a textbook XML External Entity (XXE) attack, and it underscores the need for robust security measures around XML parser configuration in real-world applications.
The implications are clear: an improperly secured XML parser isn't a minor oversight but a glaring security gap that could lead to significant data breaches. Hardening XML parsers against XXE vulnerabilities isn't just advisable; it's essential.
The remediation code discussed here hardens XML parsing against XML External Entity (XXE) attacks in Java-based applications. XXE attacks can wreak havoc by introducing harmful XML entities into otherwise benign XML data. Let's walk through the code's protective layers to understand how it works.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
This line starts the remediation by creating a new instance of DocumentBuilderFactory. This factory object is then configured and later used to create the DocumentBuilder instances that parse XML documents.
The next setting in the remediation makes a pivotal security adjustment to DocumentBuilderFactory: it disallows the use of Document Type Definitions (DTDs) within XML documents. This is crucial because DTDs can be weaponized to define hazardous entities, leading to XXE attacks.
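As an illustration of this step, the sketch below assumes the widely documented Xerces feature http://apache.org/xml/features/disallow-doctype-decl, which the JDK's built-in parser supports; the article's own line is not reproduced here, so treat the feature URI and the class and method names as assumptions. With the feature enabled, any document carrying a DOCTYPE is refused with a SAXException before an entity can be defined.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

class DisallowDoctypeSketch {

    // Assumed feature URI (Xerces / JDK built-in parser).
    static final String DISALLOW_DOCTYPE =
        "http://apache.org/xml/features/disallow-doctype-decl";

    // Creates a factory that rejects any document containing a DOCTYPE.
    static DocumentBuilderFactory newFactoryWithoutDtds() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(DISALLOW_DOCTYPE, true);
            return factory;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("parser does not support " + DISALLOW_DOCTYPE, e);
        }
    }

    // Returns true when the hardened parser refuses the document.
    static boolean isRejected(String xml) {
        try {
            newFactoryWithoutDtds()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXException e) {
            return true; // the parser reports the DOCTYPE as a fatal error
        } catch (Exception e) {
            throw new IllegalStateException("unexpected failure", e);
        }
    }

    public static void main(String[] args) {
        String payload = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"file:///etc/passwd\" >]>"
            + "<foo>&xxe;</foo>";
        System.out.println("payload rejected: " + isRejected(payload));
        System.out.println("plain XML rejected: " + isRejected("<foo>hello</foo>"));
    }
}
```

Documents without a DOCTYPE still parse normally, so this setting only costs something if the application genuinely needs DTDs.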
The code then further tightens security by disabling the parsing of external general entities and external parameter entities. These entities can point to external files and resources, serving as vectors for XXE attacks.
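To see this setting in isolation, the following sketch assumes the standard SAX feature URIs for external general and parameter entities and leaves DTDs allowed, so only these two switches are in play. As in a safe reproduction, the entity points at a temporary file with a marker string rather than /etc/passwd; all class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ExternalEntitiesOffSketch {

    // Assumed feature URIs (standard SAX features).
    static final String EXTERNAL_GENERAL =
        "http://xml.org/sax/features/external-general-entities";
    static final String EXTERNAL_PARAMETER =
        "http://xml.org/sax/features/external-parameter-entities";

    // Parses with external entities switched off and returns the root element's text.
    static String parseWithoutExternalEntities(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(EXTERNAL_GENERAL, false);
            factory.setFeature(EXTERNAL_PARAMETER, false);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // Points an external entity at a temp file and returns what the parser yields.
    static String attemptLeak(String marker) {
        try {
            Path secret = Files.createTempFile("xxe-demo", ".txt");
            try {
                Files.write(secret, marker.getBytes(StandardCharsets.UTF_8));
                String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                    + "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"" + secret.toUri() + "\" >]>"
                    + "<foo>&xxe;</foo>";
                return parseWithoutExternalEntities(xml);
            } finally {
                Files.deleteIfExists(secret);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = attemptLeak("secret-marker");
        System.out.println("marker leaked: " + text.contains("secret-marker"));
    }
}
```

The document still parses, but the entity reference is no longer replaced with the file's content, which is why this setting is usually combined with disallowing DTDs rather than used alone.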
In essence, the code carefully configures DocumentBuilderFactory to fortify its XML parsing capabilities against XXE vulnerabilities. By disallowing DTDs and deactivating external entity parsing, the code significantly reduces the attack surface.
This remediation tactic is not just a good practice but a cybersecurity imperative, given the evolving landscape of threats. The code encapsulates a proactive approach to security, underlining the importance of nipping vulnerabilities in the bud.
After applying the remediation code, the terminal output clearly shows that the exploit no longer works. Unlike before, there are no attempts to access or write to sensitive files, which confirms that the vulnerability has been mitigated.
The absence of any significant output or messages in the terminal after running the compiled class with the remediated code is a silent testimony to the effectiveness of the security measures. Had the exploit still been viable, we would expect to see evidence of unauthorized file access or other malicious activity.
There are several approaches to mitigating XML External Entity (XXE) vulnerabilities besides the code example I provided earlier; numbers 1 and 2 summarize the process we have gone through together. Here are some common strategies:
Each of these strategies has its own merits and can be combined for comprehensive XML security.
After applying our remediation code, we scanned our Java application once again using Semgrep's OWASP Top Ten rule set. Intriguingly, even though the code was remediated, the scan still flagged it as vulnerable under the same CWE-611 label. Before remediation, the vulnerability showed up twice, each with a different check_id:
Post-remediation, only the first check_id continued to appear in the scan results. This lingering flag suggests that this particular rule might be outdated and in need of an update.
For a closer look at the rule set and its potential limitations, you can visit the Semgrep OWASP Top Ten Rule Set.
The persistent flagging by the first check_id, despite successful remediation, serves as a reminder: automated security tools are a vital component but not a complete solution. They should be integrated into a more comprehensive, layered security strategy.
While the finding whose check_id starts with “java.lang.security.audit” is remediated by the settings described earlier in this blog, the alert whose check_id starts with “contrib.owasp.java” persists. This can result in false positives and cost the company's security and developer teams time.
Both findings carry the same CWE number, “CWE-611”, yet the remediation that clears one is not enough for the other. We therefore inspected the rule that still fired and added the settings it expects.
Here are the additional remediation settings that this rule considered missing.
With these final settings included, the Semgrep scan results show that the vulnerabilities, or false positive alerts, have disappeared.
Let's look at the remediation functions one by one to understand what they do and how they affect the code above. In the remediation section, two types of Java functions are highlighted: the first is checked by both rule sets, while the second is specific to the OWASP Top Ten rule set configuration. We will explore these functions as they are configured in the two distinct YAML rule files, focusing on the differences.
With setXIncludeAware(true), XML documents can embed other XML documents. If not managed correctly, this can be a potential attack vector. On the other hand, setNamespaceAware(true) equips the parser to recognize XML namespaces, crucial for maintaining XML document structure and validation.
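The effect of these two switches can be checked with a small sketch. It assumes the JDK's built-in parser, which in practice needs namespace awareness for XInclude processing, and it includes a temporary file holding a marker string via a hypothetical xi:include element; the class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XIncludeSketch {

    // Parses xml with namespace awareness on and XInclude toggled by the caller,
    // then returns the root element's text.
    static String parse(String xml, boolean xincludeAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setXIncludeAware(xincludeAware);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // Builds a document that pulls in a temp file as text through xi:include.
    static String includeTempFile(String marker, boolean xincludeAware) {
        try {
            Path included = Files.createTempFile("xinclude-demo", ".txt");
            try {
                Files.write(included, marker.getBytes(StandardCharsets.UTF_8));
                String xml = "<foo xmlns:xi=\"http://www.w3.org/2001/XInclude\">"
                    + "<xi:include href=\"" + included.toUri() + "\" parse=\"text\"/>"
                    + "</foo>";
                return parse(xml, xincludeAware);
            } finally {
                Files.deleteIfExists(included);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("XInclude on, marker present:  "
            + includeTempFile("secret-marker", true).contains("secret-marker"));
        System.out.println("XInclude off, marker present: "
            + includeTempFile("secret-marker", false).contains("secret-marker"));
    }
}
```

Note that this payload contains no DOCTYPE at all, so settings that only target DTDs and entities do not cover it; the XInclude switch has to be considered separately.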
In essence, while XML brings versatile data structuring capabilities, securing parsers against XXE vulnerabilities is of paramount importance to protect sensitive data and system integrity.
Wrapping up: XXE attacks are a serious threat that requires proper understanding and mitigation. However, we should also be careful not to be misled by false positives, which can distract us from the real vulnerabilities. We need to keep a balanced focus and give XXE vulnerabilities the attention they need.