Unveiling Java Library Vulnerabilities

Alperen Örsdemir, 31 Oct 2023
Supply Chain Security, AppSec

During a recent customer engagement, we encountered an interesting situation. The customer had raised concerns about a Java XXE (XML External Entity) vulnerability that had left their developers puzzled. Notably, their Static Application Security Testing (SAST) scans consistently identified this as a potential vulnerability. Given the complexity of the issue, our security consultancy team challenged me to dive deep into the issue and provide a clear and comprehensive explanation of this vulnerability to our customer. For a detailed, case-driven exploration of XXE vulnerabilities and their remediation in Java, refer to my specialized GitHub repository here.

This article marks the beginning of our efforts to demystify Java XXE vulnerabilities, with the goal of equipping developers and security teams with the knowledge to effectively address them.

In this walkthrough, we deliberately use the vulnerable Java repository javulna (https://github.com/edu-secmachine/javulna) as our testing ground. It is an ideal platform for showcasing the exploitation of a critical Java library vulnerability. Our exploration has three parts: first, we run a Semgrep static analysis scan to identify the vulnerability; next, we give a hands-on demonstration by executing a Java file from the terminal; finally, we demonstrate remediation of the vulnerable code.

Static Analysis with Semgrep: A Docker-Integrated Approach

Seamlessly integrated into a Docker container, Semgrep allows for streamlined scanning of repositories. The command below initiates the scan, providing us with invaluable insights into potential weaknesses:

$ docker run --rm -v "$PWD:/src" returntocorp/semgrep semgrep --config p/owasp-top-ten --json -o /src/semgrepscan-results.json

In this command, we specify the configuration as p/owasp-top-ten, focusing the scan on the top vulnerabilities commonly encountered.

We can see the XXE vulnerability listed inside the results, with additional information about the vulnerability such as “likelihood factor LOW”.

To deepen our understanding, let's examine the Semgrep scan results captured in the image below:

Here, the “likelihood” field in Semgrep's rule metadata describes how likely it is that an attacker could exploit the detected issue. It is distinct from the “confidence” field, which describes how likely a finding is to be a true positive rather than a false alarm. The screenshot above reveals detailed information about the detected vulnerabilities, including the XXE vulnerability we're particularly interested in. The likelihood factor is marked as “LOW”, inviting further inquiry.

Exploitation with XXE

The image below offers a comprehensive look at the code, the compiled binary file, and its terminal execution.

As seen in the terminal output, the exploit successfully targets the vulnerability by making the parser read the /etc/passwd file. This confirms that the code can indeed expose the system's sensitive data, which is printed directly in the terminal.

Out of Band XXE: Escalating Risks and Internet-Wide Exploits

Another alarming facet of this vulnerability is its potential for Out of Band XML External Entity (OOB XXE) attacks. In this form of attack, the malicious XML makes the parser send requests across the internet to a server the attacker controls, laying the groundwork for more extensive exploits.

In our next experiment, we modify the initial code to use an external XML file. The image below illustrates this adaptation, in which the payload XML points at a commonly used “webhook” website.

By incorporating an external XML file, the exploit gains an added layer of complexity and versatility. This approach further broadens the attack vector, making it even more crucial to address this vulnerability.
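Since the screenshot is not reproduced in the text, here is a minimal sketch of the out-of-band idea rather than the original code. It assumes the JDK's bundled parser with default settings, and it replaces the public webhook site with a hypothetical local listener (the class name `OobXxeDemo`, the `/xxe-callback` path and the loopback address are all illustrative). The point it shows is that merely parsing the XML causes the parser to issue an outbound HTTP request.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayInputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import javax.xml.parsers.DocumentBuilderFactory;

class OobXxeDemo {

    // Parses a payload whose entity points at a local listener and returns
    // the request paths that listener received while the XML was being parsed.
    static List<String> parseAndCollectCallbacks() throws Exception {
        List<String> seenPaths = new CopyOnWriteArrayList<>();

        // Local stand-in for the attacker's webhook endpoint (assumption);
        // port 0 lets the OS pick any free port.
        HttpServer listener = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        listener.createContext("/", exchange -> {
            seenPaths.add(exchange.getRequestURI().getPath());
            byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        listener.start();
        try {
            String url = "http://127.0.0.1:" + listener.getAddress().getPort() + "/xxe-callback";
            String xmlData = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                    + "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"" + url + "\" >]>\n"
                    + "<foo>&xxe;</foo>";
            // Unhardened factory: resolving &xxe; makes the parser fetch the URL.
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlData.getBytes(StandardCharsets.UTF_8)));
        } finally {
            listener.stop(0);
        }
        return seenPaths;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Callbacks received: " + parseAndCollectCallbacks());
    }
}
```

In a real attack the listener would be an internet-facing endpoint, so the request itself tells the attacker that the parser is vulnerable, even when the application never echoes the parsed content back.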

Anatomy of the XXE Exploit: Breaking Down the XML Behavior

Let's dissect the XML snippet responsible for the XXE attack and explore how each segment contributes to the compromise. The code snippet, presented below, employs a well-crafted XML structure to trick the XML parser.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<foo>&xxe;</foo>

XML Components and their Roles

  1. XML Declaration: <?xml version="1.0" encoding="UTF-8"?>
    This part specifies the XML version and its character encoding.
  2. Document Type Definition (DTD): <!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
    The DTD defines the structure of the XML document and plays a pivotal role in this attack.
    • Role of DOCTYPE: The DOCTYPE declaration names the document type "foo" and provides the place where entities used in the XML document, such as “xxe” in this case, are defined.
    • Entity Definition: The segment [<!ENTITY xxe SYSTEM "file:///etc/passwd" >] is the crux of the XXE attack.
    • <!ENTITY xxe: Declares an entity named xxe.
    • SYSTEM "file:///etc/passwd": Tells the parser to fetch the entity's value from an external resource, here via the file:// protocol, targeting the /etc/passwd file found on Unix-based systems.
  3. XML Content: <foo>&xxe;</foo>
    This part of the XML houses the root element foo and invokes the malicious xxe entity using &xxe;.

The Exploitation Sequence

  1. The XML parser identifies the &xxe; entity within the element.
  2. It then looks up the xxe definition in the DTD.
  3. The DTD specifies the SYSTEM keyword followed by a file path (file:///etc/passwd).
  4. Finally, the XML parser attempts to read the /etc/passwd file on the host system.

By executing this process, an inadequately secured XML parser will successfully retrieve the /etc/passwd file, leaking sensitive user account information.
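The sequence above can be sketched in a few lines of Java. This is not the code from the screenshot but a minimal illustration, assuming the JDK's bundled parser with its default settings. The class and method names are hypothetical, and a temporary file stands in for /etc/passwd so the demo is harmless and portable.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XxeVulnerableDemo {

    // Builds the payload from the article, pointed at an arbitrary local file.
    static String payloadFor(Path target) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"" + target.toUri() + "\" >]>\n"
             + "<foo>&xxe;</foo>";
    }

    // Parses with a factory left at its defaults, i.e. without any XXE hardening.
    static String parseWithDefaults(String xmlData) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xmlData.getBytes(StandardCharsets.UTF_8)));
        // The text of <foo> is whatever the parser substituted for &xxe;.
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for /etc/passwd (assumption made for the demo only).
        Path secret = Files.createTempFile("xxe-demo", ".txt");
        Files.writeString(secret, "demo-secret");
        System.out.println(parseWithDefaults(payloadFor(secret)));
        Files.delete(secret);
    }
}
```

Swapping the temporary file for /etc/passwd on a Unix-like host would reproduce the scenario described in the article.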

With this understanding, it becomes clear why a robust XML parser configuration is not optional but essential for protecting applications against XXE vulnerabilities. These configuration settings are the remediations, and they are shown in the remediation part of this article.

The Consequences of Lax XML Parser Configuration

If the XML parser isn't rigorously configured to guard against XML External Entity (XXE) attacks, it becomes an unwitting accomplice in a security breach. By executing the malicious XML, the parser can access sensitive system files, such as /etc/passwd. This file is a treasure trove of sensitive information, typically housing user account details. An attacker can exploit this vulnerability to harvest this data.

A Cautionary Tale on XXE Vulnerabilities

In essence, the xmlData string acts as a trojan horse, using a craftily designed Document Type Definition (DTD) to deceive the XML parser into accessing the /etc/passwd file. This exploit is a textbook case of XML External Entity (XXE) attacks. It underscores the imperative need for robust security measures, specifically around XML parser configurations, in real-world applications.

The implications are clear: an improperly secured XML parser isn't just a minor oversight but a glaring security gap that could lead to significant data breaches. Therefore, hardening XML parsers against XXE vulnerabilities isn't just advisable—it's essential.

Remediation: Securing XML Parsing in Java

The code discussed in this section is a robust mechanism for thwarting XML External Entity (XXE) attacks in Java-based applications. XXE attacks can wreak havoc by introducing harmful XML entities into otherwise benign XML data. Let's go through the code's protective layers to understand how it works.

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

This line kicks off the remediation process by creating a new instance of DocumentBuilderFactory. This factory object will later be configured and then used to create DocumentBuilder instances, which are in charge of parsing XML documents.

factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

The above line makes a pivotal security adjustment to DocumentBuilderFactory. It disallows the use of Document Type Definitions (DTDs) within XML documents. This is crucial because DTDs can be weaponized to define hazardous entities, leading to XXE attacks.

factory.setFeature("http://xml.org/sax/features/external-general-entities", false);

factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);

Here, the code further enhances security by disabling the parsing of external general and parameter entities. These entities can point to external files and resources, serving as vectors for XXE attacks.
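Put together, the three settings might be wrapped as follows. This is a sketch under stated assumptions, not the author's exact program: the class name `HardenedXmlParser` and its helper methods are hypothetical, and the behaviour described is that of the JDK's bundled parser, where a DOCTYPE in the input is reported as a SAXException once disallow-doctype-decl is enabled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

class HardenedXmlParser {

    // Returns a factory configured with the three features discussed above.
    static DocumentBuilderFactory newHardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        return factory;
    }

    // Returns true if the hardened parser refuses the document.
    static boolean isRejected(String xmlData) throws Exception {
        try {
            newHardenedFactory().newDocumentBuilder().parse(
                    new ByteArrayInputStream(xmlData.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXException e) {
            return true; // a DOCTYPE declaration is treated as a fatal error
        }
    }

    public static void main(String[] args) throws Exception {
        String xmlData = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"file:///etc/passwd\" >]>\n"
                + "<foo>&xxe;</foo>";
        System.out.println("XXE payload rejected: " + isRejected(xmlData));
        System.out.println("Plain XML rejected:   " + isRejected("<foo>hello</foo>"));
    }
}
```

Because the DOCTYPE is refused before any entity is resolved, the file named in the payload is never opened. Ordinary documents without a DOCTYPE still parse as before.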

Summary: A Proactive Defense

In essence, the code carefully configures DocumentBuilderFactory to fortify its XML parsing capabilities against XXE vulnerabilities. By disallowing DTDs and deactivating external entity parsing, the code significantly reduces the attack surface.

This remediation tactic is not just a good practice but a cybersecurity imperative, given the evolving landscape of threats. The code encapsulates a proactive approach to security, underlining the importance of nipping vulnerabilities in the bud.

After applying the remediation code, the terminal output clearly shows that the exploit is no longer functional. Unlike before, there are no attempts to access or write to sensitive files, affirming that the vulnerability has been successfully mitigated.

The absence of any significant output or messages in the terminal, post-execution of the binary with remediated code, serves as a silent testimony to the effectiveness of the security measures. Had the exploit been viable, one would expect to see evidence of unauthorized file access or other malicious activity.

Bonus: Alternative Strategies

There are several approaches to mitigating XML External Entity (XXE) vulnerabilities. Items 1 and 2 below summarize what we have already done together in the code example above; the rest are additional common strategies:

  1. Secure XML Parser: Use libraries like OWASP's ESAPI for built-in XXE protection.
  2. Disable DTDs: Configure DocumentBuilderFactory to disable DTD processing, enhancing security.
  3. Whitelist Entities: Allow only pre-defined, safe entities to be processed.
  4. Input Validation: Implement strict schema validation to filter out malicious XML payloads.
  5. Content Security Policies: Limit the sources from which external entities can be fetched.
  6. Use WAF: Deploy a Web Application Firewall to block malicious XML payloads.
  7. Keep Libraries Updated: Use the most recent versions of XML processing libraries.
  8. Regular Audits: Conduct security reviews and penetration testing to identify vulnerabilities.

Each of these strategies has its own merits and can be combined for comprehensive XML security.
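As one illustration of the entity allowlisting idea (strategy 3), the standard `EntityResolver` hook on `DocumentBuilder` can be used to refuse any external identifier that is not explicitly permitted. The sketch below is an assumption about how this could be wired up, not something taken from the original code. The class name and the empty allowlist are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;

class AllowlistEntityResolverDemo {

    // Hypothetical allowlist of system identifiers; empty means "allow nothing".
    static final Set<String> ALLOWED_SYSTEM_IDS = Set.of();

    // Creates a builder that consults the allowlist before resolving anything external.
    static DocumentBuilder newAllowlistingBuilder() throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> {
            if (!ALLOWED_SYSTEM_IDS.contains(systemId)) {
                throw new SAXException("External entity not on allowlist: " + systemId);
            }
            return null; // null asks the parser to resolve the identifier itself
        });
        return builder;
    }

    // Returns true if parsing was stopped by the resolver (or another SAX error).
    static boolean isBlocked(String xmlData) throws Exception {
        try {
            newAllowlistingBuilder().parse(
                    new ByteArrayInputStream(xmlData.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        String xmlData = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"file:///etc/passwd\" >]>\n"
                + "<foo>&xxe;</foo>";
        System.out.println("Blocked: " + isBlocked(xmlData));
    }
}
```

This approach keeps DTDs usable for applications that genuinely need them, at the cost of maintaining the allowlist. Where DTDs are not needed, disabling them outright remains the simpler option.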

Bonus++: Semgrep p/owasp-top-ten doesn’t detect the remediation

After applying our remediation code, we scanned our Java application once again using Semgrep's OWASP Top Ten rule set. Intriguingly, even though the code was remediated, the scan still flagged it as vulnerable under the same CWE-611 label. Before remediation, the vulnerability showed up twice, each with a different check_id:

  1. "contrib.owasp.java.xxe.documentbuilderfactory.owasp.java.xxe.javax.xml.parsers.DocumentBuilderFactory"
  2. "java.lang.security.audit.xxe.documentbuilderfactory-disallow-doctype-decl-missing.documentbuilderfactory-disallow-doctype-decl-missing"

Post-remediation, only the first check_id continued to appear in the scan results. This lingering flag suggests that this particular rule might be outdated and in need of an update.

For a closer look at the rule set and its potential limitations, you can visit the Semgrep OWASP Top Ten Rule Set.

The persistent flagging by the first check_id, despite successful remediation, serves as a reminder: automated security tools are a vital component but not a complete solution. They should be integrated into a more comprehensive, layered security strategy.

Differences between Remediations/Rules

The finding whose check_id starts with “java.lang.security.audit” is resolved by the settings shown earlier in this blog, but the finding whose check_id starts with “contrib.owasp.java” remains. Such lingering alerts can be false positives and can cost the company's security and development teams time.

Both findings carry the same CWE number, “CWE-611”, yet the remediation that satisfies one rule is not enough for the other. We therefore inspected the rule that still fired and added the settings it expects.

Here we include the additional remediation code that this rule considered missing.

In the Semgrep scan results with these final additions in place, the vulnerability findings (or false positive alerts) have disappeared.

Let's look at the remediation settings one by one to understand what each does and how it affects the code. They fall into two groups, as configured in the two rules' YAML files: the first group is expected by both rule sets, whereas the second is specific to the OWASP Top Ten ruleset configuration.

  1. Mitigating XXE with External Entity Parsing: XXE vulnerabilities often stem from the XML parser's ability to fetch and evaluate external entities. To prevent this, we can use the settings mentioned earlier as remediation:
    setFeature("http://xml.org/sax/features/external-general-entities", false);
    setFeature("http://xml.org/sax/features/external-parameter-entities", false);
    These settings stop the parser from resolving external entities of both kinds, general and parameter. This considerably reduces the possibility of an XXE attack.
  2. Understanding XInclude & Namespace Configurations: While these configurations aren't direct mitigations for XXE, they do influence XML parsing behavior:
    factory.setXIncludeAware(true);
    factory.setNamespaceAware(true);

With setXIncludeAware(true), XML documents can embed other XML documents. If not managed correctly, this can be a potential attack vector. On the other hand, setNamespaceAware(true) equips the parser to recognize XML namespaces, crucial for maintaining XML document structure and validation.
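To see what the two groups of settings do on their own, without disallow-doctype-decl, the following sketch applies exactly the four calls listed above and parses an XXE payload. It assumes the JDK's bundled parser. The class and method names are hypothetical, and a temporary file again stands in for a sensitive file. With these settings the DOCTYPE is still accepted, but the external entity is skipped rather than expanded.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class EntityFeaturesOnlyDemo {

    // Parses with external entities disabled and the XInclude/namespace
    // settings from the list above; DTDs themselves are still allowed.
    static String parseWithEntitiesDisabled(String xmlData) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        factory.setXIncludeAware(true);
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(xmlData.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a sensitive file (assumption made for the demo only).
        Path secret = Files.createTempFile("xxe-demo", ".txt");
        Files.writeString(secret, "demo-secret");
        String xmlData = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"" + secret.toUri() + "\" >]>\n"
                + "<foo>&xxe;</foo>";
        String text = parseWithEntitiesDisabled(xmlData);
        System.out.println("Leaked secret: " + text.contains("demo-secret"));
        Files.delete(secret);
    }
}
```

This differs from the disallow-doctype-decl approach in that the document is parsed rather than rejected, which may matter for applications that must accept input containing a DOCTYPE.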

In essence, while XML brings versatile data structuring capabilities, securing parsers against XXE vulnerabilities is of paramount importance to protect sensitive data and system integrity.

Wrapping up: XXE attacks are a serious threat that requires proper understanding and mitigation. However, we should also be careful not to be misled by false positives, which can distract us from the real vulnerabilities. We need to keep a balanced focus and give XXE vulnerabilities the attention they need.

References used in this Blog Post

1. https://community.veracode.com/s/article/Java-Remediation-Guidance-for-XXE

2. https://owasp.org/www-community/vulnerabilities/XML_External_Entity_(XXE)_Processing

3. https://semgrep.dev/p/owasp-top-ten

4. https://portswigger.net/web-security/xxe

5. https://github.com/edu-secmachine/javulna

6. https://mcoskuner.medium.com/
