Entries in Cloud (5)


Docker Secure Deployment Guidelines

Within today’s growing cloud-based IT market, there is a strong demand for virtualisation technologies. Unfortunately, most virtualisation solutions are not flexible enough to meet developer requirements, and the overhead of full virtualisation becomes a burden on the scalability of the infrastructure.

Docker reduces that overhead by allowing developers and system administrators to seamlessly deploy containers for applications and services required for business operations. However, because Docker leverages the same kernel as the host system to reduce the need for resources, containers can be exposed to significant security risks if not adequately configured. 

The GitHub repository referenced below aims to provide deployment guidelines for Docker developers and system administrators alike, which can be used to improve the security posture of Linux containers within a Dockerized environment.


Avoiding Residual SSH Keys on Ubuntu AMIs

Note: This item has been cross-posted from the SendSafely blog.

If you’ve ever used Amazon EC2 to run Linux, you probably know that the AWS console prompts you to choose an SSH key-pair when spawning a new Linux instance. Public/private key pairs allow you to securely connect to your instance using SSH after it launches. On Ubuntu Linux, the SSH public key is made available to the instance by the Ubuntu CloudInit package. This package is installed on all Ubuntu Cloud Images and also in the official Ubuntu images available on EC2. It runs at boot time, and adds the SSH key to the default user’s ~/.ssh/authorized_keys file.


What you may not realize is that by default, CloudInit does not replace the authorized_keys file. Instead, it appends the key to the existing authorized_keys file. Depending on your build process, this could create a security exposure, since residual keys can build up on the image over time as AMIs are created. This is especially true when the lines between development and operations are blurred (often referred to as DevOps), a huge trend that will likely continue with regard to applications that run in the cloud.

Consider this example:  A team wants to launch their application in AWS.  With cool features like Elastic Load Balancers (ELBs) and Auto Scaling, it’s a perfect platform to get scalability without breaking the bank.  They’ll typically start by tasking someone from the team to get a clean base server image (AMI) of the operating system from a trusted source.  They launch an EC2 instance from the AMI and customize the local stack to meet the specific needs of the application (install non-standard packages, get the web server configured, etc).  Once everything is installed and patched, they load the application onto the instance and perform some final configuration tweaks to get things up and running.  Everything appears to be working as intended, so the initial plan to launch in AWS is green lighted.  

The next logical step is for the team to create an image of the new custom build to avoid having to re-install everything in the event that the instance gets hosed or otherwise corrupted.  A new AMI from the running system is created, which will serve as their new “base” AMI.  Going forward, their release/deployment process will start by launching a new instance of the base AMI, applying any recent system patches (hopefully), and deploying the latest version of the code.  Provided the application passes pre-production testing, the instance is ready to be pushed into production.  

When using Auto Scaling behind an ELB, EC2 instances are launched and terminated as needed to meet changing load demands over time. Since each newly launched instance needs to originate from an AMI, a new AMI that has the exact copy of the code you want running in production will need to be generated. The newly configured deployment instance will usually serve as the source for the AMI, so a final AMI gets created for Auto Scaling to use behind the ELB.

In many environments, development teams are not given unrestricted access to production systems. If a team wants to get an AMI spawned in production, they would likely need to request this be done by a separate production support team. In all likelihood, the production team launches its AMIs with a separate SSH key that the development team rightfully does not have access to.

What could go wrong here?  Every time you create an AMI from a running EC2 instance, the root SSH key that was used to spawn the instance gets copied into the AMI.  Looking at the history of the final production AMI used in this example:  

  • It was launched from a clean install image (no pre-loaded keys) and was configured with a new ssh key by the EC2 wizard.  
  • That key was then copied to the “base” AMI, which was then launched during the pre-deployment setup.  If a different SSH key was used during this launch, there are now two SSH keys on the new instance.  
  • Both of those keys get copied when the “deployment” AMI is created, which might then be launched by the production team using yet another SSH key. Unless someone thought to clean out the un-wanted keys before creating that final AMI, all three SSH keys end up on the running host in production.  

To avoid this problem, you’ll want to make sure that production engineers (not your developers) are tasked with creating AMIs that will ultimately be used on production ELBs.  They should specifically remove all un-wanted SSH keys from the authorized_keys file before creating the AMI.  
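As a rough illustration of that cleanup step, here is a minimal Python sketch that filters the contents of an authorized_keys file against an allowlist of key blobs. The function names and the allowlist approach are assumptions made for this example, not part of any AWS or CloudInit tooling, and the line parsing is deliberately simplified (it does not handle quoted options that themselves contain a key-type token).

```python
from typing import List, Optional, Set, Tuple

KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-", "sk-")


def key_material(line: str) -> Optional[str]:
    """Return the base64 key blob from an authorized_keys line, or None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split()
    # The blob is the field right after the key type; options may precede the type.
    for i, field in enumerate(fields[:-1]):
        if field.startswith(KEY_TYPE_PREFIXES):
            return fields[i + 1]
    return None


def prune_authorized_keys(text: str, allowed_blobs: Set[str]) -> Tuple[str, List[str]]:
    """Keep comments, blank lines and allowed keys; report every other line as removed."""
    kept, removed = [], []
    for line in text.splitlines():
        blob = key_material(line)
        if blob is None or blob in allowed_blobs:
            kept.append(line)
        else:
            removed.append(line)
    return "\n".join(kept) + ("\n" if kept else ""), removed
```

Run against the default user's authorized_keys content just before the AMI is created, the `removed` list doubles as an audit trail of which residual keys were found on the image.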

The process of checking for un-wanted SSH keys should already be baked into most server deployment processes, but this step can easily get overlooked when AMIs are used for frequent deployments.  


Ekoparty Presentation: Cloud & Control

I gave my first presentation at a security conference on Friday, presenting at ekoparty on some work I did at the beginning of the year on distributing complex tasks to hundreds or thousands of computers. SETI@home was the project that pioneered the idea of distributed volunteer computing, and their command & control software evolved into a generic project called BOINC. You can run just about any application in BOINC - whether it’s open or closed source, uses GPUs, the network, or even if it’s not CPU intensive (like nmapping the internet).

Setting up a server isn’t the most exciting topic to talk about, so I used two examples to illustrate BOINC in my presentation: factoring RSA512 to recover the private key to SSL certificates or PGP keys and cracking passwords. Factoring was a huge success, but cracking didn’t work out that well. BOINC was able to distribute the work and crack things really quickly - by splitting up wordlists automatically based on hash functions I was able to scale out to more machines than I think most people are able to… but the problem came from never actually looking at the output. The best crackers, especially in cracking contests, find patterns in the cracked passwords to make mangling rules and masks and crack more passwords. You could still use BOINC as a work distributor to scale out, but you need to be behind the wheel making work units - not use it as a fire-and-forget system.

Getting applications running in BOINC is a bit of trial and error. If it’s an open source application, you have to patch it a little bit, and if it’s closed source you have to write a job.xml file defining how to run the application. In either case you have to define input and output templates that let BOINC know which files to send with the workunit and which files to expect the program to produce. And when I was sending a couple hundred MB of wordlists and resource files, I wanted to compress them and decompress them on the client, so that added a little bit of work too. To try and make it easier on you, I’ve released all the scripts, templates, config files, and patches I created while working with BOINC. I’ve also not just released my slides, but annotated them with links to the reference material for everything mentioned. Everything is up on GitHub.
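To give a feel for the compress-then-ship step, here is a small Python sketch that cuts a wordlist into fixed-size chunks and gzips each one for transfer. It is not taken from the released scripts; the function names and the fixed chunk size are assumptions for illustration, and BOINC's own workunit and template handling is left out entirely.

```python
import gzip
from typing import Iterable, Iterator, List


def chunk_words(words: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most chunk_size words."""
    chunk: List[str] = []
    for word in words:
        chunk.append(word)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def pack_chunk(words: List[str]) -> bytes:
    """Server side: newline-join the words and gzip them for transfer."""
    return gzip.compress("\n".join(words).encode("utf-8"))


def unpack_chunk(blob: bytes) -> List[str]:
    """Client side: reverse pack_chunk before handing the words to the cracker."""
    return gzip.decompress(blob).decode("utf-8").split("\n")
```

Each packed chunk would then be one input file of a work unit, with the decompression happening in a wrapper on the client before the real application starts.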

I’ve wanted to factor large numbers for a while, and this was actually what got me into this whole mess. I have some (simple) observations about factoring using the General Number Field Sieve, as well as instructions for how to do it yourself (with or without BOINC).

I have to thank Leonardo and all the ekoparty organizers for putting on a great conference. They went out of their way to make the international arrivals as comfortable as possible, and even had simultaneous translation from English to Spanish and from Spanish to English. Buenos Aires is a wonderful city, and I really recommend you visit!


Abusing WCF to Perform Remote Port Scans

Last weekend at Shmoocon, I demonstrated how an attacker can trick certain WCF web services into performing an unauthorized port scan of machines behind a firewall. For those that were not able to attend the talk, the slides are posted here. The part that covers the port scanning technique may not be clear in isolation, so I’ll try to explain it in detail. The problem is related to the WSDualHttpBinding, so in order to understand how the scanning technique works you must first understand some WSDualHttpBinding basics.

The WSDualHttpBinding

The WSDualHttpBinding is one of several “Duplex” WCF bindings.  The term Duplex refers to the bi-directional nature of the communication channel, meaning that both the client and the service can directly send messages to each other.   This is ideal for scenarios where a service needs to “push” data down to a client, rather than the alternative of constantly polling the server for a callback.   In order to do this over HTTP, which is by nature a one-way protocol, WCF sets up a dedicated HTTP listener port on the client that accepts incoming HTTP requests from the service (known as the callback channel).   If you are like me, you probably just raised an eyebrow when I said that WCF sets up an inbound HTTP listener on the client machine.  This scenario sounds odd from a security perspective, which is what initially caught my eye.

The first step in establishing a session with WSDualHttpBinding requires the client and server to negotiate the duplex connection.  This negotiation is a required part of the connection sequence, and is the mechanism that can be abused to perform remote port scanning.  The negotiation starts with the client sending a “CreateSequence” SOAP request to the web service endpoint.  A typical CreateSequence request is shown below.

As you can see, the CreateSequence request includes a “ReplyTo” address.  This address is the URL of the callback channel at which the client expects to receive callback requests from the service.  When the service receives this request, it reacts by initiating a “CreateSequenceResponse” to the ReplyTo address, and then responding to the original request with a “202 Accepted”.  Conceptually this is represented by the diagram below.  Note that the circled numbers represent the order in which each request and response occurs. 

The scenario above represents the intended chain of events for a CreateSequence negotiation.  There are a few important things to note:

  • There are two separate HTTP conversations occurring.  One is between the client and the service over port 80, and the other is between the service and the client on port 8000.
  • When the service receives a CreateSequence request, it will immediately attempt to issue the CreateSequenceResponse request to the address that is passed within the ReplyTo value.  This does NOT have to be the same address (or port) where the CreateSequence request originated from. 
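To make the role of ReplyTo concrete, here is a rough Python sketch that builds the general shape of a CreateSequence envelope and reads the callback address back out. It is not the captured request from the talk. The namespace URIs are assumptions taken from the SOAP 1.2, WS-Addressing 1.0 and WS-ReliableMessaging (February 2005) specifications, the function names are made up for this example, and a real WCF message carries more headers than shown.

```python
import uuid
import xml.etree.ElementTree as ET

# Namespace URIs assumed from the public specifications, not from a WCF capture.
SOAP = "http://www.w3.org/2003/05/soap-envelope"
WSA = "http://www.w3.org/2005/08/addressing"
WSRM = "http://schemas.xmlsoap.org/ws/2005/02/rm"


def build_create_sequence(service_url: str, reply_to: str) -> bytes:
    """Build a simplified CreateSequence envelope; nothing is sent anywhere."""
    env = ET.Element(f"{{{SOAP}}}Envelope")
    header = ET.SubElement(env, f"{{{SOAP}}}Header")
    ET.SubElement(header, f"{{{WSA}}}Action").text = WSRM + "/CreateSequence"
    ET.SubElement(header, f"{{{WSA}}}MessageID").text = f"urn:uuid:{uuid.uuid4()}"
    # The callback channel: whatever goes here is where the service connects.
    reply = ET.SubElement(header, f"{{{WSA}}}ReplyTo")
    ET.SubElement(reply, f"{{{WSA}}}Address").text = reply_to
    ET.SubElement(header, f"{{{WSA}}}To").text = service_url
    body = ET.SubElement(env, f"{{{SOAP}}}Body")
    create = ET.SubElement(body, f"{{{WSRM}}}CreateSequence")
    acks = ET.SubElement(create, f"{{{WSRM}}}AcksTo")
    ET.SubElement(acks, f"{{{WSA}}}Address").text = reply_to
    return ET.tostring(env, encoding="utf-8")


def reply_to_address(envelope: bytes) -> str:
    """Extract the ReplyTo address the service would call back."""
    root = ET.fromstring(envelope)
    return root.findtext(f"{{{SOAP}}}Header/{{{WSA}}}ReplyTo/{{{WSA}}}Address")
```

The point of the sketch is simply that the ReplyTo address is ordinary client-supplied data inside the message, with no tie to the address the request came from.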

Next, let’s introduce another slightly more complex example.  In this scenario, we have 4 machines:

  • The client, which in this case will end up being the bad guy
  • The WCF service that uses WSDualHttpBinding
  • Two unrelated hosts that will serve as targets

The client in this case will send two CreateSequence requests to the service.  The first request will include a ReplyTo address of Target1, and the second request will include a ReplyTo address of Target2.  Again, the circled numbers represent the order in which each request and response occurs. 

This diagram is much more interesting as it depicts what is certainly NOT an intended use case. As illustrated above, the first CreateSequence request (1) causes the service to initiate a connection to Target1 on port 8000, just as the second CreateSequence request (4) does to Target2. Even more interesting is that the “Accepted” HTTP response (7) to the second CreateSequence request (4) does not occur until AFTER the connection to Target1 times out (5). This means that the delay between the second CreateSequence (4) and the subsequent “Accepted” response (7) was directly related to the response time of the first CreateSequenceResponse attempt (5). It appears that a WCF service will not respond to a new CreateSequence request until all previous CreateSequenceResponse requests have either been acknowledged or timed out.

What Does this Mean?

Based on the behavior described above, the CreateSequence HTTP response delay is an effective mechanism to determine the state of a prior connection request.  By issuing multiple requests to different hosts and ports, we can use this behavior to probe remote hosts from the server hosting the WCF service.  Depending on the connectivity available from the host, we can even probe systems that would not otherwise be available to us (such as on an internal network or DMZ). 

In order to prove this theory, I wrote a utility to issue successive CreateSequence requests to a WCF service that each have a different ReplyTo address and/or port.  It measures the time between a CreateSequence request and the “202 Accepted” response in an attempt to determine whether a previous request was successful.  The utility is fairly simple and operates as follows (assume that Service is the WCF service we want to mis-use, and that the Target is the machine we want to port scan): 

  • Request #1: Issue a CreateSequence request to Service which will ReplyTo Target on Port 1. The delay (if any) on this first request is not associated with a connection we initiated, so the timing of this first response is ignored.
  • Request #2: Issue another CreateSequence request to Service which will ReplyTo Target on Port 2. A timer is used to measure the time between this request and the “202 Accepted” response from Service. This response will not occur until the previous CreateSequenceResponse has been acknowledged or timed out. As such, this delay will be used to infer the outcome of the probe caused by Request #1.
  • Request #3: Issue a CreateSequence request to Service which will ReplyTo Target on Port 3. A timer is used to measure the time between this request and the “202 Accepted” response from Service. This response will not occur until the previous CreateSequenceResponse has been acknowledged or timed out. As such, this delay will be used to infer the outcome of the probe caused by Request #2.
  • and on and on and on…
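The off-by-one bookkeeping in the steps above is easy to get wrong, so here is a minimal Python sketch of just that part. It is not the utility described in this post: the function name is made up, the network call is an injected placeholder (`send_create_sequence`, assumed to block until the “202 Accepted” arrives), and the trailing repeat request is an assumption added so the last target also gets a measurement.

```python
import time
from typing import Callable, List, Tuple


def timed_probe_sequence(
    targets: List[str],
    send_create_sequence: Callable[[str], None],
    clock: Callable[[], float] = time.monotonic,
) -> List[Tuple[str, float]]:
    """Pair each ReplyTo target with the delay observed on the NEXT request."""
    if not targets:
        return []
    results: List[Tuple[str, float]] = []
    previous = None
    # Repeat the last target once so its probe is also followed by a timed request.
    for target in targets + [targets[-1]]:
        start = clock()
        send_create_sequence(target)  # placeholder: blocks until "202 Accepted"
        delay = clock() - start
        if previous is not None:
            # This delay reflects the callback attempt triggered by the previous request.
            results.append((previous, delay))
        previous = target
    return results
```

The delay of the very first request is measured but never recorded, which mirrors the rule that the timing of Request #1 is ignored.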

Proof of Concept

As a proof of concept, I deployed an instance of the MSDN CalculatorDuplex sample service to a virtual machine in the Microsoft Azure cloud to use as a test case.  This service is a simple calculator web service that uses the WCF WSDualHttpBinding. As it turns out, the Azure environment was a great place to test this concept since Azure VMs actually reside on an internal private 10.x.x.x network behind a firewall.  Conceptually, this is represented in the diagram below (note, this is an over simplified diagram based on what I have seen in my limited testing with Azure).

Based on an analysis of the VM running the sample service, it also appeared that the VMs within the Azure environment typically run IIS on port 20000.  I used the utility to remotely scan other VMs within the 10.x.x.x address space on this port through requests to the Calculator service.  The results from the initial test are shown in the screenshot below.

As you can see, the result of each probe is inferred based on the average response time of the other requests.  The scan above shows that four of the probes returned very quickly (around 114 ms) while the others appear to have timed out.  The probes that do not time out in this case are the other internal VMs that are up and running IIS on port 20000.  As a second test, I used the utility to probe ports on the localhost of the machine running the Calculator service.  As you can see below, the probe to port 3389 times out while the others return after about 1 second.  So in this case, the Remote Desktop service is running on the localhost.
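Here is a small Python sketch of how a result could be inferred by comparing each delay against the rest. It swaps in the median of the other delays as the baseline rather than the average, since a single timeout skews a mean heavily. The function name, the factor of 3 and the neutral labels are all assumptions for this example; what “faster” or “slower” means for a given port still has to be interpreted per scan, as in the two tests above.

```python
from statistics import median
from typing import Dict, List, Tuple


def classify_delays(delays: List[Tuple[str, float]], factor: float = 3.0) -> Dict[str, str]:
    """Label each probe relative to the median delay of all the other probes."""
    labels: Dict[str, str] = {}
    for i, (target, delay) in enumerate(delays):
        others = [d for j, (_, d) in enumerate(delays) if j != i]
        baseline = median(others) if others else delay
        if delay > baseline * factor:
            labels[target] = "slower than baseline"
        elif delay * factor < baseline:
            labels[target] = "faster than baseline"
        else:
            labels[target] = "baseline"
    return labels
```

With the localhost numbers described above (one probe timing out, the rest returning after about a second), only the timed-out probe is flagged as an outlier.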

So to summarize, this appears to be a potential design flaw within the WCF create sequence negotiation process.  As a result, any service that uses this binding can be abused by a remote user to scan other hosts (even those behind a firewall that they may not otherwise have access to).  Certain web-based attacks can also be proxied through these services since the remote attacker has the ability to control not only the target address and port, but also the complete URI that will be requested. The source code for the scanner utility is posted here for reference.


Breaking Password Based Encryption with Azure

During a recent security review, we came across a .NET application that was encrypting query string data to thwart parameter based attacks. We had not been given access to the source code, but concluded this since each .aspx page was being passed a single Base64 encoded parameter which, when decoded, produced binary data with varying 16 byte blocks (likely AES considering it is the algorithm of choice for many .NET developers).
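The block-size observation can be checked mechanically. Here is a minimal Python sketch, assuming the parameter value has already been URL-decoded; the function name is made up for this example. A length that is a whole number of 16-byte blocks is consistent with AES, but it does not prove it.

```python
import base64
from typing import List


def cipher_blocks(param: str, block_size: int = 16) -> List[bytes]:
    """Decode a Base64 parameter and split it into fixed-size blocks."""
    raw = base64.b64decode(param)
    if len(raw) == 0 or len(raw) % block_size != 0:
        raise ValueError("decoded length is not a whole number of blocks")
    return [raw[i:i + block_size] for i in range(0, len(raw), block_size)]
```

Comparing the block lists of several captured parameters also shows whether any blocks repeat between requests, which is worth knowing when the same key and IV are suspected to be in use everywhere.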

The Code

After doing some research (aka plugging the words “.NET”, “Query String” and “Encryption” into Google), we identified several references to a piece of code that had been written and published a few years back for encrypting query strings in .NET. The code we found even used the same parameter name as our application did to pass the encrypted query string data to each page, so we were fairly confident it was the code they were using.

Having written SPF, I am always interested to see how other applications implement cryptography since I know it is not always easy to do properly. In addition to the common problem of re-using the same IV for every encrypted query string, we noticed that the key was entirely derived from a static password embedded in the code (it was being derived using the .NET Framework PasswordDeriveBytes class directly from the literal string value “key”).

For reference, I’ve included the Decrypt method below:

// Requires: System, System.IO, System.Security.Cryptography, System.Text
private const string ENCRYPTION_KEY = "key";

public static string Decrypt(string inputText)
{
    RijndaelManaged rijndaelCipher = new RijndaelManaged();
    byte[] encryptedData = Convert.FromBase64String(inputText);
    // The salt is just the password length as text, so it adds no real entropy.
    byte[] salt = Encoding.ASCII.GetBytes(ENCRYPTION_KEY.Length.ToString());
    PasswordDeriveBytes secretKey = new PasswordDeriveBytes(ENCRYPTION_KEY, salt);

    // The IV argument is cut off in this copy of the code; GetBytes(16) is an
    // assumption (one AES block, derived from the same static password).
    using (ICryptoTransform decryptor = rijndaelCipher.CreateDecryptor(secretKey.GetBytes(32), secretKey.GetBytes(16)))
    using (MemoryStream memoryStream = new MemoryStream(encryptedData))
    using (CryptoStream cryptoStream = new CryptoStream(memoryStream, decryptor, CryptoStreamMode.Read))
    {
        byte[] plainText = new byte[encryptedData.Length];
        int decryptedCount = cryptoStream.Read(plainText, 0, plainText.Length);
        return Encoding.Unicode.GetString(plainText, 0, decryptedCount);
    }
}

Password based encryption schemes like this are common in many applications, since the key can easily be represented by a word or passphrase.  The nice thing from an attacker’s perspective is that regardless of how large the real encryption key is, the feasibility of a brute force attack is largely dependent on the length and complexity of the password used to derive the key and not the key itself.  So for this example, even though they are using 256-Bit AES encryption (generally considered secure), the password used to generate the key is easily brute forced since it is only 3 characters.

Given the code we found, the first and obvious test was to try decrypting our query string values with the same “key” string.  Sadly that didn’t work.  After trying several educated guesses at what we thought could be the password, I decided to clone the decryption logic into a .NET console utility and run a recursive alphanumeric brute force against the password. The approach was rather simple:

  • Take one of our encrypted samples
  • Loop through every alphanumeric character combination
  • Using the identical logic shown above, derive the key and decrypt

The caveat here is that we really don’t know what value to expect when it decrypts, but chances are it should be just ASCII text (and hopefully a query string name/value pair). The good news is that most of the candidate keys will cause a CryptographicException during decryption, so we can rule out any password that results in this exception. For safety’s sake I decided to convert the results of every successful decrypt to ASCII and save them for further review if needed.
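The overall loop can be sketched in a few lines of Python. This is not the console utility described here: the decryption routine is an injected placeholder (`decrypt`), `ValueError` stands in for .NET's CryptographicException, and the lowercase-plus-digits alphabet and the printable-ASCII flag are assumptions made for the example.

```python
import itertools
import string
from typing import Callable, Iterator, List, Tuple

ALPHABET = string.ascii_lowercase + string.digits  # assumed character set


def candidates(max_len: int, alphabet: str = ALPHABET) -> Iterator[str]:
    """Yield every password of length 1..max_len over the alphabet."""
    for length in range(1, max_len + 1):
        for combo in itertools.product(alphabet, repeat=length):
            yield "".join(combo)


def looks_like_text(data: bytes) -> bool:
    """True if the bytes are non-empty printable ASCII."""
    return len(data) > 0 and all(32 <= b < 127 for b in data)


def search(decrypt: Callable[[str], bytes], max_len: int) -> List[Tuple[str, bytes, bool]]:
    """Try every candidate; keep each successful decrypt with a printable flag."""
    hits: List[Tuple[str, bytes, bool]] = []
    for password in candidates(max_len):
        try:
            plain = decrypt(password)
        except ValueError:  # stand-in for a failed decrypt (e.g. bad padding)
            continue
        hits.append((password, plain, looks_like_text(plain)))
    return hits
```

Keeping the non-printable successes as well matches the "save everything for later review" approach, since a wrong key can occasionally decrypt without raising an error.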

The Cloud

After running the utility for an hour or so I realized that a laptop Windows instance was not the optimal environment for running a brute force password crack (not to mention it rendered the machine pretty useless in the meantime).  Having recently signed up for a test account on the Microsoft Azure cloud platform for some unrelated WCF testing, I thought this would be a great opportunity to test out the power of the Microsoft cloud.  Even better, Azure is FREE to use until February 1, 2010.

The concept of using the cloud to crack passwords is not new.  Last year, David Campbell wrote about how to use Amazon EC2 to crack a PGP passphrase.  Having never really worked with the Azure platform (aside from registering for a test account), I first needed to figure out the best way to perform this task in the environment. Windows Azure has two main components, which both run on the Azure Fabric Controller (the hosting environment of Windows Azure):

  • Compute - Provides the computation environment.  Supports “Web Roles” (essentially web services and web applications) and “Worker Roles” (services that run in the background)
  • Storage - Provides scalable storage (Blobs, Tables, Queues)

I decided to create and deploy a “Worker Role” to run the password cracking logic, and then log all output to a table in the storage layer.  I’ll spare you the boring details of how to port a console utility to a Worker Role, but it’s fairly simple.  The first run of the Worker Role was able to produce approximately 1,000,000 decryption attempts every 30 minutes, or about 555 tries/second.  This was definitely faster than the speed I was getting on the laptop, but not exactly what I was hoping for from “the cloud”. 

I did some research on how the Fabric Controller allocates resources to each application, and as it turns out there are 4 VM sizes available as shown below:

Compute Instance Size   CPU           Memory    Instance Storage   I/O Performance
Small                   1.6 GHz       1.75 GB   225 GB             Moderate
Medium                  2 x 1.6 GHz   3.5 GB    490 GB             High
Large                   4 x 1.6 GHz   7 GB      1,000 GB           High
Extra large             8 x 1.6 GHz   14 GB     2,040 GB           High

The size of the VM used by the Worker Role is controlled through the role properties that get defined when the role is configured in Visual Studio.  By default, roles are set to use the “small” VM, but this is easily changed to another size.  The task at hand is all about CPU, so I increased the VM to “Extra Large” and redeployed the worker role. 

Expecting significant performance gains, I was disappointed to see that the newly deployed role was running at the same exact speed as before.  The code was clearly not taking full advantage of all 8 cores, so a little more research led me to the Microsoft Task Parallel Library (TPL).  TPL is part of the Parallel Extensions, a managed concurrency library developed by Microsoft for .NET that was specifically designed to make running parallel processes in a multi-core environment easy.  Parallel Extensions are included by default as part of the .NET 4.0 Framework release.  Unfortunately Azure does not currently support .NET 4.0, but luckily TPL is supported on .NET 3.5 through the Reactive Extensions for .NET (Rx).

Once you install Rx, you can reference the System.Threading.Tasks namespace which includes the Parallel class.  Of specific interest for our purpose is the Parallel.For method.  Essentially, this method executes a for loop in which iterations may run in parallel.  Best of all, the job of spawning and terminating threads, as well as scaling the number of threads according to the number of available processors, is done automatically by the library.
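A parallel for loop over an integer range only helps if each index maps to an independent piece of work. Here is a rough Python analogue of that idea (not the .NET code used here): each index is converted to a candidate password, and the index range is cut into one slice per worker. The function names are made up for this example, and threads are used only to keep the sketch self-contained; in CPython a process pool would be needed to actually use multiple cores for CPU-bound work.

```python
import string
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

ALPHABET = string.ascii_lowercase + string.digits  # assumed character set


def index_to_password(index: int, length: int, alphabet: str = ALPHABET) -> str:
    """Map 0..len(alphabet)**length - 1 to a fixed-length password (base-N digits)."""
    chars = []
    for _ in range(length):
        index, digit = divmod(index, len(alphabet))
        chars.append(alphabet[digit])
    return "".join(reversed(chars))


def search_slice(start: int, stop: int, length: int,
                 is_match: Callable[[str], bool]) -> Optional[str]:
    """Scan one contiguous slice of the index range."""
    for i in range(start, stop):
        password = index_to_password(i, length)
        if is_match(password):
            return password
    return None


def parallel_search(length: int, is_match: Callable[[str], bool],
                    workers: int = 8) -> Optional[str]:
    """Split the keyspace into one slice per worker and scan the slices concurrently."""
    total = len(ALPHABET) ** length
    step = -(-total // workers)  # ceiling division
    bounds = [(s, min(s + step, total)) for s in range(0, total, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda b: search_slice(b[0], b[1], length, is_match), bounds)
        return next((r for r in results if r is not None), None)
```

Because the slices share no state, the same partitioning works unchanged whether the workers are threads, processes or separate machines.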

As expected, this was the secret sauce I had been missing.  Once re-implemented with a Parallel.For loop, the speed increased significantly to 7,500,000 decryption attempts every 30 minutes, or around 4,200 tries/second.   That’s 1M tries every 4 minutes, meaning we can crack a 5 character alphanumeric (lowercase) password in about 4 hours, or the same 6 character equivalent in about 6 days.   This is still significantly slower than the speed obtained by Campbell’s experiment, but then again he was using a distributed program designed specifically for fast password cracking (as opposed to the proof of concept code we are using here), not to mention I am also logging output to a database in the storage layer.  At the time of writing, the password hasn’t cracked but the worker process has only been running for about 24 hours (so there’s still plenty of time).  What remains to be seen is how fast this same code would run in the Amazon EC2 cloud, which may be a comparison worth doing.
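The time estimates above follow directly from the keyspace size and the measured rate. Here is a short Python check of that arithmetic, assuming a 36-character alphabet (lowercase letters plus digits) and counting only passwords of exactly the given length; the function name is made up for this example.

```python
def hours_to_exhaust(alphabet_size: int, length: int, tries_per_second: float) -> float:
    """Worst-case hours to try every password of exactly this length."""
    return alphabet_size ** length / tries_per_second / 3600


# 7,500,000 attempts per 30 minutes, as measured on the Extra Large role.
rate = 7_500_000 / (30 * 60)

five_char_hours = hours_to_exhaust(36, 5, rate)      # 36**5 = 60,466,176 candidates
six_char_days = hours_to_exhaust(36, 6, rate) / 24   # 36**6 = 2,176,782,336 candidates
```

Each extra character multiplies the work by 36, which is why the jump from five to six characters turns hours into days.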

The important takeaway here is not about the power of the cloud (since there’s nothing we can do to stop it), but rather about Password Based Encryption.  Regardless of key length and choice of algorithm, the strength of your encryption always boils down to the weakest link…which in this case, is the choice of password.