Introduction
When we talk about secret sprawl, we immediately think about API keys, passwords, credentials or any secret lurking in some source code.
Source code is indeed tightly linked to secret sprawl. Unfortunately, it is not the only origin of sensitive information leaks.
Security teams looking to secure the entire perimeter of an application need to consider all possible sources of leaks. One of these sources is Docker images.
Docker Images
What is a Docker image? A Docker image is a read-only template (a file) containing instructions for creating a container that can run on the Docker platform. It describes a filesystem that contains everything required to run the application: dependencies, source code, binaries, environment variables and some metadata.
Docker images are built as a stack of modifications (just like some VCS do), and, from an image, it is possible to retrieve each of the previous steps and the modifications applied. SRE teams mainly use Docker for portability and easy software deployment.
Where Are the Secrets Hiding?
First of all, Docker images embed source code, and as with any source code, it is likely to contain secrets. Sure, source code can be protected using tools like GitGuardian. Still, since the code published in the image may be altered later by the publisher or the publishing process, Docker images can bypass those security checks.
One example is a developer who builds an image from their local project, including unpublished changes (such as files listed in the .gitignore), and then publishes that image.
Then there is the Dockerfile itself. Secrets can be added through the Dockerfile, either directly or by adding a file containing secrets. It is very common to require some sort of credentials to build or run an application, for example to access a package manager or to connect to other services. Since Docker images are meant to run on any machine, it can seem acceptable at first to include the secrets as well.
Finally, the layered structure of a Docker image is very prone to leaks. A later layer can hide a secret added by a previous one, so that the secret is not visible in the final state of the filesystem while still being present in the image. Moreover, unlike source code, hardly anyone digs into Docker image layers to review them.
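To make the layering problem concrete, here is a minimal sketch that simulates two layers as in-memory tar archives. It assumes the `.wh.` whiteout naming convention used by OCI image layers to mark deleted files; the file path, the secret value and all helper names are hypothetical and only serve the illustration.

```python
import io
import tarfile

SECRET = b"API_KEY=hypothetical-value"  # made-up value for illustration


def make_layer(files):
    """Build an in-memory tar archive (one image layer) from {path: bytes}."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for path, content in files.items():
            info = tarfile.TarInfo(name=path)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def apply_layers(layers):
    """Return the final filesystem view after stacking the layers in order."""
    view = {}
    for layer in layers:
        with tarfile.open(fileobj=io.BytesIO(layer)) as tar:
            for member in tar.getmembers():
                if not member.isfile():
                    continue
                directory, _, name = member.name.rpartition("/")
                if name.startswith(".wh."):
                    # Whiteout marker: hides a file that an earlier layer added.
                    hidden = f"{directory}/{name[4:]}" if directory else name[4:]
                    view.pop(hidden, None)
                else:
                    view[member.name] = tar.extractfile(member).read()
    return view


def find_in_layers(layers, needle):
    """Return the indexes of the layers whose files contain the given bytes."""
    hits = []
    for index, layer in enumerate(layers):
        with tarfile.open(fileobj=io.BytesIO(layer)) as tar:
            for member in tar.getmembers():
                if member.isfile() and needle in tar.extractfile(member).read():
                    hits.append(index)
                    break
    return hits


layers = [
    make_layer({"app/.env": SECRET}),   # e.g. a step that copies .env in
    make_layer({"app/.wh..env": b""}),  # e.g. a later step that deletes it
]

print(sorted(apply_layers(layers)))    # final filesystem: the file is gone
print(find_in_layers(layers, SECRET))  # but the first layer still holds it
```

The final view no longer lists the file, yet anyone who pulls the image can still read it from the first layer.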
For all those reasons, we decided to test and implement a dedicated secret scanner to find secrets in Docker images: ggshield scan docker
Why You Should Care
Just like source code, Docker images can be published in shared repositories, publicly on hub.docker.com, or in a company registry. All of these places represent a potential threat.
Take, for example, this year’s Codecov breach. The application's Docker image contained Git credentials that allowed an attacker to gain access to Codecov’s private Git repositories and slip a backdoor into their product, which would later affect a considerable number of Codecov’s 22,000 users.
Methodology to Scan at Scale
As previously explained, secrets can be embedded in images in several places and at different stages of the build.
When building a Docker image from scratch, most of the layers consist of the installation of tools such as Debian or language-specific packages. These are not the layers containing secrets.
The layers that can contain secrets are the ones where files are manually added or copied, or environment variables modified. Fortunately, Docker images contain a manifest file that describes all the different operations performed to build the image. This manifest is used to filter the layers that are related to custom commands from the user for scanning. We then extract files and environment variables from these layers and pipe them into our scanner.
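The filtering step can be sketched as follows. This is not the ggshield implementation, only an illustration under stated assumptions: the image configuration is a JSON document with a `history` list whose entries carry `created_by` and an optional `empty_layer` flag (as in the OCI image configuration), and `layer_paths` lists the layer archives in build order. The function name, the regular expression and the sample data are hypothetical.

```python
import re

# Build steps that copy files into the image or set variables.
USER_STEP = re.compile(r"\b(COPY|ADD|ENV|ARG)\b")


def select_layers(config, layer_paths):
    """Split user-driven build steps into layers to scan and variable steps.

    Returns (layers_to_scan, variable_steps). History entries flagged
    `empty_layer` produce no filesystem layer, so they do not advance
    the layer index.
    """
    layers_to_scan = []
    variable_steps = []
    layer_index = 0
    for step in config.get("history", []):
        command = step.get("created_by", "")
        has_layer = not step.get("empty_layer", False)
        if USER_STEP.search(command):
            if has_layer:
                layers_to_scan.append(layer_paths[layer_index])
            else:
                variable_steps.append(command)
        if has_layer:
            layer_index += 1
    return layers_to_scan, variable_steps


# Hypothetical image configuration, trimmed to the relevant fields.
config = {
    "history": [
        {"created_by": "RUN /bin/sh -c apt-get install -y python3 # buildkit"},
        {"created_by": "ENV API_TOKEN=hypothetical-value", "empty_layer": True},
        {"created_by": "COPY . /app # buildkit"},
        {"created_by": 'CMD ["python3", "/app/main.py"]', "empty_layer": True},
    ]
}
layer_paths = ["layer0/layer.tar", "layer1/layer.tar"]

layers_to_scan, variable_steps = select_layers(config, layer_paths)
print(layers_to_scan)
print(variable_steps)
```

In this sample, only the layer produced by the COPY step and the ENV instruction are kept for scanning; the package installation layer is skipped.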
Scanning Docker Hub
After scraping the Docker Hub API, we found that 7% of the images contained at least one secret. Our analysis was performed on a sample of 2,000 public images recently pushed to Docker Hub.
The “Other” category contains all generic credentials: secrets that don’t provide information on the secret provider, like high-entropy strings. We can never be sure these are “real” secrets, but the algorithm can infer a probability from the surrounding context. Since the end of 2017, we have had a strong focus on detecting specific secrets used by developers in source code. However, it seems that secrets embedded in Docker images are different and perhaps more related to internal services than what we are used to.
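As a rough illustration of what “high entropy” means here, the sketch below computes the Shannon entropy of a string in bits per character. It is not the detection algorithm described above, which also weighs the surrounding context; the threshold, the minimum length and the `looks_random` helper are assumptions picked for the example.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Return the Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_random(token, threshold=3.5, min_length=20):
    """Flag long tokens whose characters are close to uniformly distributed.

    The threshold and the minimum length are arbitrary illustration values.
    """
    return len(token) >= min_length and shannon_entropy(token) > threshold


# Both strings are made up for the example.
print(looks_random("hello-world-hello-world"))
print(looks_random("q7Zp2LxV9mKc4RtY8sWb3NhG6dJf5AuE"))
```

A repetitive string scores low, while a token in which almost every character is different scores high, which is why entropy alone can only flag candidates and cannot confirm that a string is a live credential.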
First, we notice that secret types are far fewer in Docker images. A straightforward explanation is that there are far fewer public images (~8.5 million) on Docker Hub than public repositories on GitHub (several hundred million), and our continuous monitoring of the latter has shown that the greater the volume, the greater the diversity.
Second, private keys are far less present in source code (2.8%) than in images (23.1%). This is not a surprise either, considering private keys are more often used for container system communication and authentication.
Conclusion
Docker images, because of their structure and usage, are likely to contain hidden secrets. We found that 7% of public images contain secrets. You should therefore take this attack surface into account: it is now actively exploited by attackers, as demonstrated earlier this year by the Codecov incident. While good security hygiene is undoubtedly needed (we have produced a cheat sheet on containers’ security best practices), automatic scanning has become a must to harden your supply chain. This is why implementing a CI step that scans for secrets (ggshield, SecretScanner) is needed as much as scanning for vulnerabilities (Clair, Trivy, Docker Bench for Security).