Dockerfile Security Best Practices with Semgrep

Cenk Kalpakoğlu, 25 Aug 2022

The world of software development moves fast, and it's constantly evolving. Containerization technologies, especially Docker, are among today's most preferred virtualization technologies.

Although Docker containers are "sufficiently" secure by default, configuration errors in a Dockerfile might lead to critical security risks or degraded system performance.

As many companies move their business logic to Docker containers as part of a migration to microservices, the variety of technologies in use, especially at large companies, makes these configuration errors very likely.

In this blog post, we will discuss how to customize and enforce Dockerfile best practices for your organization using the open-source tool Semgrep.

What is Semgrep?

Semgrep (semantic grep) is a free open-source (LGPL-2.1) static code analysis tool developed by Return To Corporation.

Its ease of use, speed, and customizability make it a popular choice for organizations that need to detect custom findings as part of their AppSec programs. Semgrep lets users write custom rules in YAML to identify vulnerabilities, so that experts can run context-aware scans on their codebase. You can get detailed information about Semgrep from this link.

Our main goal is to follow the “Dockerfile best practices” to have a secure-by-default approach.

The first step is to decide which best practices you want to follow and which ones you actually need. "Best practices" are good, but you may not need to implement them all. That's why we need to do some lightweight threat modeling to identify the patterns we need to follow.

The practices we will follow in this blog post are:

  • Enforcing a “custom” distroless image.
  • Using rootless containers.
  • App user control (last user must be the “app” user).
  • Check health check instructions.
  • Using a multistage build.

What does the secure-by-default strategy mean?

Secure by default means configuring a system's default settings in the most secure way possible. By following the “secure by default” approach, it is possible to prevent future problems at an early stage.

A security engineer should always try to build the secure-by-default approach into the organization they work for.

Distroless base image:

Although Docker is a “virtualization” technology, it does not provide a real virtual machine: containers share the host's kernel. A Docker image does, however, usually bundle the user space of a Linux distribution, which means it may contain components that our app does not need in order to run. To keep the attack surface small and avoid shipping unused components, we want our apps to use a “distroless base image”.

This way, it is possible to reduce the size of our image and limit what an attacker can do inside the container.

"Distroless" images contain only your application and its runtime dependencies. They do not contain package managers, shells, or any other programs you would expect to find in a standard Linux distribution. [link]

We will use the following Semgrep rule to require developers to use the distroless “gcr.io/distroless/static-debian10” image created by Google in their Dockerfiles.

rules:
- id: use-distroless-base-image
  languages:
    - dockerfile
  message: >-
    Distroless base image not found. Please use `gcr.io/distroless/static-debian10` as a base image.
  severity: ERROR
  metadata:
    category: security
    technology:
      - dockerfile
  patterns:
    - pattern-regex: FR\w+\s[a-zA-Z0-9]\w+\:+\w+
    - pattern-not: FROM gcr.io/distroless/static-debian10
    - pattern-not-regex: \s*\#.*
    - pattern-not-inside: FROM $IMAGE:$TAG as builder
  paths:
    exclude:
      - "./vendor/*"
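
As a rough sketch of the two cases this rule is meant to separate, consider the Dockerfile below. The `./server` binary and its path are hypothetical, and the comments describe what the rule intends rather than verified scanner output.

```dockerfile
# The kind of line the rule is written to flag: a tagged base image
# other than the approved one, for example:
#   FROM ubuntu:20.04

# The approved base image named in the rule.
FROM gcr.io/distroless/static-debian10

# Hypothetical statically linked binary taken from the build context.
COPY ./server /server

# Distroless images ship no shell, so the exec (JSON array) form is used.
ENTRYPOINT ["/server"]
```

Because the image contains no shell, the shell form of ENTRYPOINT or CMD would not work here; the exec form starts the binary directly.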

You can also use this approach to enforce a "custom base image" or a specific tagged version of that image. This helps ensure that your developers use only base images that you have approved, verified with security tests, and pinned to a version when they build Docker containers.

Missing effective user:

The most common mistake when building a container is not specifying the effective user the application will run as. Applications running in Docker run with root permissions by default. Since our goal is to keep the attack surface restricted (least privilege), we must ensure that our application runs with the permissions of a dedicated app user.

Since the secure-by-default approach aims to provide security through default settings, we need to ensure that our application runs with limited user permissions.

rules:
  - id: missing-user
    languages:
      - dockerfile
    message:
      By not specifying a USER, a program in the container may run as 'root'. This is a security hazard. If an attacker
      can control a process running as root, they may have control over the container. Ensure that the last USER in a Dockerfile
      is a USER other than 'root'.
    severity: ERROR
    metadata:
      category: security
      technology:
        - dockerfile
      confidence: MEDIUM
    patterns:
      - pattern-either:
          - pattern: CMD ...
          - pattern: ENTRYPOINT ...
      - pattern-not-inside: |
          USER $USER
          ...
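
To illustrate what the rule above is aiming for, here is a minimal sketch of a Dockerfile that sets a non-root user before its entrypoint. The `/app/server` path is hypothetical, and the example assumes the base image provides a `nonroot` account, as Google's distroless images do; other base images may need a user to be created first.

```dockerfile
FROM gcr.io/distroless/static-debian10

# Hypothetical pre-built binary copied from the build context.
COPY ./server /app/server

# Switch away from root before the process starts.
# Without this line, the ENTRYPOINT below is the case the rule targets.
USER nonroot

ENTRYPOINT ["/app/server"]
```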

Run as a non-root user:

Sometimes developers explicitly set the effective USER to root. Outside of the few cases that genuinely require it, there is rarely a reason to run our app as root. Avoiding this is the most straightforward security measure you can take in containerized architectures.

rules:
  - id: last-user-is-root
    languages:
      - dockerfile
    message: >-
      The last user in the container is 'root'. This is a security
      hazard because if an attacker gains control of the container
      they will have root access. Switch back to another user after
      running commands as 'root'.
    severity: ERROR
    metadata:
      source-rule-url: https://github.com/hadolint/hadolint/wiki/DL3002
      references:
        - https://github.com/hadolint/hadolint/wiki/DL3002
      category: security
      technology:
        - dockerfile
      confidence: MEDIUM
    patterns:
      - pattern: USER root
      - pattern-not-inside: |
          USER root
          ...
          USER $USER
    paths:
      exclude:
        - "./vendor/*"
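
The pattern this rule encourages can be sketched as follows: switch to root only for the steps that need it, then switch back before the container's command is defined. A general-purpose base image is used here only because it has the tooling to create a user; the `app` account name, the installed package, and the `./server` binary are assumptions for illustration.

```dockerfile
FROM debian:11-slim

# Create an unprivileged account (the name "app" is an assumption).
RUN useradd --system --create-home app

USER app
WORKDIR /home/app

# Temporarily switch to root for a step that needs it.
USER root
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Switch back so that root is not the last USER in the file.
USER app

# Hypothetical application binary owned by the unprivileged user.
COPY --chown=app:app ./server /home/app/server
CMD ["/home/app/server"]
```

If the final `USER app` line were removed, `USER root` would be the last user instruction, which is the situation the rule is written to report.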

Multistage build:

Since we are trying to reduce the attack surface with the secure-by-default approach, we need to reduce the size and dependencies of the image, for both security and performance reasons.

Separating our app's compile stage from its runtime stage allows us to use more minimal, even distroless, images.

An application's build dependencies differ from its runtime dependencies, and we need to ensure that only the runtime dependencies end up in our Docker image.

rules:
  - id: multistage-build
    languages:
      - dockerfile
    message: >-
      Missing multistage builds.
    severity: INFO
    metadata:
      category: best-practice
      technology:
        - dockerfile
    patterns:
      - pattern: |
          FROM $STAGE AS builder
          ...
      - pattern-not-inside: |
          FROM $STAGE AS builder
          ...
          FROM $IMAGE
          ...
    paths:
      exclude:
        - "./vendor/*"
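
A multistage build of the kind this rule looks for might look like the sketch below. It assumes a Go module at the root of the build context and uses `golang:1.19` as an example build image; the output path `/out/server` is likewise hypothetical.

```dockerfile
# Build stage: contains the compiler and build-time dependencies.
FROM golang:1.19 AS builder
WORKDIR /src
COPY . .
# Disable cgo so the binary is statically linked and can run on a static base image.
RUN CGO_ENABLED=0 go build -o /out/server .

# Runtime stage: only the compiled binary is carried over.
FROM gcr.io/distroless/static-debian10
COPY --from=builder /out/server /app/server
USER nonroot
ENTRYPOINT ["/app/server"]
```

Only the final stage ends up in the shipped image, so the Go toolchain and source code from the `builder` stage are left behind.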

Healthcheck instruction control:

Semgrep is a great tool for finding anti-patterns in our codebase, and from a secure-by-default perspective it is always good practice to include other controls in our checklist as well.

If your platform team requires a HEALTHCHECK instruction in every Dockerfile, we can enforce that control too.

DevSecOps is all about culture and the tools are just a medium.

rules:
  - id: missing-healthcheck
    languages:
      - dockerfile
    message: >-
      Missing HEALTHCHECK instruction.
    severity: INFO
    metadata:
      category: best-practice
      technology:
        - dockerfile
    patterns:
      - pattern: |
         FROM gcr.io/distroless/static-debian10
         ...
      - pattern-not-inside: |
         FROM gcr.io/distroless/static-debian10
         ...
         HEALTHCHECK $F
         ...
    paths:
      exclude:
        - "./vendor/*"
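
As a hedged example of a Dockerfile that includes the instruction this rule looks for, the sketch below adds a HEALTHCHECK to a distroless image. The `-healthcheck` flag is hypothetical: the application would have to implement it and exit with status 0 when healthy. The interval, timeout, and retry values are illustrative, not recommendations.

```dockerfile
FROM gcr.io/distroless/static-debian10

# Hypothetical pre-built binary copied from the build context.
COPY ./server /app/server

# Distroless images have no shell or curl, so the check uses the exec form
# and calls the application binary itself with a hypothetical flag.
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
    CMD ["/app/server", "-healthcheck"]

USER nonroot
ENTRYPOINT ["/app/server"]
```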

Security incidents usually do not arise from a single major mistake, but from a chain of consecutive mistakes. With the measures we take, it is possible to break this chain of errors or to reduce the impact of security incidents.

By using secure-by-default strategies, we can write customized policies for our organization and put them into use in a short time.

You can find all the examples on GitHub.
