From time to time, we need to combine or compose multiple tools into a single docker image. Instead of running a tedious yum install and waiting for your image to build, try combining the upstream docker images directly into one. Especially for docker images containing only utilities, or ones with a main purpose plus a couple of extra utilities, it is handy to inject the needed binary straight from an upstream image.
In this tour, we combine the upstream docker image for Python with a couple of utilities, to demonstrate how to merge multiple docker images into one. Such an image can be used, for example, in a CI/CD build pipeline, providing both the build environment for our Python application and the terraform command to run deployment steps. Everybody reading this probably has a specific use case in mind, so let me know in the comments what yours is.
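As a sketch of that pipeline use case, a hypothetical GitLab CI job could use the combined image built in this tour (the job name, registry, and image name are assumptions, not real infrastructure):

```yaml
# Hypothetical CI job using the combined python + terraform image.
deploy:
  image: registry.example.com/python-tools:latest
  script:
    - pip install -r requirements.txt   # Python build environment
    - terraform init                    # terraform available in the same image
    - terraform apply -auto-approve
```

Without the combined image, this single job would need two images, or an image build step of its own.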
Let us start with the base Python image:
```dockerfile
FROM python:3.9.16-bullseye

CMD [ "/bin/bash" ]
```
We add a terraform image in the same Dockerfile, and then we cut the terraform binary out of it:
```dockerfile
FROM hashicorp/terraform:1.4.0 AS terraform

FROM python:3.9.16-bullseye
COPY --from=terraform /bin/terraform /bin/terraform

CMD [ "/bin/bash" ]
```
If this is the first time you have seen multiple FROM statements in a Dockerfile, this is a feature called multi-stage builds.
You might want to check the documentation after completing this tour.
Update 2023-03-17: After a comment on a LinkedIn post, it turns out that COPY --from searches for the image name in the central registry if no previous stage with that name is found. Quoting the documentation of COPY:
COPY accepts a flag --from=<name> that can be used to set the source location to a previous build stage (created with FROM .. AS <name>) that will be used instead of a build context sent by the user. In case a build stage with a specified name can’t be found an image with the same name is attempted to be used instead.
The Dockerfile may therefore be simplified to just:
```dockerfile
FROM python:3.9.16-bullseye
COPY --from=hashicorp/terraform:1.4.0 /bin/terraform /bin/terraform

CMD [ "/bin/bash" ]
```
```shell
docker build -t python-tools .
```
Will it work?
```
> docker run --rm -it python-tools /bin/bash
root@9cb293fcc406:/# terraform --version
Terraform v1.4.0
on linux_amd64
root@9cb293fcc406:/# python --version
Python 3.9.16
```
It works. But why?
Some pieces of software can just be dropped onto our system and work. In this case, the upstream terraform image is actually built on the Alpine operating system. Nevertheless, the Go compiler builds statically linked binaries. Even though the GNU libc installed on our Debian operating system is not the same as Alpine's musl, the terraform binary itself does not expect any external compiled code at runtime. It was built as an all-in-one binary.
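One way to check whether a binary is self-contained is ldd, which lists the shared libraries a binary needs. A sketch, run inside the image we just built (the "not a dynamic executable" line is what glibc's ldd typically prints for a statically linked binary):

```
> docker run --rm -it python-tools /bin/bash
root@9cb293fcc406:/# ldd /bin/terraform
        not a dynamic executable
```

A dynamically linked binary would instead list its required shared libraries, and each of those would have to be present in the target image as well.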
However, the CPU architecture must match. In our case the architecture is amd64. If the upstream image does not have the same architecture as the image we want to inject the binary into, we are out of luck.
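Before copying from an upstream image, we can check its OS and architecture, for example (a sketch, using the same tag as above; the image must already be pulled locally):

```
> docker image inspect --format '{{.Os}}/{{.Architecture}}' hashicorp/terraform:1.4.0
linux/amd64
```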
This was easy; let us continue with one more challenge.
```dockerfile
FROM hashicorp/terraform:1.4.0 AS terraform
FROM curlimages/curl:7.87.0 AS curl

FROM python:3.9.16-bullseye
COPY --from=terraform /bin/terraform /bin/terraform
COPY --from=curl /usr/bin/curl /usr/bin/curl

CMD [ "/bin/bash" ]
```
How did we know that the curl binary lives in /usr/bin? Well, we didn't. Most of the time, the easiest way to find out is just to try the usual locations such as /bin and /usr/bin and see where the authors of the upstream image put the binary. If we would like to take a more scientific approach, we can examine the image:
```
> docker run --entrypoint '' --rm -it curlimages/curl:7.87.0 /bin/sh
/ $ which curl
/usr/bin/curl
```
Some images do not even have the which command inside, for example the terraform one. We can take a look at the source Dockerfile, examine the build scripts, or inspect the docker image. Usually the binary of interest is the image's entrypoint:
```
> docker inspect hashicorp/terraform:1.4.0 | grep -i entrypoint
            "ENTRYPOINT [\"/bin/terraform\"]"
> docker inspect hashicorp/terraform:1.4.0 | grep -i cmd
            "Cmd": [
```
Or take a look at how the image was built:
```
> docker image history --no-trunc hashicorp/terraform:1.4.0
IMAGE        CREATED
sha256:c43   3 days ago   /bin/sh -c #(nop) ENTRYPOINT ["/bin/terraform"]
```
After this where-is-the-binary excursion, let us build and try our image:
```shell
docker build -t python-tools .
```
Will it work?
```
> docker run --rm -it python-tools /bin/bash
root@866400d6b362:/# terraform --version
Terraform v1.4.0
on linux_amd64
root@866400d6b362:/# python --version
Python 3.9.16
root@866400d6b362:/# curl --version
bash: /usr/bin/curl: No such file or directory
```
It does not work. But why?
The error message is not exactly helpful in finding out what the problem actually is. When we examine the upstream curl docker image, we see it is an Alpine-based image. We can try to make it work, but spoiler alert: it is not worth it.
We can change the python image to an Alpine-based one.
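A minimal sketch of that change, assuming the alpine variant tag of the same Python version:

```dockerfile
FROM hashicorp/terraform:1.4.0 AS terraform
FROM curlimages/curl:7.87.0 AS curl

# Alpine-based Python, same libc family as the upstream curl image
FROM python:3.9.16-alpine
COPY --from=terraform /bin/terraform /bin/terraform
COPY --from=curl /usr/bin/curl /usr/bin/curl

CMD [ "/bin/sh" ]
```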
After the build, the next error will be:
```
> docker run --rm -it python-tools /bin/sh
/ # curl --version
Error loading shared library libcurl.so.4: No such file or directory (needed by /usr/bin/curl)
...
```
We can start copying lib files to our docker image, but after copying /usr/lib/libcurl.so we will see that other linked libraries are needed. The process becomes tedious and is not worth the time. It is easier to install curl using the OS package manager.
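For comparison, a sketch of the package-manager route on the original Debian-based image:

```dockerfile
FROM python:3.9.16-bullseye

# Install curl via the OS package manager; it pulls in libcurl and
# all other shared-library dependencies automatically.
RUN apt-get update \
    && apt-get install --no-install-recommends -y curl \
    && rm -rf /var/lib/apt/lists/*

CMD [ "/bin/bash" ]
```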
Depending on the tool we would like to use, we may be lucky and our image is already equipped with its dependencies. If we are on the same architecture (amd64) and the same version of the underlying operating system, it may work to just copy the binary.
When to use this approach
Use the combine approach:
- when we want to insert a tool which is not the main software for which we are building our docker image
- when the tool is a self-sufficient binary/script and ideally confined to one directory, so it is easy to cut out
- when the tool does not have additional dependencies on the operating system, or those are already met; e.g. Python tools on a Python image could work (presuming a compatible Python version)
In all other cases, it is easier to just run

```dockerfile
RUN apt update && apt install --no-install-recommends -y <tool> && apt clean
```

or similar, based on the underlying OS.
Pros and cons of combined docker images
- when it works, the Dockerfile is easy to write, and it is also easy to understand what is going on (e.g. we just want the terraform binary to be present)
- no hassle around apt/apk caches, whether we clean them, or a larger image because of them
- fast implementation; once we have done this and know the tool is self-sufficient, the next time some docker image needs the tool, the trick is 5 minutes of work
- stable versions and a deterministic docker image build. We literally just take layers from another, deterministically chosen docker image. There is no non-deterministic apt update or fetching of a latest version. The built image is always exactly, binary-wise, the same. This also speeds up subsequent builds, as layers are re-used.
- stable versions can also be a downside. If the image goes to production, we might forget to update our tools/dependencies and keep deploying older versions, which may be a security concern. By installing the latest version from upstream repositories instead, we would build an image with the latest patched version of the tools every time.
Let me know in the discussion about any other pros and cons you see.
Tools that work
From experience, generally all Go-based tools work. Scripts that are just one file also work, as do Java all-in-one uber-JAR files, assuming the image already has a compatible Java.