From time to time, we need to combine or compose multiple tools into a single docker image.
Instead of running a tedious apt-get/yum install and waiting for your image to build, try just combining the upstream docker images directly into one.
Introduction
Especially for docker images containing only utilities, or ones with a main purpose where we need to add a couple of utilities, it would be handy if we could just inject the needed binary from an upstream image.
In this tour, we build a combination of the upstream Python docker image with a couple of utilities, to demonstrate how to combine multiple docker images into one.
Such an image can be used, for example, in a CI/CD build pipeline, providing both the build environment for our Python application and the terraform command to run deployment steps.
Everybody reading this probably has some specific use case in mind, so let me know in the comments what yours is.
Base image
Let us start with the base Python image:
FROM python:3.9.16-bullseye
CMD [ "/bin/bash" ]
Add terraform
We add a terraform image in the same Dockerfile, and then we cut out the terraform binary from it:
FROM hashicorp/terraform:1.4.0 AS terraform
FROM python:3.9.16-bullseye
COPY --from=terraform /bin/terraform /bin/terraform
CMD [ "/bin/bash" ]
If this is the first time you have seen multiple FROM statements in a Dockerfile, this is a feature called a multi-stage build. You might want to check the documentation after completing this tour.
Update 2023-03-17: After a comment on the LinkedIn post, it turns out that COPY --from searches for the image name in the central registry if no previous stage with that name is found. Quoting the documentation of COPY:
Optionally COPY accepts a flag --from=<name> that can be used to set the source location to a previous build stage (created with FROM .. AS <name>) that will be used instead of a build context sent by the user. In case a build stage with a specified name can't be found an image with the same name is attempted to be used instead.
Therefore the Dockerfile may be simplified to just:
FROM python:3.9.16-bullseye
COPY --from=hashicorp/terraform:1.4.0 /bin/terraform /bin/terraform
CMD [ "/bin/bash" ]
After building:
docker build -t python-tools .
Will it work?
> docker run --rm -it python-tools /bin/bash
root@9cb293fcc406:/# terraform --version
Terraform v1.4.0
on linux_amd64
root@9cb293fcc406:/# python --version
Python 3.9.16
It works. But why?
Some pieces of software can simply be dropped onto our system and work. In this case, the upstream terraform image is actually built on the Alpine operating system. Nevertheless, the Go compiler builds statically linked binaries.
Even though the GNU libc installed on our Debian operating system is not the same as Alpine's musl libc, the terraform binary itself does not expect any external compiled code at runtime. It was built as an all-in-one binary.
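If we want to check this ourselves, the file utility reports whether a binary is statically linked. A minimal sketch, assuming file is present in the image (it is in the full Debian-based python image):
> docker run --rm -it python-tools file /bin/terraform
For a Go binary like terraform, the output should contain "statically linked".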
However, what must match is the CPU architecture, in our case amd64. If the upstream image does not have the same architecture as the image we want to inject the binary into, we are out of luck.
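To check the architecture of an upstream image before relying on it, docker inspect supports Go templates; a quick sketch:
> docker image inspect --format '{{.Os}}/{{.Architecture}}' hashicorp/terraform:1.4.0
linux/amd64
Note that this reports the variant that was pulled locally; for a multi-arch tag, that depends on the pulling machine.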
This was easy, so let us continue with one more challenge.
Add curl
FROM hashicorp/terraform:1.4.0 AS terraform
FROM curlimages/curl:7.87.0 AS curl
FROM python:3.9.16-bullseye
COPY --from=terraform /bin/terraform /bin/terraform
COPY --from=curl /usr/bin/curl /usr/bin/curl
CMD [ "/bin/bash" ]
How did we know that the curl binary is not /bin/curl, but /usr/bin/curl? Well, we didn't. Most of the time, the easiest way to find out is to just try /bin and /usr/bin and see where the authors of the upstream image put the binary. It may even be /usr/local/bin.
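If trial and error feels too random, we can also search the whole filesystem of the image. A sketch, assuming the image ships busybox (which provides find), as most Alpine-based images do:
> docker run --entrypoint '' --rm curlimages/curl:7.87.0 /bin/sh -c 'find / -name curl -type f 2>/dev/null'
/usr/bin/curl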
If we would like to take a more scientific approach, we could examine the image with the which command:
> docker run --entrypoint '' --rm -it curlimages/curl:7.87.0 /bin/sh
/ $ which curl
/usr/bin/curl
Some images do not even have the which command inside, for example the terraform one. We can take a look at the source Dockerfile, examine the build scripts, or inspect the docker image.
Usually the binary of interest is the ENTRYPOINT or CMD:
> docker inspect hashicorp/terraform:1.4.0 | grep -i entrypoint
"ENTRYPOINT [\"/bin/terraform\"]"
> docker inspect hashicorp/terraform:1.4.0 | grep -i cmd
"Cmd": [
Or take a look on how the image was built:
> docker image history --no-trunc hashicorp/terraform:1.4.0
IMAGE          CREATED      CREATED BY
sha256:c43 3 days ago /bin/sh -c #(nop) ENTRYPOINT ["/bin/terraform"]
After this "where is the binary" excursion, let us build and try our image:
docker build -t python-tools .
Will it work?
> docker run --rm -it python-tools /bin/bash
root@866400d6b362:/# terraform --version
Terraform v1.4.0
on linux_amd64
root@866400d6b362:/# python --version
Python 3.9.16
root@866400d6b362:/# curl --version
bash: /usr/bin/curl: No such file or directory
It does not work. But why?
The error message is not exactly helpful in finding what the problem actually is. When we examine the upstream curl docker image, we see it is an Alpine-based image, and the misleading "No such file or directory" in fact refers to the binary's ELF interpreter (musl's dynamic linker), which does not exist on our Debian-based image.
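We can see which interpreter the binary expects with the file utility. A sketch, assuming file is available (it is in the full python:3.9.16-bullseye):
> docker run --rm -it python-tools file /usr/bin/curl
The output should mention an interpreter such as /lib/ld-musl-x86_64.so.1, i.e. musl's dynamic linker, a path which does not exist on Debian.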
We can try to make it work, but spoiler alert: it is not worth it.
We can change the python image to an alpine-based one:
FROM python:3.9.16-alpine
After the build, the next error will be:
> docker run --rm -it python-tools /bin/sh
/ # curl --version
Error loading shared library libcurl.so.4: No such file or directory (needed by /usr/bin/curl)
...
We can start copying lib files into our docker image, but after copying /usr/lib/libcurl.so we will see that other linked libraries are needed. The process becomes tedious and is not worth the time. It is easier to install curl using the OS package manager.
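For completeness, a sketch of the package-manager route on the Debian-based python image (the full bullseye python image may already ship curl, but the pattern applies to any tool):
FROM python:3.9.16-bullseye
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
CMD [ "/bin/bash" ]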
Depending on the tool we would like to use, we may be lucky and our image is already equipped with the tool's dependencies. If we are on the same architecture (amd64) and the same version of the underlying operating system, it may work if we just copy the binary.
Conclusion
When to use this approach
Use the combining approach when:
- we want to insert a tool which is not the main software for which we are building our docker image
- the tool is a self-sufficient binary/script, ideally confined to one directory, so that it is easy to cut out (COPY --from)
- the tool does not have additional dependencies on the operating system, or those are already met; e.g. Python tools on a Python image could work (presuming a compatible Python version)
In all other cases, it is easier to just
RUN apt update && apt install --no-install-recommends -y <tool> && apt clean
or similar, based on the underlying OS.
Pros and cons of combined docker images
Pros:
- when it works, the Dockerfile is easy to write and it is also easy to understand what is going on (e.g. we just want the terraform binary to be present)
- no hassle around apt/yum/apk caches and whether we clean them, or whether we have a larger image because of them
- fast implementation: once we have done this and know the tool is self-sufficient, the next time we need some docker image to have the tool, the trick is 5 minutes of work
- stable versions and a deterministic docker image build. We literally just take layers from another, deterministically chosen docker image. There is no non-deterministic apt update or fetching of the latest version. The built image is always, binary-wise, exactly the same. This also speeds up subsequent builds, as layers are re-used.
Cons:
- stable versions. If the image goes to production, we might forget to update our tools/dependencies. Then we are deploying older versions, which may be a security concern. By installing the latest version from the upstream repositories, we build an image with the latest patched versions of the tools every time.
Let me know in the discussion about any other pros and cons you see.
Tools that work
From experience, generally all Go-based tools work (vault, terraform).
Scripts which are just one file (pass).
Java all-in-one uber-JAR files, assuming the image already has a compatible Java (avro-tools).
Enjoy.