Background
Docker squashing has been around for a while. For those not in the know, when you create a Docker image, each command in the Dockerfile that changes the filesystem gets its own filesystem overlay (a layer). The aggregate of these overlays is what your container starts with.
Now, when very large files get downloaded during an image build, you can end up with a very bloated image, because a file added in one layer still takes up space even if a later layer deletes it. Squashing is generally the solution here.
In my specific case, I built my image and saw that it was ~20GB in size. I knew I wanted to squash it because I could fire up a container from the image, run du -d 0 -h, and see that the entire filesystem was actually ~11GB. The build context had a large zip file that was copied into the filesystem in one layer and then extracted in another. Bah!
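A quick way to spot this kind of bloat is to compare the per-layer sizes with the size of the filesystem a container actually sees. The sketch below assumes the image is tagged big_image (a placeholder name), that it ships a du binary, and that it has no ENTRYPOINT that would swallow the command. The function names are made up for illustration.

```shell
#!/bin/sh
# Placeholder image name (an assumption); override with IMAGE=your_tag.
IMAGE="${IMAGE:-big_image}"

# Print each layer's size next to the instruction that created it.
layer_sizes() {
    docker history --format '{{.Size}}  {{.CreatedBy}}' "$1"
}

# Print the total size of the root filesystem inside a throwaway container.
# -x keeps du on the root filesystem, so /proc and /sys are not counted.
rootfs_size() {
    docker run --rm "$1" du -x -d 0 -h /
}

if command -v docker >/dev/null 2>&1 && docker image inspect "$IMAGE" >/dev/null 2>&1; then
    layer_sizes "$IMAGE"
    rootfs_size "$IMAGE"
else
    echo "docker or image '$IMAGE' not available; skipping" >&2
fi
```

If the layer sizes add up to much more than the du total, a squash will probably pay off.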
How I Don't Squash
Squashing has been a desirable feature in the Docker ecosystem for a long time. You can see this in the myriad of third-party squashing tools that accomplish the task with varying levels of success. While I'm happy to give a third-party application a whirl, I want to know that I can't do it with Docker proper first.
At one point, Docker itself had a --squash flag integrated into its docker build process, although to access it you had to enable the experimental features in the Docker daemon. When I got around to trying it, I hit some kind of issue with layers not having valid parents. All this is to say that --squash was not the solution.
Multi-Stage Builds
When looking through GitHub issues about --squash and the error I encountered, I came across a comment to the effect of: "We're going to remove --squash because of Multi-Stage Builds." There was no clarification, just a matter-of-fact statement. Huh?
Regardless of what the commenter intended, this gave me the idea to use multi-stage builds for squashing. Fundamentally, isn't an image simply a filesystem? To test this, I copied all the files from my big image into an image based on scratch:
FROM big_image as source
FROM scratch
COPY --from=source / /
When I did that, it did indeed copy the files and squash all of the layers. Unfortunately, it also wiped all of the Docker image settings (ENV, USER, WORKDIR, and so on). Therefore, you'll need to include those in your squashing Dockerfile:
FROM big_image as source
FROM scratch
COPY --from=source / /
ENV DO_THING=1
ARG username=user
USER ${username}
WORKDIR /opt
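Rather than recalling these settings from memory, you can read them off the original image. This is a rough sketch, assuming the source image is tagged big_image (a placeholder). show_config is a made-up helper name, and the fields it prints are only the ones most likely to need re-declaring.

```shell
#!/bin/sh
# Placeholder image name (an assumption); override with IMAGE=your_tag.
IMAGE="${IMAGE:-big_image}"

# Print the image settings that a COPY into scratch does not carry over.
show_config() {
    docker image inspect --format \
'Env:        {{json .Config.Env}}
User:       {{json .Config.User}}
WorkingDir: {{json .Config.WorkingDir}}
Entrypoint: {{json .Config.Entrypoint}}
Cmd:        {{json .Config.Cmd}}' "$1"
}

if command -v docker >/dev/null 2>&1 && docker image inspect "$IMAGE" >/dev/null 2>&1; then
    show_config "$IMAGE"
else
    echo "docker or image '$IMAGE' not available; skipping" >&2
fi
```

Each non-empty value maps onto the Dockerfile instruction of the same purpose (ENV, USER, WORKDIR, ENTRYPOINT, CMD) in the squashing stage.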
Ok! This worked very well. I now have a working 11GB image where I previously had a 20GB image. But how do I keep my squasher from falling out of sync with my original image builder? Multi-stage builds, duh!
The final Dockerfile:
FROM ubuntu:20.04 as big_image
# ... do big_image build commands ...
FROM scratch
COPY --from=big_image / /
ENV DO_THING=1
ARG username=user
USER ${username}
WORKDIR /opt
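To check that the squash actually pays off, you can build the first stage on its own and compare it with the final image. This is only a sketch: the tags myapp:full and myapp:squashed and both helper functions are invented for illustration, and it assumes the Dockerfile above sits in the current directory.

```shell
#!/bin/sh
# Hypothetical tags; the stage name big_image comes from the Dockerfile.
FULL_TAG="${FULL_TAG:-myapp:full}"
SQUASHED_TAG="${SQUASHED_TAG:-myapp:squashed}"

# Size of an image in bytes, as reported by the Docker daemon.
image_bytes() {
    docker image inspect --format '{{.Size}}' "$1"
}

# Integer percentage saved when going from $1 bytes down to $2 bytes.
percent_saved() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

if command -v docker >/dev/null 2>&1 && [ -f Dockerfile ]; then
    # Stop after the first stage to get the unsquashed image for comparison.
    docker build --target big_image -t "$FULL_TAG" . &&
    # Build the whole file; the last stage (FROM scratch) becomes the image.
    docker build -t "$SQUASHED_TAG" . &&
    echo "saved $(percent_saved "$(image_bytes "$FULL_TAG")" "$(image_bytes "$SQUASHED_TAG")")%"
else
    echo "docker or ./Dockerfile not found; skipping build" >&2
fi
```

With the numbers from this post, percent_saved 20 11 prints 45, so roughly 45% of the image was dead weight.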
The one thing to watch out for here is that Docker will create dangling images when using multi-stage builds. The easy fix is to run docker image prune after the build.
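Since pruning is not limited to the leftovers of one build, it may be worth looking at what is dangling before deleting anything. A minimal sketch, with made-up function names:

```shell
#!/bin/sh
# List untagged ("dangling") images, e.g. leftovers from earlier stages.
list_dangling() {
    docker image ls --filter dangling=true
}

# Remove every dangling image without prompting. This is not limited to
# images from your own build, so check the list first.
prune_dangling() {
    docker image prune --force
}

if command -v docker >/dev/null 2>&1; then
    list_dangling
else
    echo "docker not found; skipping" >&2
fi
```

Once the list looks safe to delete, call prune_dangling.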