I’ve come across tips on how to keep Docker images small and Dockerfiles with strange lines that seem to exist only to optimize image size. Well, it turns out they’re all wrong.
They may have an effect on flat Docker images, but for everything else (i.e. 99% of what people do), cleanup steps are just extra steps. When Docker builds an image from a Dockerfile, every step is a checkpoint, and every checkpoint is saved. If you add 100 MB in one step and then delete it in the next, that 100 MB still needs to be stored so that other Dockerfiles starting with the same step can reuse it.
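To make the checkpoint behaviour concrete, here is a minimal sketch of a Dockerfile that adds 100 MB in one step and deletes it in the next. The base image, the file path, and the image tag are placeholders of mine and are not part of the measurements in this post.

```dockerfile
# Hypothetical example: /tmp/blob and the tag test/add_then_delete are made up.
FROM ubuntu

# Step 1: write a 100 MB file of zeros. This step is saved as its own checkpoint.
RUN dd if=/dev/zero of=/tmp/blob bs=1M count=100

# Step 2: delete the file. The checkpoint from step 1 still contains the
# 100 MB, so the image is not expected to get any smaller.
RUN rm /tmp/blob

# To inspect the result (run from a shell, not as part of the Dockerfile):
#   docker build -t test/add_then_delete .
#   docker history test/add_then_delete
```

`docker history` lists the size of each step separately, which should make it easy to see where the 100 MB ends up.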
Results
REPOSITORY           TAG     IMAGE ID      CREATED            VIRTUAL SIZE
test/baseline        latest  7b590dec9b43  7 hours ago        272.6 MB
test/baseline_lines  latest  e165025980f7  9 minutes ago      272.6 MB
test/baseline_lists  latest  b40f9e108a93  About an hour ago  272.6 MB
test/combo           latest  744b502e0052  2 seconds ago      269.8 MB
test/combo2          latest  be8f1c1de02e  About an hour ago  249.8 MB
test/combo3          latest  da948e2838d9  About an hour ago  249.8 MB
test/install         latest  e7cadcbb5a05  12 hours ago       269.8 MB
test/install_clean   latest  dd1383285e85  12 hours ago       269.8 MB
test/install_lists   latest  e55f6f8ebac8  12 hours ago       269.8 MB
test/purge           latest  ef8c2aa7400b  About an hour ago  273.5 MB
test/remove          latest  75e3e5c4e246  About an hour ago  273.5 MB
Hypothesis: Docker’s base Ubuntu image does not need `apt-get clean`
I ran an experiment back around Docker 0.6, and I think my conclusion then was that `apt-get install … && apt-get clean` saved a few megabytes. But I have since heard that you don’t need to do that. If you compare the sizes of “test/install” and “test/install_clean”, you’ll see there is no difference: both are 269.8 MB. So you don’t need `apt-get clean`.
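Docker's own best-practices documentation says that the official Debian and Ubuntu images run `apt-get clean` automatically, which would explain the identical sizes. If you want to check this on your own base image, a rough sketch like the following lists what `apt-get clean` would have to remove. The package `curl` is only a placeholder, and this is not the Dockerfile used for the results above.

```dockerfile
# Hypothetical check; curl is an arbitrary package chosen for illustration.
FROM ubuntu

RUN apt-get update && apt-get install -y curl

# List the apt configuration snippets shipped with the base image.
RUN ls /etc/apt/apt.conf.d/

# List the package cache that `apt-get clean` empties. If no .deb files
# are left here after the install, there is nothing for it to clean.
RUN ls -l /var/cache/apt/archives/

# Newer builders may hide the output of RUN commands; building with
#   docker build --progress=plain .
# should print it.
```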
Hypothesis: `rm -rf /var/lib/apt/lists/*` saves some space
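I’ve been seeing a lot of Dockerfiles lately with this line, including lots of official Docker images. If those guys are all doing it, surely it must have some effect. Nope: the “_lists” images in the table are exactly the same size as their counterparts (“test/install_lists” and “test/install” are both 269.8 MB).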
Hypothesis: Combining similar lines saves space
There’s some overhead for each line in a Dockerfile. How significant is it? Well, it turns out it’s not. What I did find out, though, is that combining lines saves a significant amount of build time and a lot of disk thrashing. So combining lines does not save space, but it does save time.
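As an illustration of what combining similar lines can look like, here is a small sketch under my own assumptions. The package names are placeholders, and this is only one reading of "similar lines", not necessarily the Dockerfile that was measured.

```dockerfile
# Hypothetical example; curl, git and vim are arbitrary packages.
FROM ubuntu

# Separate lines would create one checkpoint per package:
#   RUN apt-get update
#   RUN apt-get install -y curl
#   RUN apt-get install -y git
#   RUN apt-get install -y vim

# Combined into one line: a single checkpoint for the same end result.
RUN apt-get update && apt-get install -y curl git vim
```

Both forms should end up with the same files in the image; the combined form simply gives Docker fewer checkpoints to write during the build.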
Hypothesis: Combining multiple steps saves space
This makes sense. If you skip making checkpoints, you’re not storing intermediate states. And it turns out this is the only way to make a Docker image built from a Dockerfile smaller. But it comes at the cost of readability and, more importantly, at the cost of fewer shared checkpoints that other images can reuse.
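A minimal sketch of the idea, assuming a temporary file that is only needed while the step runs. The paths are made up for illustration.

```dockerfile
# Hypothetical example: /tmp/blob and /tmp/blob.sha256 are made-up paths.
FROM ubuntu

# Create a 100 MB file, use it, and delete it within a single step.
# The checkpoint is only saved after the whole command has finished,
# so the 100 MB file is never stored in it.
RUN dd if=/dev/zero of=/tmp/blob bs=1M count=100 \
 && sha256sum /tmp/blob > /tmp/blob.sha256 \
 && rm /tmp/blob
```

The trade-off is the one described above: the step is harder to read, and another Dockerfile can only reuse it if it runs exactly the same combined command.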
Hypothesis: `apt-get purge` saves some space
Well, this hypothesis seems silly now, but I do see it used now and then. Deletions do not save space: “test/purge” and “test/remove” are the largest images in the table at 273.5 MB.
Conclusion
Write your Dockerfiles the same way you run commands. Don’t prematurely optimize by adding extra cruft you saw someone else do. If you’re actually worried about image size, use some sort of automation to rebuild Docker images behind the scenes. Just keep that logic out of the Dockerfile. And always keep on measuring. Know your bottlenecks.
You should also consider the intermediate layers, not just the final image size: they take up space as well. See:
http://jonathan.bergknoff.com/journal/building-good-docker-images
https://github.com/jfrazelle/dockerfiles/pull/25