1. Introduction to Docker caching
Building Dockerfiles can take some time. However, building the same Dockerfile a second time is much faster. Let's get some insight into why and when this is the case.
2. Docker build
When learning about the RUN instruction, we saw that building images can take significant time because the shell commands are actually run when the image is built.
This is necessary because what is saved in the resulting image is not the instructions in the Dockerfile but the changes in the file system the instructions make during the build.
For example, if we have a RUN instruction with a shell command that downloads and opens a zip file with several files inside, the resulting image will contain those files.
3. Docker instructions are linked to File system changes
During the build, an image keeps track of which instructions in the Dockerfile created which change to the file system.
We can view an image as a list of consecutive changes to the file system, with every entry in the list corresponding to a specific Docker instruction in the Dockerfile.
4. Docker layers
Together all the changes to the filesystem for a single instruction in the Dockerfile are called a layer of the image.
All layers together make up a Docker image. There is also some metadata, like the start command set with the CMD instruction.
While building a Dockerfile, we can clearly see when docker is working on the next layer and on which layer it is working from the output.
Like at the bottom of the slide, Docker will print which step it is building out of the total amount of steps whenever the next instruction in the Dockerfile starts building.
5. Docker caching
When building a Dockerfile that we have built before, in front of each layer being built, it says cached in capital letters.
Docker detects which Dockerfile instructions have not changed, and instead of re-running the Dockerfile instruction, it uses the known result it has stored.
Docker will only use cached layers to speed up our builds if the Dockerfile instruction is exactly the same and all previous Dockerfile instructions are also identical to when it originally created and stored this layer.
6. Understanding Docker caching
Understanding when Docker layers are cached when building images is important for two reasons.
First, it will help us understand why sometimes our image stays the same even though we change and rebuild it.
For example, if we have a RUN 'apt-get update' and 'apt-get install python3' instruction in our Dockerfile, and a new version of python3 is released. Rebuilding our Dockerfile will not change anything in the resulting image. Docker will see the same instructions as when it last built this Dockerfile and will assume that the result is the same. It can not know that re-running apt-get update will give another result.
7. Understanding Docker caching
Understanding when Docker layers are cached is also important because it will help us write Dockerfiles, which we can make changes to without all layers having to be rebuilt. This can be done by ordering the instructions in the Dockerfile from least changing to most changing.
8. Understanding Docker caching
This is why we always want to put the Dockerfile instructions to install packages before instructions to copy files into the image. Often we'll change the files when improving our work or fixing bugs, but the packages we need rarely change. This ensures as many cached layers as possible can be re-used.
9. Let's practice!
That was a lot of information. Let's do some exercises to digest what we just learned.