With another Kubernetes release upon us, there are, as ever, plenty of new features to consider. The most significant change in this release is undoubtedly the removal of dockershim. I added “understanding the Container Runtime Interface (CRI)” to my to-do list when Kubernetes announced that it was abandoning Docker.
In this post I will examine the terminology and tooling around container runtimes. By the end, you’ll have a better idea of what a container runtime is, how the container landscape has evolved over time, and how we got to where we are today.
Container Runtime History
The container runtime sits at the bottom of the Kubernetes architecture and defines how Pods and their containers run programs.
Its capabilities have grown over time and now include isolating the runtime environment, allocating computing resources, organizing hardware, managing images, and starting and stopping programs.
However, before version 1.5, there were only two built-in container runtimes, Docker and rkt, and they could not meet every user's needs. For example, Kubernetes allocates CPU and memory resources to applications on Linux machines using cgroups. What about Windows machines? Cgroups do not exist there, so a container runtime built for Windows is needed. Users of other non-Linux operating systems, such as macOS, faced the same predicament.
Furthermore, the code at that time clearly did not meet Kubernetes' principle of flexibility.
Then came an abstract interface design that overcame these two shortcomings and brought significant changes to container runtime development. It set out to:
- Decouple the kubelet from the actual container runtime implementation, accelerating iteration
- Give community developers the right to customize implementations
This culminated in the implementation of the container runtime interface (CRI), which lets system components (like the kubelet) talk to container runtimes in a standardized way.
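The decoupling described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the real CRI, which is a gRPC API. The class and method names here (ContainerRuntime, FakeRuntime, Kubelet, run_pod) are hypothetical and loosely modelled on the kinds of calls the CRI defines.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ContainerRuntime(ABC):
    """Hypothetical, simplified stand-in for the CRI contract."""

    @abstractmethod
    def create_container(self, pod: str, image: str) -> str:
        """Create a container for a pod and return its ID."""

    @abstractmethod
    def start_container(self, container_id: str) -> None:
        """Start a previously created container."""


class FakeRuntime(ContainerRuntime):
    """In-memory runtime, used only to show that runtimes are pluggable."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.states: Dict[str, str] = {}

    def create_container(self, pod: str, image: str) -> str:
        container_id = f"{self.name}-{pod}-{len(self.states)}"
        self.states[container_id] = "created"
        return container_id

    def start_container(self, container_id: str) -> None:
        self.states[container_id] = "running"


class Kubelet:
    """Depends only on the interface, never on a concrete runtime."""

    def __init__(self, runtime: ContainerRuntime) -> None:
        self.runtime = runtime

    def run_pod(self, pod: str, images: List[str]) -> List[str]:
        ids = []
        for image in images:
            container_id = self.runtime.create_container(pod, image)
            self.runtime.start_container(container_id)
            ids.append(container_id)
        return ids
```

Swapping FakeRuntime("containerd") for FakeRuntime("cri-o") requires no change to Kubelet, which is the point of putting an interface between the two.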
Because Docker is not CRI compliant, dockershim acts as a translation layer between the kubelet and Docker, which is what allows Docker to be used as a Kubernetes container runtime. The dockershim component built into Kubernetes is removed in release v1.24. If you are using Docker Engine as the container runtime for your Kubernetes cluster, get ready to migrate in 1.24.
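A translation layer of this kind is essentially an adapter. The sketch below is a toy version under stated assumptions: DockerLikeEngine and its engine_run method are invented for illustration and are not the real Docker API, and Shim is not the real dockershim code.

```python
from typing import Dict, List


class DockerLikeEngine:
    """Hypothetical engine with its own, non-CRI API (not the real Docker API)."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def engine_run(self, image: str, labels: Dict[str, str]) -> str:
        # Record the call and hand back an engine-specific ID.
        self.calls.append(f"run {image}")
        return f"engine-{len(self.calls)}"


class Shim:
    """Adapter: accepts CRI-style calls and issues engine-style calls."""

    def __init__(self, engine: DockerLikeEngine) -> None:
        self.engine = engine
        self.pending: Dict[str, Dict[str, str]] = {}
        self.engine_ids: Dict[str, str] = {}

    def create_container(self, pod: str, image: str) -> str:
        # The engine has no separate "create" step in this toy model,
        # so the shim remembers the request until start is called.
        container_id = f"{pod}/{image}"
        self.pending[container_id] = {"pod": pod, "image": image}
        return container_id

    def start_container(self, container_id: str) -> None:
        spec = self.pending.pop(container_id)
        self.engine_ids[container_id] = self.engine.engine_run(
            spec["image"], {"pod": spec["pod"]}
        )
```

The cost of such a layer is that somebody has to maintain it. A runtime that speaks the interface natively needs no shim.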
Container Runtime Comparison
A container runtime, also known as a container engine, is a software component that can run containers on a host operating system. In a containerized architecture, container runtimes are responsible for loading container images from a repository, monitoring local system resources, isolating system resources for a container's use, and managing the container lifecycle. The creation of the OCI specification also made it possible to replace the Docker daemon with another container runtime. A container runtime only needs to understand the OCI format to be able to run the container.
We will review different types of container runtimes. Generally, they fall into two main categories: low-level container runtimes, which implement the Open Container Initiative (OCI) runtime specification, and high-level runtimes, which implement the Container Runtime Interface (CRI).
Low-Level Container Runtimes/OCI Runtimes
Low-level runtimes are responsible for the mechanics of actually creating and running a container. Once the containerized process is running, the low-level runtime has nothing further to do: it abstracts the Linux primitives and is not designed to perform additional tasks.
The most popular low-level runtimes include:
- runC: created by Docker and contributed to the OCI. It is now the de facto standard low-level container runtime. runC is written in Go and is maintained under the OCI's opencontainers project.
- crun: an OCI implementation led by Red Hat. crun is written in C. It is designed to be lightweight and performant, and was among the first runtimes to support cgroups v2.
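What a low-level runtime consumes is an OCI bundle: a root filesystem directory plus a config.json that describes the process to run. As a rough sketch, the Python below builds a minimal config.json. The field names follow the OCI runtime specification as I understand it. The helper name minimal_oci_config, the hostname, and the chosen defaults are assumptions made for illustration.

```python
import json
from typing import Any, Dict, List


def minimal_oci_config(args: List[str], rootfs: str = "rootfs") -> Dict[str, Any]:
    """Build a minimal OCI runtime config (illustrative defaults only)."""
    return {
        "ociVersion": "1.0.2",
        "process": {
            "terminal": False,
            "user": {"uid": 0, "gid": 0},
            "args": args,  # the command the container will run
            "env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],
            "cwd": "/",
        },
        # Path to the unpacked image filesystem, relative to the bundle.
        "root": {"path": rootfs, "readonly": True},
        "hostname": "demo",
        "linux": {
            # Each namespace listed here isolates one view of the system.
            "namespaces": [
                {"type": t} for t in ("pid", "network", "ipc", "uts", "mount")
            ],
        },
    }


# Serialized form, as it would be written to <bundle>/config.json.
config_json = json.dumps(minimal_oci_config(["sh"]), indent=2)
```

Any OCI-compliant runtime should be able to start a container from a directory containing this file and a populated rootfs, which is what makes runC and crun interchangeable.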
Sandboxed and virtualized runtimes
In addition to native runtimes, which run the containerized process on the same host kernel, there are some sandboxed and virtualized implementers of the OCI spec:
gVisor is a sandboxed runtime, which provides further isolation of the host from the containerized process. Instead of sharing the host kernel, the containerized process runs on a unikernel or kernel proxy layer, which then interacts with the host kernel on the container's behalf. Because of this increased isolation, sandboxed runtimes have a reduced attack surface and make it less likely that a containerized process can harm the host.
Kata is a virtualized runtime: an implementation of the OCI Runtime spec that is backed by a virtual machine interface rather than the host kernel. It starts a lightweight virtual machine with a standard Linux kernel image and runs the “containerized” process in that virtual machine.
In contrast to native runtimes, sandboxed and virtualized runtimes have performance impacts through the entire life of a containerized process.
CRI/High Level Runtime
High-level runtimes are responsible for transporting and managing container images, unpacking the image, and handing off to the low-level runtime to run the container. Typically, high-level runtimes provide a daemon application and an API that remote applications can use to run and monitor containers, but they sit on top of low-level runtimes or other high-level runtimes and delegate the actual work to them.
Because a CRI implementation has concerns beyond those of an OCI runtime, including image management and distribution, storage, snapshotting, and networking, it usually delegates the actual container execution to an OCI runtime.
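The division of labour described above can be sketched as follows. Everything here is hypothetical: the class names, the dictionary that plays the role of a registry, and the bundle layout are invented to show the delegation. They are not how containerd or CRI-O are implemented.

```python
from typing import Any, Dict, List


class LowLevelRuntime:
    """Hypothetical stand-in for an OCI runtime such as runC."""

    def __init__(self) -> None:
        self.started: List[Dict[str, Any]] = []

    def run(self, bundle: Dict[str, Any]) -> None:
        # A real runtime would set up namespaces and cgroups here.
        self.started.append(bundle)


class HighLevelRuntime:
    """Handles images, then delegates execution to the low-level runtime."""

    def __init__(self, low_level: LowLevelRuntime,
                 registry: Dict[str, List[str]]) -> None:
        self.low_level = low_level
        self.registry = registry  # fake registry: image name -> layer names
        self.image_store: Dict[str, List[str]] = {}

    def pull(self, image: str) -> None:
        # Image transport and storage are high-level concerns.
        self.image_store[image] = list(self.registry[image])

    def unpack(self, image: str) -> Dict[str, Any]:
        # Turn stored layers into a bundle the low-level runtime can run.
        return {"path": f"/bundles/{image}", "layers": self.image_store[image]}

    def run(self, image: str) -> Dict[str, Any]:
        if image not in self.image_store:
            self.pull(image)
        bundle = self.unpack(image)
        self.low_level.run(bundle)  # the actual execution is delegated
        return bundle
```

Note that LowLevelRuntime never sees an image name or a registry. It only receives a bundle, which is why one low-level runtime can be swapped for another.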
The first CRI implementation was the dockershim, which provided the agreed-upon layer of abstraction in front of the Docker engine.
There are two main players in the CRI space at present:
- containerd: extracted from early Docker source code, it is the current industry-standard container runtime.
- CRI-O: an open-source implementation of the Kubernetes container runtime interface (CRI), offering a lightweight alternative to rkt and Docker.
CRI-O is a CRI implementation that grew out of the Kubernetes project. By default, CRI-O uses runC as its OCI runtime, but on recent Fedora installations (with cgroups v2) it uses crun. Since it has full OCI compatibility, CRI-O works out of the box with low-level runtimes such as Kata, with no additional pieces and only minimal configuration.
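To make the cgroups v2 distinction concrete, here is a small sketch of how a tool might detect the cgroup version and choose a default OCI runtime. It relies on the fact that a cgroup v2 (unified) hierarchy exposes a cgroup.controllers file at its root. The function names and the selection rule are my own assumptions and do not reproduce CRI-O's actual logic.

```python
from pathlib import Path


def is_cgroup_v2(cgroup_root: str = "/sys/fs/cgroup") -> bool:
    """Return True if the given mount looks like a cgroup v2 hierarchy."""
    # The unified hierarchy has a cgroup.controllers file at its root;
    # a cgroup v1 mount does not.
    return (Path(cgroup_root) / "cgroup.controllers").is_file()


def pick_oci_runtime(cgroup_root: str = "/sys/fs/cgroup") -> str:
    """Hypothetical selection rule: crun on cgroup v2 hosts, runc otherwise."""
    return "crun" if is_cgroup_v2(cgroup_root) else "runc"
```

The cgroup_root parameter exists so the check can be exercised against a temporary directory rather than the real /sys/fs/cgroup.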
containerd was extracted from early Docker source code and uses runC under the hood by default. Like the rest of the container tools that originated from Docker, it is the current de facto standard CRI implementation.
You may notice in reading the above that Docker is not a CRI or OCI implementation but uses both (via containerd and runC). In fact, it has additional features like image building and signing that are out of scope of either CRI or OCI specs. So where does this fit in?
Docker calls its product the “Docker Engine”, and generically these full container tool suites may be referred to as container engines. No one except Docker provides such a full-featured single executable, but we can piece a comparable suite of tools together from the Container Tools project.
The Container Tools project follows the UNIX philosophy of small tools that each do one thing well:
- podman: running containers from images
- buildah: image building
- skopeo: image distribution
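As a sketch of how the three tools combine into a Docker-like workflow, the Python below assembles one command line per step without executing anything. The image names and the registry host are placeholders. The subcommands used (buildah bud, skopeo copy, podman run) exist in the tools as far as I know, but check the flags against your installed versions.

```python
import shlex
from typing import List


def build_cmd(local_image: str, context: str = ".") -> List[str]:
    # buildah builds an image from a Containerfile/Dockerfile in `context`.
    return ["buildah", "bud", "-t", local_image, context]


def push_cmd(local_image: str, remote_image: str) -> List[str]:
    # skopeo copies an image between storage locations, here from local
    # container storage to a remote registry.
    return [
        "skopeo", "copy",
        f"containers-storage:{local_image}",
        f"docker://{remote_image}",
    ]


def run_cmd(local_image: str) -> List[str]:
    # podman runs a container from the image and removes it on exit.
    return ["podman", "run", "--rm", local_image]


def pipeline(local_image: str, remote_image: str) -> List[List[str]]:
    """Build, distribute, run: one small tool per step."""
    return [
        build_cmd(local_image),
        push_cmd(local_image, remote_image),
        run_cmd(local_image),
    ]


# Placeholder names, printed rather than executed.
for cmd in pipeline("localhost/myapp:latest", "registry.example.com/myapp:latest"):
    print(shlex.join(cmd))
```

Each list could be passed to subprocess.run on a machine with the tools installed. Keeping the steps separate mirrors the one-tool-per-job split.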