How to Avoid Container Rebuilding Due to containerd Bugs?
About the Author
Deng Yuxing, a software architect at SUSE Rancher in China, has six years of experience in cloud-native development and took part in the Rancher iterations from 1.x to 2.x. He is currently responsible for the development of the Rancher for openEuler (RFO) project.
Introduction
Recently, an issue in the containerd runtime has drawn attention. It was introduced in containerd v1.6.9 and v1.5.15: after containerd is restarted, network-related data (such as pod IP addresses) in the metadata of running pods is lost. The root cause is that part of this metadata is never persisted to disk.
Affected versions:
v1.6.9 to v1.6.14. The problem is fixed in v1.6.15.
v1.5.15 to v1.5.16. The problem is fixed in v1.5.17.
You can perform the following steps to reproduce the problem and verify the fix.
This article uses RKE2 as an example. The version is RKE2 v1.24.9+rke2r1, which uses k3s-containerd v1.6.12-k3s1 and is affected by this containerd issue.
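Before reproducing the problem, you can check which containerd build a node is running. The sketch below assumes the RKE2 binary layout used throughout this article:

# Print the version of the containerd binary shipped with RKE2.
# Builds from v1.6.9 to v1.6.14 (or v1.5.15 to v1.5.16) are affected.
/var/lib/rancher/rke2/bin/containerd --version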
By default, restarting the containerd service does not affect running service containers. When a container is started, its parent process (the containerd shim) is attached to PID 1 rather than to the containerd daemon, so even if you restart the containerd service with systemctl, the status of running containers is not affected.
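A minimal way to confirm this, assuming the containerd-shim-runc-v2 shim that RKE2 uses, is to list the shim processes and check their parent PIDs:

# List container shim processes with their parent PIDs; the PPID column should show 1,
# which means the shims are not children of the containerd daemon.
ps -o pid,ppid,cmd -C containerd-shim-runc-v2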
Problem Reproduction
# Start the rke2 service and wait until the service is started successfully.
systemctl start rke2-server

# Configure rke2-related paths and the kube-config path.
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
export PATH=/var/lib/rancher/rke2/bin:$PATH

# Use kubectl to query the current cluster status.
kubectl get po -A | grep rke2-metrics-server-5b987d776b-gqxv9
# kube-system rke2-metrics-server-5b987d776b-gqxv9 1/1 Running 0 15m
The rke2 service is successfully started on a single node. Proceed with the following containerd-related operations:
# Configure containerd-related environment variables.
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/containerd/config.toml
export CONTAINER_RUNTIME_ENDPOINT=unix:///var/run/k3s/containerd/containerd.sock
# Use crictl to query pod and container information.
crictl pods | grep rke2-metrics-server
# bfad445917423 18 minutes ago Ready rke2-metrics-server-5b987d776b-gqxv9 kube-system 0 (default)
crictl ps | grep rke2-metrics-server
# db5d6392a310e f6dc23a68f5fb 18 minutes ago Running metrics-server 0 bfad445917423 rke2-metrics-server-5b987d776b-gqxv9
Take the metrics-server pod as an example: query its network information, then restart containerd to reproduce the problem.
# Query metrics-server pod details.
crictl inspectp bfad445917423 | jq .status.network
# {
#   "additionalIps": [],
#   "ip": "10.42.0.6"
# }
# Stop the rke2-server service and start containerd separately to prevent kubelet from affecting problem reproduction.
systemctl stop rke2-server
# Start containerd separately.
containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
In a new terminal, use crictl to query the pod status from the standalone containerd.
crictl pods | grep rke2-metrics-server
# bfad445917423 24 minutes ago Ready rke2-metrics-server-5b987d776b-gqxv9 kube-system 0 (default)
# Query metrics-server pod details again.
crictl inspectp bfad445917423 | jq .status.network
# {
#   "additionalIps": [],
#   "ip": ""
# }
The command output shows that the pod IP address is lost after containerd is restarted.
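To see how widespread the data loss is, a small loop over the commands already used above (a sketch; adjust as needed for your environment) lists every sandbox whose recorded IP is now empty:

# Sketch: report every pod sandbox whose recorded IP address is empty.
for p in $(crictl pods -q); do
  ip=$(crictl inspectp "$p" | jq -r .status.network.ip)
  [ -z "$ip" ] && echo "sandbox $p has no IP"
done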
Impact
In the preceding example, the pod IP address is lost. When rke2-server is restarted, the service container is consequently rebuilt, which may cause service interruption.
# After the containerd process is interrupted, restart the rke2-server process.
systemctl restart rke2-server
kubectl get po -A | grep rke2-metrics-server-5b987d776b-8vg69
# kube-system rke2-metrics-server-5b987d776b-8vg69 1/1 Running 2 (115s ago) 23m
crictl pods | grep rke2-metrics-server
# caba6d8d15823 41 seconds ago Ready rke2-metrics-server-5b987d776b-8vg69 kube-system 1 (default)
# 2dec6a11fd36f 22 minutes ago NotReady rke2-metrics-server-5b987d776b-8vg69 kube-system 0 (default)
The output shows that, after rke2-server is restarted, pods that use CNI networking are rebuilt: the old sandbox becomes NotReady and the newly created sandbox appears in the crictl pods output.
Verification of Problem Rectification
Download a newer containerd build. k3s-containerd v1.6.14+k3s1 is used for verification; it is a patch version urgently released by Rancher ahead of containerd v1.6.15.
# Pull the new image.
docker pull rancher/hardened-containerd:v1.6.14-k3s1-build20230105
mkdir container-new
cd container-new
# Obtain containerd of the new version from the image.
docker run --rm -it -v ${PWD}:/output rancher/hardened-containerd:v1.6.14-k3s1-build20230105 cp -r /usr/local/bin /output
./bin/containerd --version
# containerd github.com/k3s-io/containerd v1.6.14-k3s1 6f9c63d571f5026e85a0768f0f2ef03d1c8dbc6e
# Stop the running containers by killing their containerd shim processes.
pkill -f /var/lib/rancher/rke2/data/v1.24.9-rke2r1-d4d8faf800d0/bin/containerd-shim-runc-v2
# Replace the containerd binary.
cp ./bin/* /var/lib/rancher/rke2/bin
/var/lib/rancher/rke2/bin/containerd --version
# containerd github.com/k3s-io/containerd v1.6.14-k3s1 6f9c63d571f5026e85a0768f0f2ef03d1c8dbc6e
# Start rke2.
systemctl start rke2-server
# Use crictl to query the new metrics-server pod.
crictl pods | grep " Ready" | grep metrics-server
# ad8b101f819df 3 minutes ago Ready rke2-metrics-server-5b987d776b-gqxv9 kube-system 1 (default)
# Stop rke2 and start containerd separately using the following commands.
systemctl stop rke2-server
containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
In a new terminal, use crictl to query the pod status from the standalone containerd.
crictl inspectp ad8b101f819df | jq .status.network
# {
#   "additionalIps": [],
#   "ip": "10.42.0.13"
# }
After containerd is restarted, the pod IP address is not lost.
RKE2 and RFO
The following RKE2 versions are affected by this issue:
v1.23.15+rke2r1
v1.24.9+rke2r1
v1.25.5+rke2r1
v1.26.0+rke2r1
This issue was submitted on December 20, 2022. On January 6, 2023, the RKE2 team merged the containerd fix, released k3s-containerd v1.6.14+k3s1, and published a new RKE2 RC version for testing and verification.
On January 11, the RKE2 team released the following versions that have fixed the containerd issue:
v1.23.16+rke2r1
v1.24.9+rke2r2
v1.25.5+rke2r2
v1.26.0+rke2r2
RFO, short for Rancher for openEuler, aims to build a basic Rancher platform for openEuler.
Because the RFO release schedule is later than that of RKE2, RFO is not affected by this issue. The following RFO versions have been released recently:
v1.23.16+rfor1
v1.24.9+rfor1
v1.24.10+rfor1
v1.25.5+rfor1
v1.25.6+rfor1
v1.26.0+rfor1
v1.26.1+rfor1
Conclusion
In most cases, software problems cannot be fixed immediately because of release schedules. Issues such as CVEs and functional defects are urgent, yet it can take a long time for an OS vendor to ship a fixed version, and sometimes a vendor cannot provide a new version at all because of various restrictions, which introduces uncertainty into system operation.
In such cases, packaging and distributing the components that the software depends on inside its own root filesystem makes the software easier to manage and upgrade, avoiding potential losses and operational risks.