diff --git a/sources/tech/20231227 Kubernetes with CRI-O on Fedora Linux 39.md b/sources/tech/20231227 Kubernetes with CRI-O on Fedora Linux 39.md
new file mode 100644
index 0000000000..069bf8ef65
--- /dev/null
+++ b/sources/tech/20231227 Kubernetes with CRI-O on Fedora Linux 39.md
@@ -0,0 +1,668 @@
+[#]: subject: "Kubernetes with CRI-O on Fedora Linux 39"
+[#]: via: "https://fedoramagazine.org/kubernetes-with-cri-o-on-fedora-linux-39/"
+[#]: author: "Roman Gherta https://fedoramagazine.org/author/romangherta/"
+[#]: collector: "lujun9972/lctt-scripts-1700446145"
+[#]: translator: " "
+[#]: reviewer: " "
+[#]: publisher: " "
+[#]: url: " "
+
+Kubernetes with CRI-O on Fedora Linux 39
+======
+
+![][1]
+
+Photo by [Christian Pfeifer][2] on [Unsplash][3] (cropped)
+
+[Kubernetes][4] is a self-healing and scalable container orchestration platform. It abstracts away the underlying infrastructure and makes life easier for administrators and developers by improving productivity, shortening the deployment lifecycle, and streamlining DevOps processes. The goal of this article is to show how to deploy a Kubernetes cluster on Fedora Linux 39 machines using CRI-O as the container engine.
+
+### 1\. Preparing the cluster nodes
+
+Both master and worker nodes must be prepared before installing Kubernetes. The preparations ensure that the proper capabilities are available, that the required kernel modules are loaded, and that swap, the cgroups version, and the other prerequisites for installing the cluster are in order.
+
+#### Kernel modules
+
+Kubernetes, in its standard configuration, requires the following kernel modules and configuration values for bridging network traffic, overlaying filesystems, and forwarding network packets. An adequate size for user and pid namespaces for userspace containers is also provided in the below configuration example.
+
+```
+
+ [user@fedora ~]$ sudo cat <<- EOF | sudo tee /etc/sysctl.d/kubernetes.conf
+ net.bridge.bridge-nf-call-ip6tables = 1
+ net.bridge.bridge-nf-call-iptables = 1
+ net.ipv4.ip_forward = 1
+ user.max_user_namespaces = 28633
+ user.max_pid_namespaces = 28633
+ EOF
+
+ [user@fedora ~]$ sudo cat <<- EOF | sudo tee /etc/modules-load.d/kubernetes.conf
+ overlay
+ br_netfilter
+ EOF
+
+ [user@fedora ~]$ sudo systemctl restart systemd-modules-load.service
+ [user@fedora ~]$ sudo sysctl --system
+
+```
+
+#### CRI-O
+
+Install CRI-O from the Fedora Linux repositories, then inspect the package to see which dependencies it pulls in and which configuration files it ships.
+
+```
+
+ [user@fedora ~]$ sudo dnf install -y cri-o
+
+ [user@fedora ~]$ rpm -qR cri-o
+ conmon >= 2.0.2-1
+ container-selinux
+ containers-common >= 1:0.1.31-14
+ libseccomp.so.2()(64bit)
+ ...
+
+ [user@fedora ~]$ rpm -ql cri-o
+ /etc/cni/net.d/100-crio-bridge.conflist
+ /etc/cni/net.d/200-loopback.conflist
+ /etc/crictl.yaml
+ /etc/crio/crio.conf
+ ...
+
+```
+
+Notice it uses _conmon_ for container monitoring and the _container-selinux_ policies. Also, the main configuration file is _crio.conf_, and the package installed some default networking plugin configurations under /etc/cni. For networking, this guide will not rely on the default CRI-O plugins, though it is possible to use them.
+
+```
+
+ [user@fedora ~]$ sudo rm -rf /etc/cni/net.d/*
+
+```
+
+Besides the above configuration files, CRI-O uses the same image and storage libraries as Podman. So you can use the same configuration files for registries and signature verification policies as you would when using Podman. See the CRI-O [README][5] for examples.
+
+#### Cgroups v2
+
+Recent versions of Fedora Linux have cgroups v2 enabled by default. Cgroups v2 brings better control over memory and CPU resource management. With cgroups v1, a pod would receive a kill signal when a container exceeds the memory limit. With cgroups v2, memory allocation is “throttled” by systemd. See the cgroups v2 kernel docs for more details about the changes.
+
+```
+
+ [user@fedora ~]$ stat -f /sys/fs/cgroup/
+   File: "/sys/fs/cgroup/"
+     ID: 0        Namelen: 255     Type: cgroup2fs
+
+```
+
+#### Additional runtimes
+
+In Fedora Linux, systemd is both the init system and the default cgroups driver/manager. While checking _crio.conf_ we notice this version already uses systemd. If no other cgroups driver is explicitly passed to kubeadm, then kubelet will also use systemd by default in version 1.27.
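+
+Before changing anything, it can help to confirm what the packaged _crio.conf_ currently sets for the cgroups manager and the default runtime. The commented defaults shown below are what the stock Fedora Linux package is expected to contain; the exact lines may differ slightly between CRI-O versions.
+
+```
+
+ [user@fedora ~]$ grep -E '^#? ?(cgroup_manager|default_runtime)' /etc/crio/crio.conf
+ # cgroup_manager = "systemd"
+ # default_runtime = "runc"
+
+```
+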
+We will set systemd explicitly, nonetheless, and change the default runtime to _crun_, which is faster and has a smaller memory footprint. We will also define the new runtime block as shown below, using a configuration drop-in file, and make sure the file is labeled with the proper SELinux context.
+
+```
+
+ [user@fedora ~]$ sudo dnf install -y crun
+
+ [user@fedora ~]$ sudo sed -i 's/# cgroup_manager/cgroup_manager/g' /etc/crio/crio.conf
+ [user@fedora ~]$ sudo sed -i 's/# default_runtime = "runc"/default_runtime = "crun"/g' /etc/crio/crio.conf
+
+ [user@fedora ~]$ sudo mkdir /etc/crio/crio.conf.d
+ [user@fedora ~]$ sudo tee -a /etc/crio/crio.conf.d/90-crun <<- CONFIG
+ [crio.runtime.runtimes.crun]
+ runtime_path = "/usr/bin/crun"
+ runtime_type = "oci"
+ CONFIG
+
+ [user@fedora ~]$ sudo chcon -R --reference=/etc/crio/crio.conf /etc/crio/crio.conf.d/
+
+ [user@fedora ~]$ sudo systemctl enable --now crio.service
+
+```
+
+#### DNS
+
+On Fedora Linux, systemd-resolved manages name resolution, and /etc/resolv.conf is, by default, a symlink to its stub resolver configuration, which points local clients at the 127.0.0.53 stub listener.
+
+```
+
+ [user@fedora ~]$ readlink /etc/resolv.conf
+ ../run/systemd/resolve/stub-resolv.conf
+
+```
+
+The reference to 127.0.0.53 triggers a coredns loop plugin error in Kubernetes. A list of next-hop DNS servers is maintained by systemd in /run/systemd/resolve/resolv.conf. According to the systemd-resolved man page, the /etc/resolv.conf file can be symlinked to /run/systemd/resolve/resolv.conf so that local DNS clients will bypass systemd-resolved and talk directly to the DNS servers. For some DNS clients, however, bypassing systemd-resolved might not be desirable.
+
+A better approach is to configure only kubelet to use the /run/systemd/resolve/resolv.conf file. Configuring kubelet to reference this alternate resolv.conf will be demonstrated in the following sections.
+
+#### Kubernetes packages
+
+We will use kubeadm, a mature tool that makes it easy to quickly install production-grade Kubernetes.
+
+```
+
+ [user@fedora ~]$ sudo dnf install -y kubernetes-kubeadm kubernetes-client
+
+```
+
+kubernetes-kubeadm generates a kubelet drop-in file at _/etc/systemd/system/kubelet.service.d/kubeadm.conf_. This file can be used for instance-specific kubelet configuration. However, the recommended approach is to use kubeadm configuration files. For example, kubeadm creates /var/lib/kubelet/kubeadm-flags.env, which is referenced by the above-mentioned kubelet drop-in file.
+
+The kubelet will be started automatically by kubeadm. For now we will just enable it so it persists across restarts.
+
+```
+
+ [user@fedora ~]$ sudo systemctl enable kubelet
+
+```
+
+### 2\. Initialize the Control Plane
+
+For the installation, we pass some cluster-wide configuration to kubeadm, such as the pod and service [CIDR][9]s. For more details refer to the [kubeadm configuration docs][10] and [kubelet config docs][11].
+
+```
+
+ [user@fedora ~]$ cat <<- CONFIG > kubeadmin-config.yaml
+ ---
+ apiVersion: kubeadm.k8s.io/v1beta3
+ kind: InitConfiguration
+ nodeRegistration:
+   name: master1
+   criSocket: "unix:///var/run/crio/crio.sock"
+   imagePullPolicy: "IfNotPresent"
+   kubeletExtraArgs:
+     cgroup-driver: "systemd"
+     resolv-conf: "/run/systemd/resolve/resolv.conf"
+     max-pods: "4096"
+     max-open-files: "20000000"
+ ---
+ apiVersion: kubeadm.k8s.io/v1beta3
+ kind: ClusterConfiguration
+ kubernetesVersion: "1.27.0"
+ networking:
+   podSubnet: "10.32.0.0/16"
+   serviceSubnet: "172.16.16.0/22"
+ controllerManager:
+   extraArgs:
+     node-cidr-mask-size: "20"
+     allocate-node-cidrs: "true"
+ ---
+ CONFIG
+
+```
+
+In the above configuration, we have chosen different IP ranges for pods and services. This is useful when debugging. Make sure they do not overlap with your node’s CIDR. To summarize the IP ranges:
+
+  * services “172.16.16.0/22” – 1024 services cluster wide
+  * pods “10.32.0.0/16” – 65536 pods cluster wide, with a maximum of 4096 pods and 20 million open files per kubelet. For other important kubelet parameters refer to the [kubelet config docs][11]. Kubelet is an important component running on the worker nodes, so make sure you read the config docs carefully.
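+
+As a quick sanity check on the address math behind these numbers (purely illustrative arithmetic, not a required step):
+
+```
+
+ [user@fedora ~]$ echo $(( 2 ** (32 - 22) ))   # addresses in the 172.16.16.0/22 service subnet
+ 1024
+ [user@fedora ~]$ echo $(( 2 ** (32 - 16) ))   # addresses in the 10.32.0.0/16 pod subnet
+ 65536
+ [user@fedora ~]$ echo $(( 2 ** (32 - 20) ))   # addresses per node with node-cidr-mask-size=20
+ 4096
+
+```
+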
+
+kube-controller-manager has a component called nodeipam that splits the pod CIDR into smaller ranges and allocates these ranges to each node via the _node.spec.podCIDR_ / _node.spec.podCIDRs_ properties. The Controller Manager property _\--node-cidr-mask-size_ defines the size of this range. By default it is /24, but if you have enough resources you can make it larger; in our case /20. This will result in 4096 pods per node with a maximum of 65536/4096=16 nodes. Adjust these properties to fit the capacity of your bare-metal server.
+
+```
+
+ [user@fedora ~]$ hostnamectl set-hostname master1
+ [user@master1 ~]$ sudo kubeadm init --skip-token-print=true --config=kubeadmin-config.yaml
+
+ [user@master1 ~]$ mkdir -p $HOME/.kube
+ [user@master1 ~]$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
+ [user@master1 ~]$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
+
+```
+
+There are newer networking plugins that leverage eBPF kernel capabilities or OVN. However, installing such plugins requires uninstalling kube-proxy, and we want to keep the deployment as standard as possible. Some of the networking plugins read the _kubeadm-config_ configmap and set up the correct CIDR values without requiring you to read a lot of documentation.
+
+```
+
+ [user@master1 ~]$ kubectl create -f https://github.com/antrea-io/antrea/releases/download/v1.14.0/antrea.yml
+
+```
+
+Antrea and OVN-Kubernetes are interesting CNCF projects, especially for bare-metal clusters where network speed becomes a bottleneck. Antrea also has support for some high-speed Mellanox network cards. Check pod and service health and whether correct IP addresses were assigned.
+
+```
+
+ [user@master1 ~]$ kubectl get pods -A -o wide
+ NAME                                 READY   IP              NODE
+ antrea-agent-x2j7r                   2/2     192.168.122.3   master1
+ antrea-controller-5f7764f86f-8xgkc   1/1     192.168.122.3   master1
+ coredns-787d4945fb-55pdq             1/1     10.32.0.2       master1
+ coredns-787d4945fb-ndn78             1/1     10.32.0.3       master1
+ etcd-master1                         1/1     192.168.122.3   master1
+ kube-apiserver-master1               1/1     192.168.122.3   master1
+ kube-controller-manager-master1      1/1     192.168.122.3   master1
+ kube-proxy-mx7ns                     1/1     192.168.122.3   master1
+ kube-scheduler-master1               1/1     192.168.122.3   master1
+
+ [user@master1 ~]$ kubectl get svc -A
+ NAMESPACE     NAME         TYPE        CLUSTER-IP
+ default       kubernetes   ClusterIP   172.16.16.1
+ kube-system   antrea       ClusterIP   172.16.18.214
+ kube-system   kube-dns     ClusterIP   172.16.16.10
+
+ [user@master1 ~]$ kubectl describe node master1 | grep PodCIDR
+ PodCIDR:      10.32.0.0/20
+ PodCIDRs:     10.32.0.0/20
+
+```
+
+All pods should be running and healthy. Notice how the static pods and the daemonsets have the same IP address as the node. CoreDNS is also reading directly from the /run/systemd/resolve/resolv.conf file and not crashing.
+
+Generate a token for joining the worker node.
+
+```
+
+ [user@master1 ~]$ kubeadm token create --ttl=30m --print-join-command
+
+```
+
+The output of this command contains the details needed for joining the worker node.
+
+### 3\. Join a Worker Node
+
+We need to set the hostname and run _kubeadm join_. Kubelet on this node also requires configuration. Do this at the systemd level or, as shown here, by using a kubeadm config file with placeholders. Replace the placeholders with the values printed by the previous command. The kubelet args follow the same convention as the [kubelet params][12], but without the leading dashes.
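+
+For reference, the join command printed in the previous step generally has the form shown below; every angle-bracketed value is a placeholder and maps directly to a field in the join configuration used next.
+
+```
+
+ kubeadm join <api-server-endpoint> --token <token> --discovery-token-ca-cert-hash sha256:<ca-cert-hash>
+
+```
+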
+
+```
+
+ [user@fedora ~]$ hostnamectl set-hostname worker1
+
+ [user@worker1 ~]$ cat <<- CONFIG > join-config.yaml
+ ---
+ apiVersion: kubeadm.k8s.io/v1beta3
+ kind: JoinConfiguration
+ discovery:
+   bootstrapToken:
+     token: "<token>"
+     apiServerEndpoint: "<api-server-endpoint>"
+     caCertHashes: ["sha256:<ca-cert-hash>"]
+   timeout: 5m
+ nodeRegistration:
+   name: worker1
+   criSocket: "unix:///var/run/crio/crio.sock"
+   imagePullPolicy: "IfNotPresent"
+   kubeletExtraArgs:
+     cgroup-driver: "systemd"
+     resolv-conf: "/run/systemd/resolve/resolv.conf"
+     max-pods: "4096"
+     max-open-files: "20000000"
+ ---
+ CONFIG
+
+ [user@worker1 ~]$ sudo kubeadm join --config=join-config.yaml
+
+```
+
+From the master node, check the ranges allocated by nodeipam to both nodes:
+
+```
+
+ [user@master1 ~]$ kubectl describe node worker1 | grep PodCIDR
+ PodCIDR:      10.32.16.0/20
+ PodCIDRs:     10.32.16.0/20
+
+```
+
+Notice the cluster-wide pod CIDR (10.32.0.0/16) was split by the Controller Manager into 10.32.0.0/20 for the first node and 10.32.16.0/20 for the second node, non-overlapping segments of 4096 IP addresses each.
+
+### 4\. Security considerations
+
+Run three sample pods to test the setup.
+
+```
+
+ [user@master1 ~]$ kubectl apply -f - <