TranslateProject/sources/tech/20231227 Kubernetes with CRI-O on Fedora Linux 39.md
DarkSun 342b7f80af 选题[tech]: 20231227 Kubernetes with CRI-O on Fedora Linux 39
sources/tech/20231227 Kubernetes with CRI-O on Fedora Linux 39.md
2023-12-28 05:03:20 +08:00

30 KiB
Raw Blame History

Kubernetes with CRI-O on Fedora Linux 39

Photo by Christian Pfeifer on Unsplash (cropped)

Kubernetes is a self-healing and scalable container orchestration platform. It abstracts away the underlying infrastructure and makes life easier for administrators and developers by improving productivity, deployment lifecycle, and by streamlining devops processes. The goal of this article is to show how to deploy a Kubernetes cluster on Fedora Linux 39 machines using CRI-O as a container engine.

1. Preparing the cluster nodes

Both master and worker nodes must be prepared before installing Kubernetes. Preparations ensure proper capabilities, proper kernel modules are loaded, swap, cgroups version and other prerequisites to installing the cluster.

Kernel modules

Kubernetes, in its standard configuration, requires the following kernel modules and configuration values for bridging network traffic, overlaying filesystems, and forwarding network packets. An adequate size for user and pid namespaces for userspace containers is also provided in the below configuration example.


    [user@fedora ~]$ sudo cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
    overlay
    br_netfilter
    EOF

    [user@fedora ~]$ systemctl restart systemd-modules-load.service

    [user@fedora ~]$  cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
    net.bridge.bridge-nf-call-iptables  = 1
    net.bridge.bridge-nf-call-ip6tables = 1
    net.ipv4.ip_forward                 = 1
    user.max_pid_namespaces             = 1048576
    user.max_user_namespaces            = 1048576
    EOF

    [user@fedora ~]$  sudo sysctl --system

Installing CRI-O

Container Runtime Interface OCI is an opensource container engine dedicated to Kubernetes. The engine implements the Kubernetes grpc protocol (CRI) and is compatible with any low-level OCI container runtime. All supported runtimes must be installed separately on the host. It is important to note that CRI-O is version-locked with Kubernetes. We will deploy cri-o:1.27 with kubernetes:1.27 on fedora-39.


    [user@fedora ~] sudo dnf install -y cri-o cri-tools

To check what the package installed:


    [user@fedora ~]$ rpm -qRc cri-o
    config(cri-o) = 0:1.27.1-2.fc39conmon >= 2.0.2-1
    container-selinux
    containers-common >= 1:0.1.31-14
    libseccomp.so.2()(64bit)
    /etc/cni/net.d/100-crio-bridge.conflist
    /etc/cni/net.d/200-loopback.conflist
    /etc/crictl.yaml
    /etc/crio/crio.conf
    ...

Notice it uses conmon for monitoring and container-selinux policies. Also, the main configuration file is crio.conf and it added some default networking plugins to /etc/cni. For networking, this guide will not rely on the default CRI-O plugins; though it is possible to use them.


    [user@fedora ~]$ sudo rm -rf /etc/cni/net.d/*

Besides the above configuration files, CRI-O uses the same image and storage libraries as Podman. So you can use the same configuration files for registries and signature verification policies as you would when using Podman. See the CRI-O README for examples.

Cgroups v2

Recent versions of Fedora Linux have cgroups v2 enabled by default. Cgroups v2 brings better control over memory and CPU resource management. With cgroups v1, a pod would receive a kill signal when a container exceeds the memory limit. With cgroups v2, memory allocation is “throttled” by systemd. See the cgroupfsv2 docs for more details about the changes.


    [user@fedora ~]$ stat -f /sys/fs/cgroup/
      File: "/sys/fs/cgroup/"
        ID: 0        Namelen: 255     Type: cgroup2fs

Additional runtimes

In Fedora Linux, systemd is both the init system and the default cgroups driver/manager. While checking crio.conf we notice this version already uses systemd. If no other cgroups driver is explicitly passed to kubeadm, then kubelet will also use systemd by default in version 1.27. We will set systemd explicitly, nonetheless, and change the default runtime to crun which is faster and has a smaller memory footprint. We will also define each new runtime block as shown below. We will use configuration drop-in files and make sure the files are labeled with the proper selinux context.


    [user@fedora ~]$ sudo dnf install -y crun

    [user@fedora ~]$ sudo sed -i 's/# cgroup_manager/cgroup_manager/g' /etc/crio/crio.conf
    [user@fedora ~]$ sudo sed -i 's/# default_runtime = "runc"/default_runtime = "crun"/g' /etc/crio/crio.conf

    [user@fedora ~]$ sudo mkdir /etc/crio/crio.conf.d
    [user@fedora ~]$ sudo tee -a /etc/crio/crio.conf.d/90-crun <<CRUN
    [crio.runtime.runtimes.crun]
    runtime_path = "/usr/bin/crun"
    runtime_type = "oci"
    CRUN


    [user@fedora ~]$ echo "containers:1000000:1048576" | sudo tee -a /etc/subuid
    [user@fedora ~]$ echo "containers:1000000:1048576" | sudo tee -a /etc/subgid
    [user@fedora ~]$ sudo tee -a /etc/crio/crio.conf.d/91-userns <<USERNS
    [crio.runtime.workloads.userns]
    activation_annotation = "io.kubernetes.cri-o.userns-mode"
    allowed_annotations = ["io.kubernetes.cri-o.userns-mode"]
    USERNS

    [user@fedora ~]$ sudo chcon -R --reference=/etc/crio/crio.conf  /etc/crio/crio.conf.d/

    [user@fedora ~]$ sudo ls -laZ /etc/crio/crio.conf.d/
    root root system_u:object_r:container_config_t:s0  70 Nov  1 19:26 .
    root root system_u:object_r:container_config_t:s0  40 Nov  1 11:12 ..
    root root system_u:object_r:container_config_t:s0  81 Nov  1 11:14 90-crun
    root root system_u:object_r:container_config_t:s0 148 Dec 11 13:20 91-user

crio.conf respects the TOML format and is easily managed and maintained. The help/man pages are also detailed. After you change the configuration, enable the service.


    [user@fedora ~]$ sudo systemctl daemon-reload
    [user@fedora ~]$ sudo systemctl enable crio --now

Disable swap

The latest Fedora Linux versions enable swap-on-zram by default. zram creates an emulated device that uses RAM as storage and compresses memory pages. It is faster than traditional disk partitions. You can use zramctl to inspect and configure your zram device(s). However, the devices initialization and mounting are performed by systemd on system startup as configured in the zram-generator.conf file.


    [user@fedora ~]$ lsblk
    NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
    zram0  251:0    0  3.8G  0 disk [SWAP]
    vda    252:0    0   15G  0 disk

    [user@fedora ~]$ sudo swapoff -a
    [user@fedora ~]$ sudo zramctl --reset /dev/zram0
    [user@fedora ~]$ sudo dnf -y remove zram-generator-defaults

Firewall rules

Keep the firewall enabled and open only the necessary ports in accordance with the official docs. We have a set of rules for the Control Planes nodes.


    [user@fedora ~]$ sudo firewall-cmd --set-default-zone=internal
    [user@fedora ~]$ sudo firewall-cmd --permanent \
    --add-port=6443/tcp --add-port=2379-2380/tcp \
    --add-port=10250/tcp --add-port=10259/tcp \
    --add-port=10257/tcp
    [user@fedora ~]$ sudo firewall-cmd --reload

For Worker nodes, the following configuration must be used given the default service port range.


    [user@fedora ~]$ sudo firewall-cmd --set-default-zone=internal
    [user@fedora ~]$ sudo firewall-cmd --permanent  \
    --add-port=10250/tcp --add-port=30000-32767/tcp
    [user@fedora ~]$ sudo firewall-cmd --reload

Please note we did not discuss network topology. In such discussions, control plane nodes and worker nodes are on different subnets. Each subnet has a interface that connects all hosts. VMs could have multiple interfaces and/or the administrator might want to associate a specific interface with a specific zone and open ports on that interface. In such cases you will explicitly provide the zone argument to the above commands.

The DNS service

Fedora Linux 39 comes with systemd-resolved configured as its DNS resolver. In this configuration the user has access to a local stub file that contains a 127.0.0.53 entry that directs local DNS clients to systemd-resolved.


    lrwxrwxrwx. 1 root root 39 Sep 11  2022 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf

The reference to 127.0.0.53 triggers a coredns loop plugin error in Kubernetes. A list of next-hop DNS servers is maintained by systemd in /run/systemd/resolve/resolv.conf. According to the systemd-resolved man page, the /etc/resolv.conf file can be symlinked to /run/systemd/resolve/resolv.conf so that local DNS clients will bypass systemd-resolved and talk directly to the DNS servers. For some DNS clients, however, bypassing systemd-resolved might not be desirable.

A better approach is to configure kubelet to use the resolv.conf file. Configuring kubelet to reference the alternate resolv.conf will be demonstrated in the following sections.

Kubernetes packages

We will use kubeadm that is a mature package to easily and quickly install production-grade Kubernetes.


    [user@fedora ~]$ sudo dnf install -y kubernetes-kubeadm kubernetes-client

kubernetes-kubeadm generates a kubelet drop-in file at / etc/systemd/system/kubelet.service.d/kubeadm.conf. This file can be used to configure instance-specific kubelet configurations. However, the recommended approach is to use kubeadm configuration files. For example, kubeadm creates /var/lib/kubelet/kubeadm-flags.env that is referenced by the above mentioned kubelet drop-in file.

The kubelet will be started automatically by kubeadm. For now we will enable it so it persists across restarts.


    [user@fedora ~]$ sudo systemctl enable kubelet

2. Initialize the Control Plane

For the installation, we pass some cluster wide configuration to kubeadm like pod and service CIDRs. For more details refer to kubeadm configuration docs and kubelet config docs.


    [user@fedora ~]$ cat <<CONFIG > kubeadmin-config.yaml
    ---
    apiVersion: kubeadm.k8s.io/v1beta3
    kind: InitConfiguration
    nodeRegistration:
      name: master1
      criSocket: "unix:///var/run/crio/crio.sock"
      imagePullPolicy: "IfNotPresent"
      kubeletExtraArgs:
        cgroup-driver: "systemd"
        resolv-conf: "/run/systemd/resolve/resolv.conf"
        max-pods: "4096"
        max-open-files: "20000000"
    ---
    apiVersion: kubeadm.k8s.io/v1beta3
    kind: ClusterConfiguration
    kubernetesVersion: "1.27.0"
    networking:
      podSubnet: "10.32.0.0/16"
      serviceSubnet: "172.16.16.0/22"
    controllerManager:
      extraArgs:
        node-cidr-mask-size: "20"
        allocate-node-cidrs: "true"
    ---
    CONFIG

In the above configuration, we have chosen different IP classes for pods and services. This is useful when debugging. Make sure they do not overlap with your nodes CIDR. To summarize the IP ranges:

  • services “172.16.16.0/22” 1024 services cluster wide
  • pods “10.32.0.0/16” 65536 pods cluster wide, max 4096 pods per kubelet and 20 mln open files per kubelet. For other important kubelet parameters refer to kubelet config docs. Kubelet is an important component running on the worker nodes so make sure you read the config docs carefully.

kube-controller-manager has a component called nodeipam that splits the podcidr into smaller ranges and allocates these ranges to each node via the ( node.spec.podCIDR / node.spec.podCIDRs) properties. Controller Manager property node-cidr-mask-size defines the size of this range. By default it is /24, but if you have enough resources you can make it larger; in our case /20. This will result in 4096 pods per node with a maximum of 65536/4096=16 nodes. Adjust these properties to fit the capacity of your bare-metal server.


    [user@fedora ~]$ hostnamectl set-hostname master1
    [user@master1 ~]$ sudo kubeadm init --skip-token-print=true --config=kubeadmin-config.yaml

    [user@master1 ~]$ mkdir -p $HOME/.kube
    [user@master1 ~]$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    [user@master1 ~]$ sudo chown $(id -u):$(id -g) $HOME/.kube/config

There are newer networking plugins that leverage ebpf kernel capabilities or ovn. However, installing such plugins requires uninstalling kube-proxy and we want to maintain the deployment as standard as possible. Some of the networking plugins read the kubeadm-config configmap and set up the corect CIDR values without the need to read a lot of documentation.


    [user@master1 ~]$ kubectl create -f https://github.com/antrea-io/antrea/releases/download/v1.14.0/antrea.yml

Antrea, OVN-Kubernetes are interesting CNCF projects; especially for bare-metal clusters where network speed becomes a bottleneck. It also has support for some high-speed Mellanox network cards. Check pods and svc health and whether a correct IP address was assigned.


    [user@master1 ~]$ kubectl get pods -A -o wide
    NAME                                 READY  IP              NODE
    antrea-agent-x2j7r                   2/2    192.168.122.3   master1
    antrea-controller-5f7764f86f-8xgkc   1/1    192.168.122.3   master1
    coredns-787d4945fb-55pdq             1/1    10.32.0.2       master1
    coredns-787d4945fb-ndn78             1/1    10.32.0.3       master1
    etcd-master1                         1/1    192.168.122.3   master1
    kube-apiserver-master1               1/1    192.168.122.3   master1
    kube-controller-manager-master1      1/1    192.168.122.3   master1
    kube-proxy-mx7ns                     1/1    192.168.122.3   master1
    kube-scheduler-master1               1/1    192.168.122.3   master1

    [user@master1 ~]$ kubectl get svc -A
    NAMESPACE     NAME         TYPE        CLUSTER-IP
    default       kubernetes   ClusterIP   172.16.16.1
    kube-system   antrea       ClusterIP   172.16.18.214
    kube-system   kube-dns     ClusterIP   172.16.16.10

    [user@master1 ~]$ kubectl describe node master1 | grep PodCIDR
    PodCIDR:                      10.32.0.0/20
    PodCIDRs:                     10.32.0.0/20

All pods should be running and healthy. Notice how the static pods and the daemonsets have the same IP address as the node. CoreDNS is also reading directly from the /run/systemd/resolve/resolv.conf file and not crashing.

Generate a token for joining the worker node.


    [user@master1 ~]$ kubeadm token create --ttl=30m --print-join-command

The output of this command contains details for joining the worker node.

3. Join a Worker Node

We need to set the hostname and kubeadm join. Kubelet on this node also requires configuration. Do this at the systemd level or by using a kubeadm config file with placeholders. Replace the placeholders with the values from the previous command. The kubelet args respect the same convention as kubelet params, but without leading dashes.


    [user@fedora ~]$ hostnamectl set-hostname worker1

    [user@worker1 ~]$ cat <<CONFIG > join-config.yaml
    ---
    apiVersion: kubeadm.k8s.io/v1beta3
    kind: JoinConfiguration
    discovery:
      bootstrapToken:
        token: <TOKEN>
        apiServerEndpoint: <MASTER-IP:PORT>
        caCertHashes: ["<HASH>"]
        timeout: 5m
    nodeRegistration:
      name: worker1
      criSocket: "unix:///var/run/crio/crio.sock"
      imagePullPolicy: "IfNotPresent"
      kubeletExtraArgs:
        cgroup-driver: "systemd"
        resolv-conf: "/run/systemd/resolve/resolv.conf"
        max-pods: "4096"
        max-open-files: "20000000"
    ---
    CONFIG

    [user@worker1 ~]$ sudo kubeadm join --config=join-config.yaml

From master node check the range allocated by nodeipam to both nodes:


    [user@master1 ~]$ kubectl describe node worker1 | grep PodCIDR
    PodCIDR:                      10.32.16.0/20
    PodCIDRs:                     10.32.16.0/20

Notice the cluster-wide pod CIDR — 10.32.0.0/16 — was split by Controller Manager into 10.32.0.0/20 for the first node and 10.32.16.0/20 for the second node with non-overlapping segments of 4096 IP addresses each.

4. Security considerations

Run three sample pods to test the setup.


    [user@master1 ~]$ kubectl apply -f - <<EOF
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: test-pod
    spec:
      containers:
      - name: fedora
        image: fedora/fedora:latest
        args: ["sleep", "infinity"]
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: test-pod-userns-1
      annotations:
        io.kubernetes.cri-o.userns-mode: "auto:size=256"
    spec:
      containers:
      - name: fedora
        image: fedora/fedora:latest
        args: ["sleep", "infinity"]
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: test-pod-userns-2
      annotations:
        io.kubernetes.cri-o.userns-mode: "auto:size=256"
    spec:
      containers:
      - name: fedora
        image: fedora/fedora:latest
        args: ["sleep", "infinity"]
    ---
    EOF

4.1 Discretionary Access Control

By default, Linuxs security model is based on Discretionary Access Control (DAC). This security model is based on user identity and the filesystem ownership and permissions associated with that user.

Since containers are Linux processes, you can watch them by running the ps command on your host server. Start a container process and check it using ps. The kubelet is the worker node main process and, by default, it runs as root (uid=0). There is a feature gate — KubeletInUserNamespace — but it is currently in an alpha stage of development. All the other containers will run as user id 0 as well. To properly function, all the containers must mount the /proc and /sys pseudofilesystems and have access to some processes on the host. Under these circumstances, a rogue container process running as root could assume elevated privileges on the host. This should explain the need for isolating processes by running them as underprivileged users.

This “soft” isolation can be done via kubernetes spec.securityContext.(RunAsUser|RunAsGroup|fsGroup), but this method requires additional administrative work like creating and maintaining users and groups etc. This can be automated via Admission Controllers, but we discuss below a different approach using user namespaces.

User namespaces are a Linux feature that is part of the same basic DAC security model. They are enabled by default in the latest Linux versions and you might have encountered them while working with Podman or Singularity.

CRI-O schedules userns workloads via the io.kubernetes.cri-o.userns-mode: “auto:size=n” annotation. This annotation can be added manually to YAML files as demonstrated in the above example or automatically via an admission controller. The annotation based behavior might change. You will need to follow the version updates for Kubernetes and CRI-O.


    user@worker1:~$ cat /etc/subuid
    user:524288:65536
    containers:1000000:1048576

    user@worker1:~$ ps -eo pid,uid,gid,args,label | grep -E 'kubelet|sleep'
      2980      0 0 kubelet system_u:system_r:kubelet_t:s0-s0:c0.c1023
      13067     0 0 sleep   system_u:system_r:container_t:s0:c483,c911
      13078 1000000 1000000 sleep system_u:system_r:container_t:s0:c508,c675
      13105 1000256 1000256 sleep system_u:system_r:container_t:s0:c300,c755

Notice kubelet and the test-pod are running as root on the host while both test-pod-userns are running as temporary dynamic user from the range “containers” defined in /etc/subuid . CRI-O uses the containers/storage plugin and therefore looks for default user containers to map subuid and subgids. According to current /etc/subuid file, the dynamic users will begin at UID 1000000 with a maximum of 1048576 users. The annotation assigns a range of 256 UIDs to each container. To change the defaults and mappings refer to containers-storage.conf man page.

4.2 Mandatory Access Control

SELinux is enabled on Fedora Linux in enforcing mode by default and it implements the Mandatory Access Control (MAC) security model. This model requires explicit rules that allow a labeled source context (process) to access a labeled target context (files|ports).

The labels have the following format as shown in the above examples:


    user:role:type:sensitivity-level:category-levels

CRI-O requires the containers-selinux package. We installed Kubernetes while keeping SELinux in enforcing mode, but there are a few general scenarios that might require additional SELinux configuration:

  • Binding ports
  • Mounting storage
  • Sharing storage
Binding ports

Create a sample pod binding to a privileged host port. This is useful, for example, when creating ingress controllers. You will notice the rootless container was able to bind to the privileged port.


    [user@master1 -A ~]$ kubectl apply -f - <<EOF
    apiVersion: v1
    kind: Pod
    metadata:
      name: test-hostport
      annotations:
        io.kubernetes.cri-o.userns-mode: "auto:size=256"
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
          hostPort: 80
    EOF

    user@worker1:~$ sudo semanage port -l
    http_port_t tcp 80, 443 ...

    user@worker1:~$ sudo ss -plntZ
    Address:Port Process
    0.0.0.0:80 "crio",proc_ctx=system_u:system_r:container_runtime_t:s0

Port 80 (target context) is labeled http_port_t and the process trying to access it (source context) is labeled container_runtime_t. To check specific rules that allow this and to debug potential issues, use sesearch. Although, in this specific example, container_t process was allowed to assume container_runtime_t domain and to bind eventually to port 80, this might not always be desirable.


    user@worker1:~$ sesearch -A -s container_runtime_t -t http_port_t -c tcp_socket

Mounting storage

The process container_t is MCS constrained which means every new container will receive two new random categories. At the moment, when mounting a volume, Kubernetes is not automatically re-labelling files with these two categories. There is a community effort via features like SELinuxMountReadWriteOncePod, but you will have to follow the progress in the future versions. For this demo, we will label the files manually.

The categories cannot have any value. They are defined in the setrans.conf file as shown below. Refer to the SELinux documentation for details about modifying the sizes of the MCS ranges.


    user@worker1:~$ cat /etc/selinux/targeted/setrans.conf
    s0=SystemLow
    s0-s0:c0.c1023=SystemLow-SystemHigh
    s0:c0.c1023=SystemHigh

The DAC permissions are enforced in parallel with the MAC permissions, so the Linux mode bits must be set to grant sufficient access in addition to the SELinux labels. We also need to set the proper label container_file_t and the category level as well. With s0 level all the containers will be able to write to the volume. To restrict access to them, we need to label them with process categories.


    user@worker1:~$ sudo mkdir -m=777 /data
    user@worker1:~$ sudo semanage fcontext -a -t container_file_t /data
    user@worker1:~$ sudo restorecon -R -v /data
    user@worker1:~$ mkdir -m=777 /data/folder{1..2}
    user@worker1:~$ ls -laZ /data
    drwxrwxrwx. root root unconfined_u:object_r:container_file_t:s0 .
    drwxrwxrwx. user user unconfined_u:object_r:container_file_t:s0 folder1
    drwxrwxrwx. user user unconfined_u:object_r:container_file_t:s0 folder2

The semanage fcontext command cannot assign category labels so we will have to use chcat :


    user@worker1:~$ chcat -- +c800 /data/folder1
    user@worker1:~$ chcat -- +c801 /data/folder2
    user@worker1:~$ ls -laZ /data
    drwxrwxrwx. unconfined_u:object_r:container_file_t:s0      .
    drwxrwxrwx. unconfined_u:object_r:container_file_t:s0:c800 folder1
    drwxrwxrwx. unconfined_u:object_r:container_file_t:s0:c801 folder2

With the configuration shown above, the container process must have category c800 to access folder1, and c801 is required to access folder2. To avoid random labeling, pass the spec.securityContext.seLinuxOptions object.


    [user@master1 ~]$ kubectl apply -f - <<EOF
    apiVersion: v1
    kind: Pod
    metadata:
      name: test-hostpath1
      annotations:
        io.kubernetes.cri-o.userns-mode: "auto:size=256"
    spec:
      securityContext:
        seLinuxOptions:
          level: "s0:c800"
      containers:
      - name: test
        image: fedora/fedora:latest
        args: ["sleep", "infinity"]
        volumeMounts:
        - mountPath: /test
          name: test-volume
      volumes:
      - name: test-volume
        hostPath:
          path: /data
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: test-hostpath2
      annotations:
        io.kubernetes.cri-o.userns-mode: "auto:size=256"
    spec:
      securityContext:
        seLinuxOptions:
          level: "s0:c801"
      containers:
      - name: test
        image: fedora/fedora:latest
        args: ["sleep", "infinity"]
        volumeMounts:
        - mountPath: /test
          name: test-volume
      volumes:
      - name: test-volume
        hostPath:
          path: /data
    EOF

Next, try to write to these folders. Notice the process labels, file labels, and the file ownership.


    user@master1:~$ kubectl exec test-hostpath1 -- touch /test/folder1/testfile
    user@master1:~$ kubectl exec test-hostpath1 -- touch /test/folder2/testfile
    touch: cannot touch '/test/folder2/testfile': Permission denied

    user@master1:~$ kubectl exec test-hostpath2 -- touch /test/folder2/testfile
    user@master1:~$ kubectl exec test-hostpath2 -- touch /test/folder1/testfile
    touch: cannot touch '/test/folder1/testfile': Permission denied

    user@worker1:~$ ps -eo pid,uid,gid,args,label | grep -E 'sleep'
    40475 1000512 1000512 sleep system_u:system_r:container_t:s0:c801
    40500 1000256 1000256 sleep system_u:system_r:container_t:s0:c800

    user@worker1:~$ ls -laZ /data/folder1
    drwxrwxrwx. user   unconfined_u:object_r:container_file_t:s0:c800 .
    -rw-r--r--. 1000256 system_u:object_r:container_file_t:s0:c800     testfile

    user@worker1:~$ ls -laZ /data/folder2
    drwxrwxrwx. user   unconfined_u:object_r:container_file_t:s0:c801 .
    -rw-r--r--. 1000512 system_u:object_r:container_file_t:s0:c801     testfile

Sharing storage

In the above examples, the containers share storage that has the same group and categories or storage with the most permissive s0 level. In production environments you will most likely deal with dynamic storage provisioners that will have to automatically relabel directories and files with whatever random category labels were assigned by Kubernetes. This means the storage provisioner must be SELinux aware and you need to read the configuration settings carefully for anything SELinux-specific.

Proper file permissions achieve a lot of security. SELinux simply adds a layer of security on top of the base file permissions.

More Security

We have touched on the basics of Fedora Linuxs security models. Securing Kubernetes is a broad field of study and it requires significant effort to come to a full understanding of how it all works. To review the best practices and tools beyond what this article has covered, refer to the SELinux docs and the Linux Foundation CKS learning track.

Conclusion

In this article, we have achieved a small, bare-metal Kubernetes setup running on Fedora Linux. CRI-O is a versatile CNCF graduated project that supports user namespaces and any OCI-compliant runtime. Just like Fedora, Kubernetes is continuously improving and can only benefit from Fedora Linuxs advanced security model and features. Follow the Kubernetes QuickDocs to stay apprised of the latest changes. Thanks to all the hard working people maintaining the above mentioned packages.


via: https://fedoramagazine.org/kubernetes-with-cri-o-on-fedora-linux-39/

作者:Roman Gherta 选题:lujun9972 译者:译者ID 校对:校对者ID

本文由 LCTT 原创编译,Linux中国 荣誉推出