选题[tech]: 20210602 Test Kubernetes cluster failures and experiments in your terminal

sources/tech/20210602 Test Kubernetes cluster failures and experiments in your terminal.md
2025-02-25 00:50:15 +08:00 · 2021-06-03 05:03:31 +08:00 · 2021-06-03 05:03:31 +08:00 · 37282ac330
commit 37282ac330
parent 62ce58c3e2
1 changed files with 486 additions and 0 deletions
--- a/sources/tech/20210602
+++ b/sources/tech/20210602
@ -0,0 +1,486 @@
+[#]: subject: (Test Kubernetes cluster failures and experiments in your terminal)
+[#]: via: (https://opensource.com/article/21/6/kubernetes-litmus-chaos)
+[#]: author: (Jessica Cherry https://opensource.com/users/cherrybomb)
+[#]: collector: (lujun9972)
+[#]: translator: ( )
+[#]: reviewer: ( )
+[#]: publisher: ( )
+[#]: url: ( )
+
+Test Kubernetes cluster failures and experiments in your terminal
+======
+Litmus is an effective tool to cause chaos to test how your system will
+respond to failure.
+![Science lab with beakers][1]
+
+Do you know how your system will respond to an arbitrary failure? Will your application fail? Will anything survive after a loss? If you're not sure, it's time to see if your system passes the [Litmus][2] test, a detailed way to cause chaos at random with many experiments.
+
+In the first article in this series, I explained [what chaos engineering is][3], and in the second article, I demonstrated how to get your [system's steady state][4] so that you can compare it against a chaos state. This third article will show you how to install and use Litmus to test arbitrary failures and experiments in your Kubernetes cluster. In this walkthrough, I'll use Pop!_OS 20.04, Helm 3, Minikube 1.14.2, and Kubernetes 1.19.
+
+### Configure Minikube
+
+If you haven't already, [install Minikube][5] in whatever way makes sense for your environment. If you have enough resources, I recommend giving your virtual machine a bit more than the default memory and CPU power:
+
+
+```
+$ minikube config set memory 8192
+❗  These changes will take effect upon a minikube delete and then a minikube start
+$ minikube config set cpus 6
+❗  These changes will take effect upon a minikube delete and then a minikube start
+```
+
+Then start and check your system's status:
+
+
+```
+$ minikube start
+😄  minikube v1.14.2 on Debian bullseye/sid
+🎉  minikube 1.19.0 is available! Download it: <https://github.com/kubernetes/minikube/releases/tag/v1.19.0>
+💡  To disable this notice, run: 'minikube config set WantUpdateNotification false'
+
+✨  Using the docker driver based on user configuration
+👍  Starting control plane node minikube in cluster minikube
+🔥  Creating docker container (CPUs=6, Memory=8192MB) ...
+🐳  Preparing Kubernetes v1.19.0 on Docker 19.03.8 ...
+🔎  Verifying Kubernetes components...
+🌟  Enabled addons: storage-provisioner, default-storageclass
+🏄  Done! kubectl is now configured to use "minikube" by default
+jess@Athena:~$ minikube status
+minikube
+type: Control Plane
+host: Running
+kubelet: Running
+apiserver: Running
+kubeconfig: Configured
+```
+
+### Install Litmus
+
+As outlined on [Litmus' homepage][6], the steps to install Litmus are: add your repo to Helm, create your Litmus namespace, then install your chart:
+
+
+```
+$ helm repo add litmuschaos <https://litmuschaos.github.io/litmus-helm/>
+"litmuschaos" has been added to your repositories
+
+$ kubectl create ns litmus
+namespace/litmus created
+
+$ helm install chaos litmuschaos/litmus --namespace=litmus
+NAME: chaos
+LAST DEPLOYED: Sun May  9 17:05:36 2021
+NAMESPACE: litmus
+STATUS: deployed
+REVISION: 1
+TEST SUITE: None
+NOTES:
+```
+
+### Verify the installation
+
+You can run the following commands if you want to verify all the desired components are installed correctly.
+
+Check if **api-resources** for chaos are available: 
+
+
+```
+root@demo:~# kubectl api-resources | grep litmus
+chaosengines                                   litmuschaos.io                 true         ChaosEngine
+chaosexperiments                               litmuschaos.io                 true         ChaosExperiment
+chaosresults                                   litmuschaos.io                 true         ChaosResult
+```
+
+Check if the Litmus chaos operator deployment is running successfully:
+
+
+```
+root@demo:~# kubectl get pods -n litmus
+NAME                      READY   STATUS    RESTARTS   AGE
+litmus-7d998b6568-nnlcd   1/1     Running   0          106s
+```
+
+### Start running chaos experiments 
+
+With this out of the way, you are good to go! Refer to Litmus' [chaos experiment documentation][7] to start executing your first experiment.
+
+To confirm your installation is working, check that the pod is up and running correctly:
+
+
+```
+jess@Athena:~$ kubectl get pods -n litmus
+NAME                      READY   STATUS    RESTARTS   AGE
+litmus-7d6f994d88-2g7wn   1/1     Running   0          115s
+```
+
+Confirm the Custom Resource Definitions (CRDs) are also installed correctly:
+
+
+```
+jess@Athena:~$ kubectl get crds | grep chaos
+chaosengines.litmuschaos.io       2021-05-09T21:05:33Z
+chaosexperiments.litmuschaos.io   2021-05-09T21:05:33Z
+chaosresults.litmuschaos.io       2021-05-09T21:05:33Z
+```
+
+Finally, confirm your API resources are also installed:
+
+
+```
+jess@Athena:~$ kubectl api-resources | grep chaos
+chaosengines                                   litmuschaos.io                 true         ChaosEngine
+chaosexperiments                               litmuschaos.io                 true         ChaosExperiment
+chaosresults                                   litmuschaos.io                 true         ChaosResult
+```
+
+That's what I call easy installation and confirmation. The next step is setting up deployments for chaos.
+
+### Prep for destruction
+
+To test for chaos, you need something to test against. Add a new namespace:
+
+
+```
+$ kubectl create namespace more-apps
+namespace/more-apps created
+```
+
+Then add a deployment to the new namespace:
+
+
+```
+$ kubectl create deployment ghost --namespace more-apps --image=ghost:3.11.0-alpine
+deployment.apps/ghost created
+```
+
+Finally, scale your deployment up so that you have more than one pod in your deployment to test against:
+
+
+```
+$ kubectl scale deployment/ghost --namespace more-apps --replicas=4
+deployment.apps/ghost scaled
+```
+
+For Litmus to cause chaos, you need to add an [annotation][8] to your deployment to mark it ready for chaos. Currently, annotations are available for deployments, StatefulSets, and DaemonSets. Add the annotation `chaos=true` to your deployment:
+
+
+```
+$ kubectl annotate deploy/ghost litmuschaos.io/chaos="true" -n more-apps
+deployment.apps/ghost annotated
+```
+
+Make sure the experiments you will install have the correct permissions to work in the "more-apps" namespace.
+
+Make a new **rbac.yaml** file for the prepper bindings and permissions:
+
+
+```
+`$ touch rbac.yaml`
+```
+
+Then add permissions for the generic testing by copying and pasting the code below into your **rbac.yaml** file. These are just basic, minimal permissions to kill pods in your namespace and give Litmus permissions to delete a pod for a namespace you provide:
+
+
+```
+\---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: pod-delete-sa
+  namespace: more-apps
+  labels:
+    name: pod-delete-sa
+\---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  name: pod-delete-sa
+  namespace: more-apps
+  labels:
+    name: pod-delete-sa
+rules:
+\- apiGroups: [""]
+  resources: ["pods","events"]
+  verbs: ["create","list","get","patch","update","delete","deletecollection"]
+\- apiGroups: [""]
+  resources: ["pods/exec","pods/log","replicationcontrollers"]
+  verbs: ["create","list","get"]
+\- apiGroups: ["batch"]
+  resources: ["jobs"]
+  verbs: ["create","list","get","delete","deletecollection"]
+\- apiGroups: ["apps"]
+  resources: ["deployments","statefulsets","daemonsets","replicasets"]
+  verbs: ["list","get"]
+\- apiGroups: ["apps.openshift.io"]
+  resources: ["deploymentconfigs"]
+  verbs: ["list","get"]
+\- apiGroups: ["argoproj.io"]
+  resources: ["rollouts"]
+  verbs: ["list","get"]
+\- apiGroups: ["litmuschaos.io"]
+  resources: ["chaosengines","chaosexperiments","chaosresults"]
+  verbs: ["create","list","get","patch","update"]
+\---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: RoleBinding
+metadata:
+  name: pod-delete-sa
+  namespace: more-apps
+  labels:
+    name: pod-delete-sa
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: Role
+  name: pod-delete-sa
+subjects:
+\- kind: ServiceAccount
+  name: pod-delete-sa
+  namespace: more-apps
+```
+
+Apply the **rbac.yaml** file:
+
+
+```
+$ kubectl apply -f rbac.yaml
+serviceaccount/pod-delete-sa created
+role.rbac.authorization.k8s.io/pod-delete-sa created
+rolebinding.rbac.authorization.k8s.io/pod-delete-sa created
+```
+
+The next step is to prepare your chaos engine to delete pods. The chaos engine will connect the experiment you need to your application instance by creating a **chaosengine.yaml** file and copying the information below into the .yaml file. This will connect your experiment to your namespace and the service account with the role bindings you created above.
+
+This chaos engine file only specifies the pod to delete during chaos testing:
+
+
+```
+apiVersion: litmuschaos.io/v1alpha1
+kind: ChaosEngine
+metadata:
+  name: moreapps-chaos
+  namespace: more-apps
+spec:
+  appinfo:
+    appns: 'more-apps'
+    applabel: 'app=ghost'
+    appkind: 'deployment'
+  # It can be true/false
+  annotationCheck: 'true'
+  # It can be active/stop
+  engineState: 'active'
+  #ex. values: ns1:name=percona,ns2:run=more-apps
+  auxiliaryAppInfo: ''
+  chaosServiceAccount: pod-delete-sa
+  # It can be delete/retain
+  jobCleanUpPolicy: 'delete'
+  experiments:
+    - name: pod-delete
+      spec:
+        components:
+          env:
+           # set chaos duration (in sec) as desired
+            - name: TOTAL_CHAOS_DURATION
+              value: '30'
+
+            # set chaos interval (in sec) as desired
+            - name: CHAOS_INTERVAL
+              value: '10'
+
+            # pod failures without '--force' &amp; default terminationGracePeriodSeconds
+            - name: FORCE
+              value: 'false'
+```
+
+Don't apply this file until you install the experiments in the next section.
+
+### Add new experiments for causing chaos
+
+Now that you have an entirely new environment with deployments, roles, and the chaos engine to test against, you need some experiments to run. Since Litmus has a large community, you can find some great experiments in the [Chaos Hub][9].
+
+In this walkthrough, I'll use the generic experiment of [killing a pod][10].
+
+Run a kubectl command to install the generic experiments into your cluster. Install this in your `more-apps` namespace; you will see the tests created when you run it:
+
+
+```
+$ kubectl apply -f <https://hub.litmuschaos.io/api/chaos/1.13.3?file=charts/generic/experiments.yaml> -n more-apps
+chaosexperiment.litmuschaos.io/pod-network-duplication created
+chaosexperiment.litmuschaos.io/node-cpu-hog created
+chaosexperiment.litmuschaos.io/node-drain created
+chaosexperiment.litmuschaos.io/docker-service-kill created
+chaosexperiment.litmuschaos.io/node-taint created
+chaosexperiment.litmuschaos.io/pod-autoscaler created
+chaosexperiment.litmuschaos.io/pod-network-loss created
+chaosexperiment.litmuschaos.io/node-memory-hog created
+chaosexperiment.litmuschaos.io/disk-loss created
+chaosexperiment.litmuschaos.io/pod-io-stress created
+chaosexperiment.litmuschaos.io/pod-network-corruption created
+chaosexperiment.litmuschaos.io/container-kill created
+chaosexperiment.litmuschaos.io/node-restart created
+chaosexperiment.litmuschaos.io/node-io-stress created
+chaosexperiment.litmuschaos.io/disk-fill created
+chaosexperiment.litmuschaos.io/pod-cpu-hog created
+chaosexperiment.litmuschaos.io/pod-network-latency created
+chaosexperiment.litmuschaos.io/kubelet-service-kill created
+chaosexperiment.litmuschaos.io/k8-pod-delete created
+chaosexperiment.litmuschaos.io/pod-delete created
+chaosexperiment.litmuschaos.io/node-poweroff created
+chaosexperiment.litmuschaos.io/k8-service-kill created
+chaosexperiment.litmuschaos.io/pod-memory-hog created
+```
+
+Verify the experiments installed correctly:
+
+
+```
+$ kubectl get chaosexperiments -n more-apps
+NAME                      AGE
+container-kill            72s
+disk-fill                 72s
+disk-loss                 72s
+docker-service-kill       72s
+k8-pod-delete             72s
+k8-service-kill           72s
+kubelet-service-kill      72s
+node-cpu-hog              72s
+node-drain                72s
+node-io-stress            72s
+node-memory-hog           72s
+node-poweroff             72s
+node-restart              72s
+node-taint                72s
+pod-autoscaler            72s
+pod-cpu-hog               72s
+pod-delete                72s
+pod-io-stress             72s
+pod-memory-hog            72s
+pod-network-corruption    72s
+pod-network-duplication   72s
+pod-network-latency       72s
+pod-network-loss          72s
+```
+
+### Run the experiments
+
+Now that everything is installed and configured, use your **chaosengine.yaml** file to run the pod-deletion experiment you defined. Apply your chaos engine file:
+
+
+```
+$ kubectl apply -f chaosengine.yaml
+chaosengine.litmuschaos.io/more-apps-chaos created
+```
+
+Confirm the engine started by getting all the pods in your namespace; you should see `pod-delete` being created:
+
+
+```
+$ kubectl get pods -n more-apps
+NAME                      READY   STATUS              RESTARTS   AGE
+ghost-5bdd4cdcc4-blmtl    1/1     Running             0          53m
+ghost-5bdd4cdcc4-z2lnt    1/1     Running             0          53m
+ghost-5bdd4cdcc4-zlcc9    1/1     Running             0          53m
+ghost-5bdd4cdcc4-zrs8f    1/1     Running             0          53m
+moreapps-chaos-runner     1/1     Running             0          17s
+pod-delete-e443qx-lxzfx   0/1     ContainerCreating   0          7s
+```
+
+Next, you need to be able to observe your experiments using Litmus. The following command uses the ChaosResult CRD and provides a large amount of output:
+
+
+```
+$ kubectl describe chaosresult moreapps-chaos-pod-delete -n more-apps
+Name:         moreapps-chaos-pod-delete
+Namespace:    more-apps
+Labels:       app.kubernetes.io/component=experiment-job
+              app.kubernetes.io/part-of=litmus
+              app.kubernetes.io/version=1.13.3
+              chaosUID=a6c9ab7e-ff07-4703-abe4-43e03b77bd72
+              controller-uid=601b7330-c6f3-4d9b-90cb-2c761ac0567a
+              job-name=pod-delete-e443qx
+              name=moreapps-chaos-pod-delete
+Annotations:  &lt;none&gt;
+API Version:  litmuschaos.io/v1alpha1
+Kind:         ChaosResult
+Metadata:
+  Creation Timestamp:  2021-05-09T22:06:19Z
+  Generation:          2
+  Managed Fields:
+    API Version:  litmuschaos.io/v1alpha1
+    Fields Type:  FieldsV1
+    fieldsV1:
+      f:metadata:
+        f:labels:
+          .:
+          f:app.kubernetes.io/component:
+          f:app.kubernetes.io/part-of:
+          f:app.kubernetes.io/version:
+          f:chaosUID:
+          f:controller-uid:
+          f:job-name:
+          f:name:
+      f:spec:
+        .:
+        f:engine:
+        f:experiment:
+      f:status:
+        .:
+        f:experimentStatus:
+        f:history:
+    Manager:         experiments
+    Operation:       Update
+    Time:            2021-05-09T22:06:53Z
+  Resource Version:  8406
+  Self Link:         /apis/litmuschaos.io/v1alpha1/namespaces/more-apps/chaosresults/moreapps-chaos-pod-delete
+  UID:               08b7e3da-d603-49c7-bac4-3b54eb30aff8
+Spec:
+  Engine:      moreapps-chaos
+  Experiment:  pod-delete
+Status:
+  Experiment Status:
+    Fail Step:                 N/A
+    Phase:                     Completed
+    Probe Success Percentage:  100
+    Verdict:                   Pass
+  History:
+    Failed Runs:   0
+    Passed Runs:   1
+    Stopped Runs:  0
+Events:
+  Type    Reason   Age    From                     Message
+  ----    ------   ----   ----                     -------
+  Normal  Pass     104s   pod-delete-e443qx-lxzfx  experiment: pod-delete, Result: Pass
+```
+
+You can see the pass or fail output from your testing as you run the chaos engine definitions.
+
+Congratulations on your first (and hopefully not last) chaos engineering test! Now you have a powerful tool to use and help your environment grow.
+
+### Final thoughts
+
+You might be thinking, "I can't run this manually every time I want to run chaos. How far can I take this, and how can I set it up for the long term?"
+
+Litmus' best part (aside from the Chaos Hub) is its [scheduler][11] function. You can use it to define times and dates, repetitions or sporadic, to run experiments. This is a great tool for detailed admins who have been working with Kubernetes for a while and are ready to create some chaos. I suggest staying up to date on Litmus and how to use this tool for regular chaos engineering. Happy pod hunting!
+
+--------------------------------------------------------------------------------
+
+via: https://opensource.com/article/21/6/kubernetes-litmus-chaos
+
+作者：[Jessica Cherry][a]
+选题：[lujun9972][b]
+译者：[译者ID](https://github.com/译者ID)
+校对：[校对者ID](https://github.com/校对者ID)
+
+本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译，[Linux中国](https://linux.cn/) 荣誉推出
+
+[a]: https://opensource.com/users/cherrybomb
+[b]: https://github.com/lujun9972
+[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/science_experiment_beaker_lab.png?itok=plKWRhlU (Science lab with beakers)
+[2]: https://github.com/litmuschaos/litmus
+[3]: https://opensource.com/article/21/5/11-years-kubernetes-and-chaos
+[4]: https://opensource.com/article/21/5/get-your-steady-state-chaos-grafana-and-prometheus
+[5]: https://minikube.sigs.k8s.io/docs/start/
+[6]: https://litmuschaos.io/
+[7]: https://docs.litmuschaos.io
+[8]: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
+[9]: https://hub.litmuschaos.io/
+[10]: https://docs.litmuschaos.io/docs/pod-delete/
+[11]: https://docs.litmuschaos.io/docs/scheduling/