From 37282ac33085281d26170e3caad1bc4926cc4c70 Mon Sep 17 00:00:00 2001 From: DarkSun Date: Thu, 3 Jun 2021 05:03:31 +0800 Subject: [PATCH] =?UTF-8?q?=E9=80=89=E9=A2=98[tech]:=2020210602=20Test=20K?= =?UTF-8?q?ubernetes=20cluster=20failures=20and=20experiments=20in=20your?= =?UTF-8?q?=20terminal?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit sources/tech/20210602 Test Kubernetes cluster failures and experiments in your terminal.md --- ...ilures and experiments in your terminal.md | 486 ++++++++++++++++++ 1 file changed, 486 insertions(+) create mode 100644 sources/tech/20210602 Test Kubernetes cluster failures and experiments in your terminal.md diff --git a/sources/tech/20210602 Test Kubernetes cluster failures and experiments in your terminal.md b/sources/tech/20210602 Test Kubernetes cluster failures and experiments in your terminal.md new file mode 100644 index 0000000000..347497b121 --- /dev/null +++ b/sources/tech/20210602 Test Kubernetes cluster failures and experiments in your terminal.md @@ -0,0 +1,486 @@ +[#]: subject: (Test Kubernetes cluster failures and experiments in your terminal) +[#]: via: (https://opensource.com/article/21/6/kubernetes-litmus-chaos) +[#]: author: (Jessica Cherry https://opensource.com/users/cherrybomb) +[#]: collector: (lujun9972) +[#]: translator: ( ) +[#]: reviewer: ( ) +[#]: publisher: ( ) +[#]: url: ( ) + +Test Kubernetes cluster failures and experiments in your terminal +====== +Litmus is an effective tool to cause chaos to test how your system will +respond to failure. +![Science lab with beakers][1] + +Do you know how your system will respond to an arbitrary failure? Will your application fail? Will anything survive after a loss? If you're not sure, it's time to see if your system passes the [Litmus][2] test, a detailed way to cause chaos at random with many experiments. + +In the first article in this series, I explained [what chaos engineering is][3], and in the second article, I demonstrated how to get your [system's steady state][4] so that you can compare it against a chaos state. This third article will show you how to install and use Litmus to test arbitrary failures and experiments in your Kubernetes cluster. In this walkthrough, I'll use Pop!_OS 20.04, Helm 3, Minikube 1.14.2, and Kubernetes 1.19. + +### Configure Minikube + +If you haven't already, [install Minikube][5] in whatever way makes sense for your environment. If you have enough resources, I recommend giving your virtual machine a bit more than the default memory and CPU power: + + +``` +$ minikube config set memory 8192 +❗  These changes will take effect upon a minikube delete and then a minikube start +$ minikube config set cpus 6 +❗  These changes will take effect upon a minikube delete and then a minikube start +``` + +Then start and check your system's status: + + +``` +$ minikube start +😄  minikube v1.14.2 on Debian bullseye/sid +🎉  minikube 1.19.0 is available! Download it: +💡  To disable this notice, run: 'minikube config set WantUpdateNotification false' + +✨  Using the docker driver based on user configuration +👍  Starting control plane node minikube in cluster minikube +🔥  Creating docker container (CPUs=6, Memory=8192MB) ... +🐳  Preparing Kubernetes v1.19.0 on Docker 19.03.8 ... +🔎  Verifying Kubernetes components... +🌟  Enabled addons: storage-provisioner, default-storageclass +🏄  Done! 
kubectl is now configured to use "minikube" by default
+jess@Athena:~$ minikube status
+minikube
+type: Control Plane
+host: Running
+kubelet: Running
+apiserver: Running
+kubeconfig: Configured
+```
+
+### Install Litmus
+
+As outlined on [Litmus' homepage][6], the steps to install Litmus are: add the Litmus repo to Helm, create a Litmus namespace, then install the chart:
+
+
+```
+$ helm repo add litmuschaos https://litmuschaos.github.io/litmus-helm/
+"litmuschaos" has been added to your repositories
+
+$ kubectl create ns litmus
+namespace/litmus created
+
+$ helm install chaos litmuschaos/litmus --namespace=litmus
+NAME: chaos
+LAST DEPLOYED: Sun May  9 17:05:36 2021
+NAMESPACE: litmus
+STATUS: deployed
+REVISION: 1
+TEST SUITE: None
+NOTES:
+```
+
+### Verify the installation
+
+You can run the following commands to verify that all the desired components are installed correctly.
+
+Check if **api-resources** for chaos are available:
+
+
+```
+root@demo:~# kubectl api-resources | grep litmus
+chaosengines                                   litmuschaos.io                 true         ChaosEngine
+chaosexperiments                               litmuschaos.io                 true         ChaosExperiment
+chaosresults                                   litmuschaos.io                 true         ChaosResult
+```
+
+Check if the Litmus chaos operator deployment is running successfully:
+
+
+```
+root@demo:~# kubectl get pods -n litmus
+NAME                      READY   STATUS    RESTARTS   AGE
+litmus-7d998b6568-nnlcd   1/1     Running   0          106s
+```
+
+### Start running chaos experiments
+
+With this out of the way, you are good to go! Refer to Litmus' [chaos experiment documentation][7] to start executing your first experiment.
+
+To confirm your installation is working, check that the pod is up and running correctly:
+
+
+```
+jess@Athena:~$ kubectl get pods -n litmus
+NAME                      READY   STATUS    RESTARTS   AGE
+litmus-7d6f994d88-2g7wn   1/1     Running   0          115s
+```
+
+Confirm the Custom Resource Definitions (CRDs) are also installed correctly:
+
+
+```
+jess@Athena:~$ kubectl get crds | grep chaos
+chaosengines.litmuschaos.io       2021-05-09T21:05:33Z
+chaosexperiments.litmuschaos.io   2021-05-09T21:05:33Z
+chaosresults.litmuschaos.io       2021-05-09T21:05:33Z
+```
+
+Finally, confirm your API resources are also installed:
+
+
+```
+jess@Athena:~$ kubectl api-resources | grep chaos
+chaosengines                                   litmuschaos.io                 true         ChaosEngine
+chaosexperiments                               litmuschaos.io                 true         ChaosExperiment
+chaosresults                                   litmuschaos.io                 true         ChaosResult
+```
+
+That's what I call easy installation and confirmation. The next step is setting up deployments for chaos.
+
+### Prep for destruction
+
+To test for chaos, you need something to test against. Add a new namespace:
+
+
+```
+$ kubectl create namespace more-apps
+namespace/more-apps created
+```
+
+Then add a deployment to the new namespace:
+
+
+```
+$ kubectl create deployment ghost --namespace more-apps --image=ghost:3.11.0-alpine
+deployment.apps/ghost created
+```
+
+Finally, scale the deployment up so that you have more than one pod to test against:
+
+
+```
+$ kubectl scale deployment/ghost --namespace more-apps --replicas=4
+deployment.apps/ghost scaled
+```
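+
+Before moving on, you can double-check that all four replicas came up by listing the pods in the new namespace (the output below is illustrative; your pod name suffixes and ages will differ):
+
+
+```
+$ kubectl get pods -n more-apps
+NAME                     READY   STATUS    RESTARTS   AGE
+ghost-5bdd4cdcc4-blmtl   1/1     Running   0          40s
+ghost-5bdd4cdcc4-z2lnt   1/1     Running   0          40s
+ghost-5bdd4cdcc4-zlcc9   1/1     Running   0          40s
+ghost-5bdd4cdcc4-zrs8f   1/1     Running   0          40s
+```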
+
+For Litmus to cause chaos, you need to add an [annotation][8] to your deployment to mark it ready for chaos. Currently, annotations are available for deployments, StatefulSets, and DaemonSets. Add the annotation `litmuschaos.io/chaos="true"` to your deployment:
+
+
+```
+$ kubectl annotate deploy/ghost litmuschaos.io/chaos="true" -n more-apps
+deployment.apps/ghost annotated
+```
+
+Make sure the experiments you will install have the correct permissions to work in the "more-apps" namespace.
+
+Make a new **rbac.yaml** file for the proper bindings and permissions:
+
+
+```
+$ touch rbac.yaml
+```
+
+Then add permissions for the generic testing by copying and pasting the code below into your **rbac.yaml** file. These are basic, minimal permissions that allow Litmus to kill pods in the namespace you provide:
+
+
+```
+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: pod-delete-sa
+  namespace: more-apps
+  labels:
+    name: pod-delete-sa
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  name: pod-delete-sa
+  namespace: more-apps
+  labels:
+    name: pod-delete-sa
+rules:
+- apiGroups: [""]
+  resources: ["pods","events"]
+  verbs: ["create","list","get","patch","update","delete","deletecollection"]
+- apiGroups: [""]
+  resources: ["pods/exec","pods/log","replicationcontrollers"]
+  verbs: ["create","list","get"]
+- apiGroups: ["batch"]
+  resources: ["jobs"]
+  verbs: ["create","list","get","delete","deletecollection"]
+- apiGroups: ["apps"]
+  resources: ["deployments","statefulsets","daemonsets","replicasets"]
+  verbs: ["list","get"]
+- apiGroups: ["apps.openshift.io"]
+  resources: ["deploymentconfigs"]
+  verbs: ["list","get"]
+- apiGroups: ["argoproj.io"]
+  resources: ["rollouts"]
+  verbs: ["list","get"]
+- apiGroups: ["litmuschaos.io"]
+  resources: ["chaosengines","chaosexperiments","chaosresults"]
+  verbs: ["create","list","get","patch","update"]
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: RoleBinding
+metadata:
+  name: pod-delete-sa
+  namespace: more-apps
+  labels:
+    name: pod-delete-sa
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: Role
+  name: pod-delete-sa
+subjects:
+- kind: ServiceAccount
+  name: pod-delete-sa
+  namespace: more-apps
+```
+
+Apply the **rbac.yaml** file:
+
+
+```
+$ kubectl apply -f rbac.yaml
+serviceaccount/pod-delete-sa created
+role.rbac.authorization.k8s.io/pod-delete-sa created
+rolebinding.rbac.authorization.k8s.io/pod-delete-sa created
+```
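+
+To sanity-check the new role binding, you can ask the API server whether the service account is allowed to delete pods by using `kubectl auth can-i` with impersonation; it should answer `yes`:
+
+
+```
+$ kubectl auth can-i delete pods -n more-apps --as=system:serviceaccount:more-apps:pod-delete-sa
+yes
+```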
+
+The next step is to prepare your chaos engine to delete pods. The chaos engine connects the experiment you need to your application instance. Create a **chaosengine.yaml** file and copy the information below into it; this connects your experiment to your namespace and to the service account with the role bindings you created above.
+
+This chaos engine file only specifies the pod to delete during chaos testing:
+
+
+```
+apiVersion: litmuschaos.io/v1alpha1
+kind: ChaosEngine
+metadata:
+  name: moreapps-chaos
+  namespace: more-apps
+spec:
+  appinfo:
+    appns: 'more-apps'
+    applabel: 'app=ghost'
+    appkind: 'deployment'
+  # It can be true/false
+  annotationCheck: 'true'
+  # It can be active/stop
+  engineState: 'active'
+  # ex. values: ns1:name=percona,ns2:run=more-apps
+  auxiliaryAppInfo: ''
+  chaosServiceAccount: pod-delete-sa
+  # It can be delete/retain
+  jobCleanUpPolicy: 'delete'
+  experiments:
+    - name: pod-delete
+      spec:
+        components:
+          env:
+            # set chaos duration (in sec) as desired
+            - name: TOTAL_CHAOS_DURATION
+              value: '30'
+
+            # set chaos interval (in sec) as desired
+            - name: CHAOS_INTERVAL
+              value: '10'
+
+            # pod failures without '--force' & default terminationGracePeriodSeconds
+            - name: FORCE
+              value: 'false'
+```
+
+Don't apply this file until you install the experiments in the next section.
+
+### Add new experiments for causing chaos
+
+Now that you have an entirely new environment with deployments, roles, and the chaos engine to test against, you need some experiments to run. Since Litmus has a large community, you can find some great experiments in the [Chaos Hub][9].
+
+In this walkthrough, I'll use the generic experiment of [killing a pod][10].
+
+Run a kubectl command to install the generic experiments into your cluster. Install them in your `more-apps` namespace; you will see the experiments created when you run it. (The 1.13.3 in the manifest URL below is the Litmus version used in this walkthrough; adjust it to match your release.)
+
+
+```
+$ kubectl apply -f https://hub.litmuschaos.io/api/chaos/1.13.3?file=charts/generic/experiments.yaml -n more-apps
+chaosexperiment.litmuschaos.io/pod-network-duplication created
+chaosexperiment.litmuschaos.io/node-cpu-hog created
+chaosexperiment.litmuschaos.io/node-drain created
+chaosexperiment.litmuschaos.io/docker-service-kill created
+chaosexperiment.litmuschaos.io/node-taint created
+chaosexperiment.litmuschaos.io/pod-autoscaler created
+chaosexperiment.litmuschaos.io/pod-network-loss created
+chaosexperiment.litmuschaos.io/node-memory-hog created
+chaosexperiment.litmuschaos.io/disk-loss created
+chaosexperiment.litmuschaos.io/pod-io-stress created
+chaosexperiment.litmuschaos.io/pod-network-corruption created
+chaosexperiment.litmuschaos.io/container-kill created
+chaosexperiment.litmuschaos.io/node-restart created
+chaosexperiment.litmuschaos.io/node-io-stress created
+chaosexperiment.litmuschaos.io/disk-fill created
+chaosexperiment.litmuschaos.io/pod-cpu-hog created
+chaosexperiment.litmuschaos.io/pod-network-latency created
+chaosexperiment.litmuschaos.io/kubelet-service-kill created
+chaosexperiment.litmuschaos.io/k8-pod-delete created
+chaosexperiment.litmuschaos.io/pod-delete created
+chaosexperiment.litmuschaos.io/node-poweroff created
+chaosexperiment.litmuschaos.io/k8-service-kill created
+chaosexperiment.litmuschaos.io/pod-memory-hog created
+```
+
+Verify the experiments installed correctly:
+
+
+```
+$ kubectl get chaosexperiments -n more-apps
+NAME                      AGE
+container-kill            72s
+disk-fill                 72s
+disk-loss                 72s
+docker-service-kill       72s
+k8-pod-delete             72s
+k8-service-kill           72s
+kubelet-service-kill      72s
+node-cpu-hog              72s
+node-drain                72s
+node-io-stress            72s
+node-memory-hog           72s
+node-poweroff             72s
+node-restart              72s
+node-taint                72s
+pod-autoscaler            72s
+pod-cpu-hog               72s
+pod-delete                72s
+pod-io-stress             72s
+pod-memory-hog            72s
+pod-network-corruption    72s
+pod-network-duplication   72s
+pod-network-latency       72s
+pod-network-loss          72s
+```
+
+### Run the experiments
+
+Now that everything is installed and configured, use your **chaosengine.yaml** file to run the pod-deletion experiment you defined. Apply your chaos engine file:
+
+
+```
+$ kubectl apply -f chaosengine.yaml
+chaosengine.litmuschaos.io/moreapps-chaos created
+```
+
+Confirm the engine started by getting all the pods in your namespace; you should see `pod-delete` being created:
+
+
+```
+$ kubectl get pods -n more-apps
+NAME                      READY   STATUS              RESTARTS   AGE
+ghost-5bdd4cdcc4-blmtl    1/1     Running             0          53m
+ghost-5bdd4cdcc4-z2lnt    1/1     Running             0          53m
+ghost-5bdd4cdcc4-zlcc9    1/1     Running             0          53m
+ghost-5bdd4cdcc4-zrs8f    1/1     Running             0          53m
+moreapps-chaos-runner     1/1     Running             0          17s
+pod-delete-e443qx-lxzfx   0/1     ContainerCreating   0          7s
+```
+
+Next, you need to be able to observe your experiments using Litmus. The following command uses the ChaosResult CRD and provides a large amount of output:
+
+
+```
+$ kubectl describe chaosresult moreapps-chaos-pod-delete -n more-apps
+Name:         moreapps-chaos-pod-delete
+Namespace:    more-apps
+Labels:       app.kubernetes.io/component=experiment-job
+              app.kubernetes.io/part-of=litmus
+              app.kubernetes.io/version=1.13.3
+              chaosUID=a6c9ab7e-ff07-4703-abe4-43e03b77bd72
+              controller-uid=601b7330-c6f3-4d9b-90cb-2c761ac0567a
+              job-name=pod-delete-e443qx
+              name=moreapps-chaos-pod-delete
+Annotations:  <none>
+API Version:  litmuschaos.io/v1alpha1
+Kind:         ChaosResult
+Metadata:
+  Creation Timestamp:  2021-05-09T22:06:19Z
+  Generation:          2
+  Managed Fields:
+    API Version:  litmuschaos.io/v1alpha1
+    Fields Type:  FieldsV1
+    fieldsV1:
+      f:metadata:
+        f:labels:
+          .:
+          f:app.kubernetes.io/component:
+          f:app.kubernetes.io/part-of:
+          f:app.kubernetes.io/version:
+          f:chaosUID:
+          f:controller-uid:
+          f:job-name:
+          f:name:
+      f:spec:
+        .:
+        f:engine:
+        f:experiment:
+      f:status:
+        .:
+        f:experimentStatus:
+        f:history:
+    Manager:         experiments
+    Operation:       Update
+    Time:            2021-05-09T22:06:53Z
+  Resource Version:  8406
+  Self Link:         /apis/litmuschaos.io/v1alpha1/namespaces/more-apps/chaosresults/moreapps-chaos-pod-delete
+  UID:               08b7e3da-d603-49c7-bac4-3b54eb30aff8
+Spec:
+  Engine:      moreapps-chaos
+  Experiment:  pod-delete
+Status:
+  Experiment Status:
+    Fail Step:                 N/A
+    Phase:                     Completed
+    Probe Success Percentage:  100
+    Verdict:                   Pass
+  History:
+    Failed Runs:   0
+    Passed Runs:   1
+    Stopped Runs:  0
+Events:
+  Type    Reason   Age    From                     Message
+  ----    ------   ----   ----                     -------
+  Normal  Pass     104s   pod-delete-e443qx-lxzfx  experiment: pod-delete, Result: Pass
+```
+
+You can see the pass or fail output from your testing as you run the chaos engine definitions.
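+
+If you only want the verdict rather than the full description, you can pull it straight out of the ChaosResult with kubectl's `-o jsonpath` output. This is a quick sketch; the field path mirrors the `Status`/`Experiment Status`/`Verdict` structure in the output above and should print `Pass` for this run:
+
+
+```
+$ kubectl get chaosresult moreapps-chaos-pod-delete -n more-apps -o jsonpath='{.status.experimentStatus.verdict}'
+Pass
+```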
+
+Congratulations on your first (and hopefully not last) chaos engineering test! Now you have a powerful tool to use to help your environment grow.
+
+### Final thoughts
+
+You might be thinking, "I can't run this manually every time I want to cause chaos. How far can I take this, and how can I set it up for the long term?"
+
+Litmus' best part (aside from the Chaos Hub) is its [scheduler][11] function. You can use it to define the dates and times, repeating or sporadic, when experiments run. This is a great tool for detail-oriented admins who have been working with Kubernetes for a while and are ready to create some chaos.
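+
+For example, a scheduled version of this article's pod-delete test might look something like the sketch below. It is based on the ChaosSchedule resource described in the scheduler docs, so treat the exact field names (such as `minChaosInterval` and `includedDays`) as illustrative and confirm them against the docs for your Litmus version:
+
+
+```
+apiVersion: litmuschaos.io/v1alpha1
+kind: ChaosSchedule
+metadata:
+  name: schedule-pod-delete
+  namespace: more-apps
+spec:
+  schedule:
+    repeat:
+      properties:
+        # wait at least ten minutes between repeated runs
+        minChaosInterval: "10m"
+      workDays:
+        # only inject chaos on weekdays
+        includedDays: "Mon,Tue,Wed,Thu,Fri"
+  # the engine template reuses the ChaosEngine spec from earlier
+  engineTemplateSpec:
+    appinfo:
+      appns: 'more-apps'
+      applabel: 'app=ghost'
+      appkind: 'deployment'
+    annotationCheck: 'true'
+    engineState: 'active'
+    chaosServiceAccount: pod-delete-sa
+    jobCleanUpPolicy: 'delete'
+    experiments:
+      - name: pod-delete
+        spec:
+          components:
+            env:
+              - name: TOTAL_CHAOS_DURATION
+                value: '30'
+```
+
+Because the `engineTemplateSpec` is just a ChaosEngine spec, everything you already configured, including the service account and the annotation check, carries over to each scheduled run.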
+
+I suggest staying up to date on Litmus and learning how to use this tool for regular chaos engineering. Happy pod hunting!
+
+--------------------------------------------------------------------------------
+
+via: https://opensource.com/article/21/6/kubernetes-litmus-chaos
+
+作者:[Jessica Cherry][a]
+选题:[lujun9972][b]
+译者:[译者ID](https://github.com/译者ID)
+校对:[校对者ID](https://github.com/校对者ID)
+
+本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
+
+[a]: https://opensource.com/users/cherrybomb
+[b]: https://github.com/lujun9972
+[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/science_experiment_beaker_lab.png?itok=plKWRhlU (Science lab with beakers)
+[2]: https://github.com/litmuschaos/litmus
+[3]: https://opensource.com/article/21/5/11-years-kubernetes-and-chaos
+[4]: https://opensource.com/article/21/5/get-your-steady-state-chaos-grafana-and-prometheus
+[5]: https://minikube.sigs.k8s.io/docs/start/
+[6]: https://litmuschaos.io/
+[7]: https://docs.litmuschaos.io
+[8]: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
+[9]: https://hub.litmuschaos.io/
+[10]: https://docs.litmuschaos.io/docs/pod-delete/
+[11]: https://docs.litmuschaos.io/docs/scheduling/