mirror of
https://github.com/LCTT/TranslateProject.git
synced 2025-01-16 22:42:21 +08:00
37282ac330
sources/tech/20210602 Test Kubernetes cluster failures and experiments in your terminal.md
487 lines
18 KiB
Markdown
487 lines
18 KiB
Markdown
[#]: subject: (Test Kubernetes cluster failures and experiments in your terminal)
|
||
[#]: via: (https://opensource.com/article/21/6/kubernetes-litmus-chaos)
|
||
[#]: author: (Jessica Cherry https://opensource.com/users/cherrybomb)
|
||
[#]: collector: (lujun9972)
|
||
[#]: translator: ( )
|
||
[#]: reviewer: ( )
|
||
[#]: publisher: ( )
|
||
[#]: url: ( )
|
||
|
||
Test Kubernetes cluster failures and experiments in your terminal
|
||
======
|
||
Litmus is an effective tool to cause chaos to test how your system will
|
||
respond to failure.
|
||
![Science lab with beakers][1]
|
||
|
||
Do you know how your system will respond to an arbitrary failure? Will your application fail? Will anything survive after a loss? If you're not sure, it's time to see if your system passes the [Litmus][2] test, a detailed way to cause chaos at random with many experiments.
|
||
|
||
In the first article in this series, I explained [what chaos engineering is][3], and in the second article, I demonstrated how to get your [system's steady state][4] so that you can compare it against a chaos state. This third article will show you how to install and use Litmus to test arbitrary failures and experiments in your Kubernetes cluster. In this walkthrough, I'll use Pop!_OS 20.04, Helm 3, Minikube 1.14.2, and Kubernetes 1.19.
|
||
|
||
### Configure Minikube
|
||
|
||
If you haven't already, [install Minikube][5] in whatever way makes sense for your environment. If you have enough resources, I recommend giving your virtual machine a bit more than the default memory and CPU power:
|
||
|
||
|
||
```
|
||
$ minikube config set memory 8192
|
||
❗ These changes will take effect upon a minikube delete and then a minikube start
|
||
$ minikube config set cpus 6
|
||
❗ These changes will take effect upon a minikube delete and then a minikube start
|
||
```
|
||
|
||
Then start and check your system's status:
|
||
|
||
|
||
```
|
||
$ minikube start
|
||
😄 minikube v1.14.2 on Debian bullseye/sid
|
||
🎉 minikube 1.19.0 is available! Download it: <https://github.com/kubernetes/minikube/releases/tag/v1.19.0>
|
||
💡 To disable this notice, run: 'minikube config set WantUpdateNotification false'
|
||
|
||
✨ Using the docker driver based on user configuration
|
||
👍 Starting control plane node minikube in cluster minikube
|
||
🔥 Creating docker container (CPUs=6, Memory=8192MB) ...
|
||
🐳 Preparing Kubernetes v1.19.0 on Docker 19.03.8 ...
|
||
🔎 Verifying Kubernetes components...
|
||
🌟 Enabled addons: storage-provisioner, default-storageclass
|
||
🏄 Done! kubectl is now configured to use "minikube" by default
|
||
jess@Athena:~$ minikube status
|
||
minikube
|
||
type: Control Plane
|
||
host: Running
|
||
kubelet: Running
|
||
apiserver: Running
|
||
kubeconfig: Configured
|
||
```
|
||
|
||
### Install Litmus
|
||
|
||
As outlined on [Litmus' homepage][6], the steps to install Litmus are: add your repo to Helm, create your Litmus namespace, then install your chart:
|
||
|
||
|
||
```
|
||
$ helm repo add litmuschaos <https://litmuschaos.github.io/litmus-helm/>
|
||
"litmuschaos" has been added to your repositories
|
||
|
||
$ kubectl create ns litmus
|
||
namespace/litmus created
|
||
|
||
$ helm install chaos litmuschaos/litmus --namespace=litmus
|
||
NAME: chaos
|
||
LAST DEPLOYED: Sun May 9 17:05:36 2021
|
||
NAMESPACE: litmus
|
||
STATUS: deployed
|
||
REVISION: 1
|
||
TEST SUITE: None
|
||
NOTES:
|
||
```
|
||
|
||
### Verify the installation
|
||
|
||
You can run the following commands if you want to verify all the desired components are installed correctly.
|
||
|
||
Check if **api-resources** for chaos are available:
|
||
|
||
|
||
```
|
||
root@demo:~# kubectl api-resources | grep litmus
|
||
chaosengines litmuschaos.io true ChaosEngine
|
||
chaosexperiments litmuschaos.io true ChaosExperiment
|
||
chaosresults litmuschaos.io true ChaosResult
|
||
```
|
||
|
||
Check if the Litmus chaos operator deployment is running successfully:
|
||
|
||
|
||
```
|
||
root@demo:~# kubectl get pods -n litmus
|
||
NAME READY STATUS RESTARTS AGE
|
||
litmus-7d998b6568-nnlcd 1/1 Running 0 106s
|
||
```
|
||
|
||
### Start running chaos experiments
|
||
|
||
With this out of the way, you are good to go! Refer to Litmus' [chaos experiment documentation][7] to start executing your first experiment.
|
||
|
||
To confirm your installation is working, check that the pod is up and running correctly:
|
||
|
||
|
||
```
|
||
jess@Athena:~$ kubectl get pods -n litmus
|
||
NAME READY STATUS RESTARTS AGE
|
||
litmus-7d6f994d88-2g7wn 1/1 Running 0 115s
|
||
```
|
||
|
||
Confirm the Custom Resource Definitions (CRDs) are also installed correctly:
|
||
|
||
|
||
```
|
||
jess@Athena:~$ kubectl get crds | grep chaos
|
||
chaosengines.litmuschaos.io 2021-05-09T21:05:33Z
|
||
chaosexperiments.litmuschaos.io 2021-05-09T21:05:33Z
|
||
chaosresults.litmuschaos.io 2021-05-09T21:05:33Z
|
||
```
|
||
|
||
Finally, confirm your API resources are also installed:
|
||
|
||
|
||
```
|
||
jess@Athena:~$ kubectl api-resources | grep chaos
|
||
chaosengines litmuschaos.io true ChaosEngine
|
||
chaosexperiments litmuschaos.io true ChaosExperiment
|
||
chaosresults litmuschaos.io true ChaosResult
|
||
```
|
||
|
||
That's what I call easy installation and confirmation. The next step is setting up deployments for chaos.
|
||
|
||
### Prep for destruction
|
||
|
||
To test for chaos, you need something to test against. Add a new namespace:
|
||
|
||
|
||
```
|
||
$ kubectl create namespace more-apps
|
||
namespace/more-apps created
|
||
```
|
||
|
||
Then add a deployment to the new namespace:
|
||
|
||
|
||
```
|
||
$ kubectl create deployment ghost --namespace more-apps --image=ghost:3.11.0-alpine
|
||
deployment.apps/ghost created
|
||
```
|
||
|
||
Finally, scale your deployment up so that you have more than one pod in your deployment to test against:
|
||
|
||
|
||
```
|
||
$ kubectl scale deployment/ghost --namespace more-apps --replicas=4
|
||
deployment.apps/ghost scaled
|
||
```
|
||
|
||
For Litmus to cause chaos, you need to add an [annotation][8] to your deployment to mark it ready for chaos. Currently, annotations are available for deployments, StatefulSets, and DaemonSets. Add the annotation `chaos=true` to your deployment:
|
||
|
||
|
||
```
|
||
$ kubectl annotate deploy/ghost litmuschaos.io/chaos="true" -n more-apps
|
||
deployment.apps/ghost annotated
|
||
```
|
||
|
||
Make sure the experiments you will install have the correct permissions to work in the "more-apps" namespace.
|
||
|
||
Make a new **rbac.yaml** file for the prepper bindings and permissions:
|
||
|
||
|
||
```
|
||
`$ touch rbac.yaml`
|
||
```
|
||
|
||
Then add permissions for the generic testing by copying and pasting the code below into your **rbac.yaml** file. These are just basic, minimal permissions to kill pods in your namespace and give Litmus permissions to delete a pod for a namespace you provide:
|
||
|
||
|
||
```
|
||
\---
|
||
apiVersion: v1
|
||
kind: ServiceAccount
|
||
metadata:
|
||
name: pod-delete-sa
|
||
namespace: more-apps
|
||
labels:
|
||
name: pod-delete-sa
|
||
\---
|
||
apiVersion: rbac.authorization.k8s.io/v1
|
||
kind: Role
|
||
metadata:
|
||
name: pod-delete-sa
|
||
namespace: more-apps
|
||
labels:
|
||
name: pod-delete-sa
|
||
rules:
|
||
\- apiGroups: [""]
|
||
resources: ["pods","events"]
|
||
verbs: ["create","list","get","patch","update","delete","deletecollection"]
|
||
\- apiGroups: [""]
|
||
resources: ["pods/exec","pods/log","replicationcontrollers"]
|
||
verbs: ["create","list","get"]
|
||
\- apiGroups: ["batch"]
|
||
resources: ["jobs"]
|
||
verbs: ["create","list","get","delete","deletecollection"]
|
||
\- apiGroups: ["apps"]
|
||
resources: ["deployments","statefulsets","daemonsets","replicasets"]
|
||
verbs: ["list","get"]
|
||
\- apiGroups: ["apps.openshift.io"]
|
||
resources: ["deploymentconfigs"]
|
||
verbs: ["list","get"]
|
||
\- apiGroups: ["argoproj.io"]
|
||
resources: ["rollouts"]
|
||
verbs: ["list","get"]
|
||
\- apiGroups: ["litmuschaos.io"]
|
||
resources: ["chaosengines","chaosexperiments","chaosresults"]
|
||
verbs: ["create","list","get","patch","update"]
|
||
\---
|
||
apiVersion: rbac.authorization.k8s.io/v1
|
||
kind: RoleBinding
|
||
metadata:
|
||
name: pod-delete-sa
|
||
namespace: more-apps
|
||
labels:
|
||
name: pod-delete-sa
|
||
roleRef:
|
||
apiGroup: rbac.authorization.k8s.io
|
||
kind: Role
|
||
name: pod-delete-sa
|
||
subjects:
|
||
\- kind: ServiceAccount
|
||
name: pod-delete-sa
|
||
namespace: more-apps
|
||
```
|
||
|
||
Apply the **rbac.yaml** file:
|
||
|
||
|
||
```
|
||
$ kubectl apply -f rbac.yaml
|
||
serviceaccount/pod-delete-sa created
|
||
role.rbac.authorization.k8s.io/pod-delete-sa created
|
||
rolebinding.rbac.authorization.k8s.io/pod-delete-sa created
|
||
```
|
||
|
||
The next step is to prepare your chaos engine to delete pods. The chaos engine will connect the experiment you need to your application instance by creating a **chaosengine.yaml** file and copying the information below into the .yaml file. This will connect your experiment to your namespace and the service account with the role bindings you created above.
|
||
|
||
This chaos engine file only specifies the pod to delete during chaos testing:
|
||
|
||
|
||
```
|
||
apiVersion: litmuschaos.io/v1alpha1
|
||
kind: ChaosEngine
|
||
metadata:
|
||
name: moreapps-chaos
|
||
namespace: more-apps
|
||
spec:
|
||
appinfo:
|
||
appns: 'more-apps'
|
||
applabel: 'app=ghost'
|
||
appkind: 'deployment'
|
||
# It can be true/false
|
||
annotationCheck: 'true'
|
||
# It can be active/stop
|
||
engineState: 'active'
|
||
#ex. values: ns1:name=percona,ns2:run=more-apps
|
||
auxiliaryAppInfo: ''
|
||
chaosServiceAccount: pod-delete-sa
|
||
# It can be delete/retain
|
||
jobCleanUpPolicy: 'delete'
|
||
experiments:
|
||
- name: pod-delete
|
||
spec:
|
||
components:
|
||
env:
|
||
# set chaos duration (in sec) as desired
|
||
- name: TOTAL_CHAOS_DURATION
|
||
value: '30'
|
||
|
||
# set chaos interval (in sec) as desired
|
||
- name: CHAOS_INTERVAL
|
||
value: '10'
|
||
|
||
# pod failures without '--force' & default terminationGracePeriodSeconds
|
||
- name: FORCE
|
||
value: 'false'
|
||
```
|
||
|
||
Don't apply this file until you install the experiments in the next section.
|
||
|
||
### Add new experiments for causing chaos
|
||
|
||
Now that you have an entirely new environment with deployments, roles, and the chaos engine to test against, you need some experiments to run. Since Litmus has a large community, you can find some great experiments in the [Chaos Hub][9].
|
||
|
||
In this walkthrough, I'll use the generic experiment of [killing a pod][10].
|
||
|
||
Run a kubectl command to install the generic experiments into your cluster. Install this in your `more-apps` namespace; you will see the tests created when you run it:
|
||
|
||
|
||
```
|
||
$ kubectl apply -f <https://hub.litmuschaos.io/api/chaos/1.13.3?file=charts/generic/experiments.yaml> -n more-apps
|
||
chaosexperiment.litmuschaos.io/pod-network-duplication created
|
||
chaosexperiment.litmuschaos.io/node-cpu-hog created
|
||
chaosexperiment.litmuschaos.io/node-drain created
|
||
chaosexperiment.litmuschaos.io/docker-service-kill created
|
||
chaosexperiment.litmuschaos.io/node-taint created
|
||
chaosexperiment.litmuschaos.io/pod-autoscaler created
|
||
chaosexperiment.litmuschaos.io/pod-network-loss created
|
||
chaosexperiment.litmuschaos.io/node-memory-hog created
|
||
chaosexperiment.litmuschaos.io/disk-loss created
|
||
chaosexperiment.litmuschaos.io/pod-io-stress created
|
||
chaosexperiment.litmuschaos.io/pod-network-corruption created
|
||
chaosexperiment.litmuschaos.io/container-kill created
|
||
chaosexperiment.litmuschaos.io/node-restart created
|
||
chaosexperiment.litmuschaos.io/node-io-stress created
|
||
chaosexperiment.litmuschaos.io/disk-fill created
|
||
chaosexperiment.litmuschaos.io/pod-cpu-hog created
|
||
chaosexperiment.litmuschaos.io/pod-network-latency created
|
||
chaosexperiment.litmuschaos.io/kubelet-service-kill created
|
||
chaosexperiment.litmuschaos.io/k8-pod-delete created
|
||
chaosexperiment.litmuschaos.io/pod-delete created
|
||
chaosexperiment.litmuschaos.io/node-poweroff created
|
||
chaosexperiment.litmuschaos.io/k8-service-kill created
|
||
chaosexperiment.litmuschaos.io/pod-memory-hog created
|
||
```
|
||
|
||
Verify the experiments installed correctly:
|
||
|
||
|
||
```
|
||
$ kubectl get chaosexperiments -n more-apps
|
||
NAME AGE
|
||
container-kill 72s
|
||
disk-fill 72s
|
||
disk-loss 72s
|
||
docker-service-kill 72s
|
||
k8-pod-delete 72s
|
||
k8-service-kill 72s
|
||
kubelet-service-kill 72s
|
||
node-cpu-hog 72s
|
||
node-drain 72s
|
||
node-io-stress 72s
|
||
node-memory-hog 72s
|
||
node-poweroff 72s
|
||
node-restart 72s
|
||
node-taint 72s
|
||
pod-autoscaler 72s
|
||
pod-cpu-hog 72s
|
||
pod-delete 72s
|
||
pod-io-stress 72s
|
||
pod-memory-hog 72s
|
||
pod-network-corruption 72s
|
||
pod-network-duplication 72s
|
||
pod-network-latency 72s
|
||
pod-network-loss 72s
|
||
```
|
||
|
||
### Run the experiments
|
||
|
||
Now that everything is installed and configured, use your **chaosengine.yaml** file to run the pod-deletion experiment you defined. Apply your chaos engine file:
|
||
|
||
|
||
```
|
||
$ kubectl apply -f chaosengine.yaml
|
||
chaosengine.litmuschaos.io/more-apps-chaos created
|
||
```
|
||
|
||
Confirm the engine started by getting all the pods in your namespace; you should see `pod-delete` being created:
|
||
|
||
|
||
```
|
||
$ kubectl get pods -n more-apps
|
||
NAME READY STATUS RESTARTS AGE
|
||
ghost-5bdd4cdcc4-blmtl 1/1 Running 0 53m
|
||
ghost-5bdd4cdcc4-z2lnt 1/1 Running 0 53m
|
||
ghost-5bdd4cdcc4-zlcc9 1/1 Running 0 53m
|
||
ghost-5bdd4cdcc4-zrs8f 1/1 Running 0 53m
|
||
moreapps-chaos-runner 1/1 Running 0 17s
|
||
pod-delete-e443qx-lxzfx 0/1 ContainerCreating 0 7s
|
||
```
|
||
|
||
Next, you need to be able to observe your experiments using Litmus. The following command uses the ChaosResult CRD and provides a large amount of output:
|
||
|
||
|
||
```
|
||
$ kubectl describe chaosresult moreapps-chaos-pod-delete -n more-apps
|
||
Name: moreapps-chaos-pod-delete
|
||
Namespace: more-apps
|
||
Labels: app.kubernetes.io/component=experiment-job
|
||
app.kubernetes.io/part-of=litmus
|
||
app.kubernetes.io/version=1.13.3
|
||
chaosUID=a6c9ab7e-ff07-4703-abe4-43e03b77bd72
|
||
controller-uid=601b7330-c6f3-4d9b-90cb-2c761ac0567a
|
||
job-name=pod-delete-e443qx
|
||
name=moreapps-chaos-pod-delete
|
||
Annotations: <none>
|
||
API Version: litmuschaos.io/v1alpha1
|
||
Kind: ChaosResult
|
||
Metadata:
|
||
Creation Timestamp: 2021-05-09T22:06:19Z
|
||
Generation: 2
|
||
Managed Fields:
|
||
API Version: litmuschaos.io/v1alpha1
|
||
Fields Type: FieldsV1
|
||
fieldsV1:
|
||
f:metadata:
|
||
f:labels:
|
||
.:
|
||
f:app.kubernetes.io/component:
|
||
f:app.kubernetes.io/part-of:
|
||
f:app.kubernetes.io/version:
|
||
f:chaosUID:
|
||
f:controller-uid:
|
||
f:job-name:
|
||
f:name:
|
||
f:spec:
|
||
.:
|
||
f:engine:
|
||
f:experiment:
|
||
f:status:
|
||
.:
|
||
f:experimentStatus:
|
||
f:history:
|
||
Manager: experiments
|
||
Operation: Update
|
||
Time: 2021-05-09T22:06:53Z
|
||
Resource Version: 8406
|
||
Self Link: /apis/litmuschaos.io/v1alpha1/namespaces/more-apps/chaosresults/moreapps-chaos-pod-delete
|
||
UID: 08b7e3da-d603-49c7-bac4-3b54eb30aff8
|
||
Spec:
|
||
Engine: moreapps-chaos
|
||
Experiment: pod-delete
|
||
Status:
|
||
Experiment Status:
|
||
Fail Step: N/A
|
||
Phase: Completed
|
||
Probe Success Percentage: 100
|
||
Verdict: Pass
|
||
History:
|
||
Failed Runs: 0
|
||
Passed Runs: 1
|
||
Stopped Runs: 0
|
||
Events:
|
||
Type Reason Age From Message
|
||
---- ------ ---- ---- -------
|
||
Normal Pass 104s pod-delete-e443qx-lxzfx experiment: pod-delete, Result: Pass
|
||
```
|
||
|
||
You can see the pass or fail output from your testing as you run the chaos engine definitions.
|
||
|
||
Congratulations on your first (and hopefully not last) chaos engineering test! Now you have a powerful tool to use and help your environment grow.
|
||
|
||
### Final thoughts
|
||
|
||
You might be thinking, "I can't run this manually every time I want to run chaos. How far can I take this, and how can I set it up for the long term?"
|
||
|
||
Litmus' best part (aside from the Chaos Hub) is its [scheduler][11] function. You can use it to define times and dates, repetitions or sporadic, to run experiments. This is a great tool for detailed admins who have been working with Kubernetes for a while and are ready to create some chaos. I suggest staying up to date on Litmus and how to use this tool for regular chaos engineering. Happy pod hunting!
|
||
|
||
--------------------------------------------------------------------------------
|
||
|
||
via: https://opensource.com/article/21/6/kubernetes-litmus-chaos
|
||
|
||
作者:[Jessica Cherry][a]
|
||
选题:[lujun9972][b]
|
||
译者:[译者ID](https://github.com/译者ID)
|
||
校对:[校对者ID](https://github.com/校对者ID)
|
||
|
||
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
|
||
|
||
[a]: https://opensource.com/users/cherrybomb
|
||
[b]: https://github.com/lujun9972
|
||
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/science_experiment_beaker_lab.png?itok=plKWRhlU (Science lab with beakers)
|
||
[2]: https://github.com/litmuschaos/litmus
|
||
[3]: https://opensource.com/article/21/5/11-years-kubernetes-and-chaos
|
||
[4]: https://opensource.com/article/21/5/get-your-steady-state-chaos-grafana-and-prometheus
|
||
[5]: https://minikube.sigs.k8s.io/docs/start/
|
||
[6]: https://litmuschaos.io/
|
||
[7]: https://docs.litmuschaos.io
|
||
[8]: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
|
||
[9]: https://hub.litmuschaos.io/
|
||
[10]: https://docs.litmuschaos.io/docs/pod-delete/
|
||
[11]: https://docs.litmuschaos.io/docs/scheduling/
|