Topic selection [tech]: 20210602 Test Kubernetes cluster failures and experiments in your terminal
sources/tech/20210602 Test Kubernetes cluster failures and experiments in your terminal.md
[#]: subject: (Test Kubernetes cluster failures and experiments in your terminal)
[#]: via: (https://opensource.com/article/21/6/kubernetes-litmus-chaos)
[#]: author: (Jessica Cherry https://opensource.com/users/cherrybomb)
[#]: collector: (lujun9972)
[#]: translator: ( )
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )

Test Kubernetes cluster failures and experiments in your terminal
======
Litmus is an effective tool to cause chaos to test how your system will respond to failure.
![Science lab with beakers][1]

Do you know how your system will respond to an arbitrary failure? Will your application fail? Will anything survive after a loss? If you're not sure, it's time to see if your system passes the [Litmus][2] test, a detailed way to cause chaos at random with many experiments.

In the first article in this series, I explained [what chaos engineering is][3], and in the second article, I demonstrated how to get your [system's steady state][4] so that you can compare it against a chaos state. This third article will show you how to install and use Litmus to test arbitrary failures and experiments in your Kubernetes cluster. In this walkthrough, I'll use Pop!_OS 20.04, Helm 3, Minikube 1.14.2, and Kubernetes 1.19.

### Configure Minikube

If you haven't already, [install Minikube][5] in whatever way makes sense for your environment. If you have enough resources, I recommend giving your virtual machine a bit more than the default memory and CPU power:

```
$ minikube config set memory 8192
❗  These changes will take effect upon a minikube delete and then a minikube start
$ minikube config set cpus 6
❗  These changes will take effect upon a minikube delete and then a minikube start
```

Then start and check your system's status:

```
$ minikube start
😄  minikube v1.14.2 on Debian bullseye/sid
🎉  minikube 1.19.0 is available! Download it: https://github.com/kubernetes/minikube/releases/tag/v1.19.0
💡  To disable this notice, run: 'minikube config set WantUpdateNotification false'

✨  Using the docker driver based on user configuration
👍  Starting control plane node minikube in cluster minikube
🔥  Creating docker container (CPUs=6, Memory=8192MB) ...
🐳  Preparing Kubernetes v1.19.0 on Docker 19.03.8 ...
🔎  Verifying Kubernetes components...
🌟  Enabled addons: storage-provisioner, default-storageclass
🏄  Done! kubectl is now configured to use "minikube" by default
jess@Athena:~$ minikube status
minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured
```
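
If you script this setup, the status check can be automated. The following is a minimal sketch, assuming `minikube status` prints the `key: value` lines shown above; the function name `minikube_healthy` is hypothetical and not part of Minikube.

```shell
# Hypothetical helper: reads `minikube status` text on stdin and succeeds
# only if host, kubelet, and apiserver all report "Running" and kubeconfig
# reports "Configured". Assumes the "key: value" layout shown above.
minikube_healthy() {
  local status key
  status="$(cat)"
  for key in host kubelet apiserver; do
    printf '%s\n' "$status" \
      | grep -Eq "^${key}:[[:space:]]*Running[[:space:]]*$" || return 1
  done
  printf '%s\n' "$status" \
    | grep -Eq '^kubeconfig:[[:space:]]*Configured[[:space:]]*$'
}
```

A possible use is `minikube status | minikube_healthy && echo "cluster is up"`, which prints the message only when every component reports a healthy state.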

### Install Litmus

As outlined on [Litmus' homepage][6], the steps to install Litmus are: add the Litmus repo to Helm, create your Litmus namespace, then install the chart:

```
$ helm repo add litmuschaos https://litmuschaos.github.io/litmus-helm/
"litmuschaos" has been added to your repositories

$ kubectl create ns litmus
namespace/litmus created

$ helm install chaos litmuschaos/litmus --namespace=litmus
NAME: chaos
LAST DEPLOYED: Sun May 9 17:05:36 2021
NAMESPACE: litmus
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
```

### Verify the installation

You can run the following commands if you want to verify all the desired components are installed correctly.

Check if **api-resources** for chaos are available:

```
root@demo:~# kubectl api-resources | grep litmus
chaosengines       litmuschaos.io   true   ChaosEngine
chaosexperiments   litmuschaos.io   true   ChaosExperiment
chaosresults       litmuschaos.io   true   ChaosResult
```

Check if the Litmus chaos operator deployment is running successfully:

```
root@demo:~# kubectl get pods -n litmus
NAME                      READY   STATUS    RESTARTS   AGE
litmus-7d998b6568-nnlcd   1/1     Running   0          106s
```

### Start running chaos experiments

With this out of the way, you are good to go! Refer to Litmus' [chaos experiment documentation][7] to start executing your first experiment.

To confirm your installation is working, check that the pod is up and running correctly:

```
jess@Athena:~$ kubectl get pods -n litmus
NAME                      READY   STATUS    RESTARTS   AGE
litmus-7d6f994d88-2g7wn   1/1     Running   0          115s
```

Confirm the Custom Resource Definitions (CRDs) are also installed correctly:

```
jess@Athena:~$ kubectl get crds | grep chaos
chaosengines.litmuschaos.io       2021-05-09T21:05:33Z
chaosexperiments.litmuschaos.io   2021-05-09T21:05:33Z
chaosresults.litmuschaos.io       2021-05-09T21:05:33Z
```

Finally, confirm your API resources are also installed:

```
jess@Athena:~$ kubectl api-resources | grep chaos
chaosengines       litmuschaos.io   true   ChaosEngine
chaosexperiments   litmuschaos.io   true   ChaosExperiment
chaosresults       litmuschaos.io   true   ChaosResult
```

That's what I call easy installation and confirmation. The next step is setting up deployments for chaos.
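
The CRD check above can also be scripted so that a missing definition is reported by name. This is a sketch only: the function name `litmus_crds_present` is hypothetical, and it assumes the three CRD names shown in the output above.

```shell
# Hypothetical helper: reads `kubectl get crds` output on stdin and reports
# any of the three Litmus CRDs that are missing.
# Returns non-zero if at least one is absent.
litmus_crds_present() {
  local input crd missing
  input="$(cat)"
  missing=0
  for crd in chaosengines chaosexperiments chaosresults; do
    if ! printf '%s\n' "$input" | grep -q "^${crd}\.litmuschaos\.io"; then
      echo "missing CRD: ${crd}.litmuschaos.io" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

It could be called as `kubectl get crds | litmus_crds_present`, with the exit status telling you whether all three definitions were found.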

### Prep for destruction

To test for chaos, you need something to test against. Add a new namespace:

```
$ kubectl create namespace more-apps
namespace/more-apps created
```

Then add a deployment to the new namespace:

```
$ kubectl create deployment ghost --namespace more-apps --image=ghost:3.11.0-alpine
deployment.apps/ghost created
```

Finally, scale your deployment up so that you have more than one pod in your deployment to test against:

```
$ kubectl scale deployment/ghost --namespace more-apps --replicas=4
deployment.apps/ghost scaled
```
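
Scaling takes a moment, so it can help to confirm that all four replicas are ready before causing chaos. The sketch below is a hypothetical helper (`deployment_ready` is not a kubectl feature) that assumes the usual `NAME READY UP-TO-DATE AVAILABLE AGE` column order of `kubectl get deployment`.

```shell
# Hypothetical helper: given one line of `kubectl get deployment --no-headers`
# output, succeed when the READY column shows every desired replica ready,
# for example "4/4". Fails on "2/4", on "0/0", and on malformed input.
deployment_ready() {
  local line ready_col ready desired
  line="$1"
  ready_col="$(printf '%s\n' "$line" | awk '{print $2}')"
  ready="${ready_col%/*}"     # text before the slash
  desired="${ready_col#*/}"   # text after the slash
  [ -n "$ready" ] \
    && [ "$desired" -gt 0 ] 2>/dev/null \
    && [ "$ready" -eq "$desired" ] 2>/dev/null
}
```

For example, `deployment_ready "$(kubectl get deployment ghost -n more-apps --no-headers)"` succeeds once the deployment is fully scaled. If you only need to block until the rollout finishes, `kubectl rollout status deployment/ghost -n more-apps` does that without any parsing.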

For Litmus to cause chaos, you need to add an [annotation][8] to your deployment to mark it ready for chaos. Currently, annotations are available for deployments, StatefulSets, and DaemonSets. Add the annotation `litmuschaos.io/chaos="true"` to your deployment:

```
$ kubectl annotate deploy/ghost litmuschaos.io/chaos="true" -n more-apps
deployment.apps/ghost annotated
```

Make sure the experiments you will install have the correct permissions to work in the `more-apps` namespace.

Make a new **rbac.yaml** file for the role bindings and permissions:

```
$ touch rbac.yaml
```

Then add permissions for the generic testing by copying and pasting the code below into your **rbac.yaml** file. These are basic, minimal permissions that let Litmus delete pods in the namespace you provide:

```
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-delete-sa
  namespace: more-apps
  labels:
    name: pod-delete-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-delete-sa
  namespace: more-apps
  labels:
    name: pod-delete-sa
rules:
- apiGroups: [""]
  resources: ["pods","events"]
  verbs: ["create","list","get","patch","update","delete","deletecollection"]
- apiGroups: [""]
  resources: ["pods/exec","pods/log","replicationcontrollers"]
  verbs: ["create","list","get"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["create","list","get","delete","deletecollection"]
- apiGroups: ["apps"]
  resources: ["deployments","statefulsets","daemonsets","replicasets"]
  verbs: ["list","get"]
- apiGroups: ["apps.openshift.io"]
  resources: ["deploymentconfigs"]
  verbs: ["list","get"]
- apiGroups: ["argoproj.io"]
  resources: ["rollouts"]
  verbs: ["list","get"]
- apiGroups: ["litmuschaos.io"]
  resources: ["chaosengines","chaosexperiments","chaosresults"]
  verbs: ["create","list","get","patch","update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-delete-sa
  namespace: more-apps
  labels:
    name: pod-delete-sa
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-delete-sa
subjects:
- kind: ServiceAccount
  name: pod-delete-sa
  namespace: more-apps
```

Apply the **rbac.yaml** file:

```
$ kubectl apply -f rbac.yaml
serviceaccount/pod-delete-sa created
role.rbac.authorization.k8s.io/pod-delete-sa created
rolebinding.rbac.authorization.k8s.io/pod-delete-sa created
```
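
Before running an experiment, you may want to confirm that the new service account really can delete pods. The sketch below wraps `kubectl auth can-i` with impersonation; the function names `sa_subject` and `sa_can` are hypothetical, and the check assumes your current user is allowed to impersonate service accounts (typically true for the default admin user on Minikube).

```shell
# Build the user name Kubernetes assigns to a service account:
# system:serviceaccount:<namespace>:<name>
sa_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Hypothetical wrapper: asks the API server whether the service account may
# perform a verb on a resource in its own namespace.
# Needs a reachable cluster and impersonation rights.
sa_can() {
  local ns sa verb resource
  ns="$1"; sa="$2"; verb="$3"; resource="$4"
  kubectl auth can-i "$verb" "$resource" \
    --as="$(sa_subject "$ns" "$sa")" -n "$ns"
}
```

With the role above applied, `sa_can more-apps pod-delete-sa delete pods` should answer `yes`, while a verb the role does not grant, such as `delete` on `deployments`, should answer `no`.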

The next step is to prepare your chaos engine to delete pods. The chaos engine connects the experiment you need to your application instance. Create a **chaosengine.yaml** file and copy the information below into it. This connects your experiment to your namespace and to the service account with the role bindings you created above.

This chaos engine file defines only the pod-delete experiment for chaos testing:

```
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: moreapps-chaos
  namespace: more-apps
spec:
  appinfo:
    appns: 'more-apps'
    applabel: 'app=ghost'
    appkind: 'deployment'
  # It can be true/false
  annotationCheck: 'true'
  # It can be active/stop
  engineState: 'active'
  #ex. values: ns1:name=percona,ns2:run=more-apps
  auxiliaryAppInfo: ''
  chaosServiceAccount: pod-delete-sa
  # It can be delete/retain
  jobCleanUpPolicy: 'delete'
  experiments:
    - name: pod-delete
      spec:
        components:
          env:
            # set chaos duration (in sec) as desired
            - name: TOTAL_CHAOS_DURATION
              value: '30'

            # set chaos interval (in sec) as desired
            - name: CHAOS_INTERVAL
              value: '10'

            # pod failures without '--force' & default terminationGracePeriodSeconds
            - name: FORCE
              value: 'false'
```

Don't apply this file until you install the experiments in the next section.
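
To get a feel for the two timing values in the engine file, here is a rough back-of-the-envelope helper. It rests on an assumption that is not stated in this article: that the experiment performs about one pod-delete round per `CHAOS_INTERVAL` until `TOTAL_CHAOS_DURATION` has elapsed. The function name `estimated_chaos_rounds` is hypothetical.

```shell
# Rough estimate only. Assumption: one pod-delete round per CHAOS_INTERVAL
# until TOTAL_CHAOS_DURATION elapses. Both arguments are whole seconds.
estimated_chaos_rounds() {
  local duration interval n
  duration="$1"
  interval="$2"
  for n in "$duration" "$interval"; do
    case "$n" in
      ''|*[!0-9]*)
        echo "arguments must be non-negative integers" >&2
        return 1
        ;;
    esac
  done
  if [ "$interval" -le 0 ]; then
    echo "interval must be greater than 0" >&2
    return 1
  fi
  echo $(( duration / interval ))
}
```

Under that assumption, the values in the file above (`30` and `10`) work out to about three rounds, so a deployment with four replicas should always have pods left to serve traffic while the experiment runs.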

### Add new experiments for causing chaos

Now that you have an entirely new environment with deployments, roles, and the chaos engine to test against, you need some experiments to run. Since Litmus has a large community, you can find some great experiments in the [Chaos Hub][9].

In this walkthrough, I'll use the generic experiment of [killing a pod][10].

Run a kubectl command to install the generic experiments into your cluster. Install this in your `more-apps` namespace; you will see the tests created when you run it:

```
$ kubectl apply -f "https://hub.litmuschaos.io/api/chaos/1.13.3?file=charts/generic/experiments.yaml" -n more-apps
chaosexperiment.litmuschaos.io/pod-network-duplication created
chaosexperiment.litmuschaos.io/node-cpu-hog created
chaosexperiment.litmuschaos.io/node-drain created
chaosexperiment.litmuschaos.io/docker-service-kill created
chaosexperiment.litmuschaos.io/node-taint created
chaosexperiment.litmuschaos.io/pod-autoscaler created
chaosexperiment.litmuschaos.io/pod-network-loss created
chaosexperiment.litmuschaos.io/node-memory-hog created
chaosexperiment.litmuschaos.io/disk-loss created
chaosexperiment.litmuschaos.io/pod-io-stress created
chaosexperiment.litmuschaos.io/pod-network-corruption created
chaosexperiment.litmuschaos.io/container-kill created
chaosexperiment.litmuschaos.io/node-restart created
chaosexperiment.litmuschaos.io/node-io-stress created
chaosexperiment.litmuschaos.io/disk-fill created
chaosexperiment.litmuschaos.io/pod-cpu-hog created
chaosexperiment.litmuschaos.io/pod-network-latency created
chaosexperiment.litmuschaos.io/kubelet-service-kill created
chaosexperiment.litmuschaos.io/k8-pod-delete created
chaosexperiment.litmuschaos.io/pod-delete created
chaosexperiment.litmuschaos.io/node-poweroff created
chaosexperiment.litmuschaos.io/k8-service-kill created
chaosexperiment.litmuschaos.io/pod-memory-hog created
```

Verify the experiments installed correctly:

```
$ kubectl get chaosexperiments -n more-apps
NAME                      AGE
container-kill            72s
disk-fill                 72s
disk-loss                 72s
docker-service-kill       72s
k8-pod-delete             72s
k8-service-kill           72s
kubelet-service-kill      72s
node-cpu-hog              72s
node-drain                72s
node-io-stress            72s
node-memory-hog           72s
node-poweroff             72s
node-restart              72s
node-taint                72s
pod-autoscaler            72s
pod-cpu-hog               72s
pod-delete                72s
pod-io-stress             72s
pod-memory-hog            72s
pod-network-corruption    72s
pod-network-duplication   72s
pod-network-latency       72s
pod-network-loss          72s
```
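
With this many experiments in the list, a scripted check for the one you need can be handy. The sketch below uses a hypothetical function name, `has_experiment`, and assumes the experiment name sits in the first column, as in the output above.

```shell
# Hypothetical helper: reads `kubectl get chaosexperiments` output on stdin
# and succeeds if the named experiment appears in the NAME column.
# The comparison is exact, so "pod-delete" does not match "k8-pod-delete".
has_experiment() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit (found ? 0 : 1) }'
}
```

For example, `kubectl get chaosexperiments -n more-apps | has_experiment pod-delete` succeeds only when the `pod-delete` experiment is installed. An exact match on the first column is used instead of a plain `grep` because several experiment names contain one another.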

### Run the experiments

Now that everything is installed and configured, use your **chaosengine.yaml** file to run the pod-deletion experiment you defined. Apply your chaos engine file:

```
$ kubectl apply -f chaosengine.yaml
chaosengine.litmuschaos.io/moreapps-chaos created
```

Confirm the engine started by getting all the pods in your namespace; you should see `pod-delete` being created:

```
$ kubectl get pods -n more-apps
NAME                      READY   STATUS              RESTARTS   AGE
ghost-5bdd4cdcc4-blmtl    1/1     Running             0          53m
ghost-5bdd4cdcc4-z2lnt    1/1     Running             0          53m
ghost-5bdd4cdcc4-zlcc9    1/1     Running             0          53m
ghost-5bdd4cdcc4-zrs8f    1/1     Running             0          53m
moreapps-chaos-runner     1/1     Running             0          17s
pod-delete-e443qx-lxzfx   0/1     ContainerCreating   0          7s
```

Next, you need to be able to observe your experiments using Litmus. The following command uses the ChaosResult CRD and provides a large amount of output:

```
$ kubectl describe chaosresult moreapps-chaos-pod-delete -n more-apps
Name:         moreapps-chaos-pod-delete
Namespace:    more-apps
Labels:       app.kubernetes.io/component=experiment-job
              app.kubernetes.io/part-of=litmus
              app.kubernetes.io/version=1.13.3
              chaosUID=a6c9ab7e-ff07-4703-abe4-43e03b77bd72
              controller-uid=601b7330-c6f3-4d9b-90cb-2c761ac0567a
              job-name=pod-delete-e443qx
              name=moreapps-chaos-pod-delete
Annotations:  <none>
API Version:  litmuschaos.io/v1alpha1
Kind:         ChaosResult
Metadata:
  Creation Timestamp:  2021-05-09T22:06:19Z
  Generation:          2
  Managed Fields:
    API Version:  litmuschaos.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:app.kubernetes.io/component:
          f:app.kubernetes.io/part-of:
          f:app.kubernetes.io/version:
          f:chaosUID:
          f:controller-uid:
          f:job-name:
          f:name:
      f:spec:
        .:
        f:engine:
        f:experiment:
      f:status:
        .:
        f:experimentStatus:
        f:history:
    Manager:         experiments
    Operation:       Update
    Time:            2021-05-09T22:06:53Z
  Resource Version:  8406
  Self Link:         /apis/litmuschaos.io/v1alpha1/namespaces/more-apps/chaosresults/moreapps-chaos-pod-delete
  UID:               08b7e3da-d603-49c7-bac4-3b54eb30aff8
Spec:
  Engine:      moreapps-chaos
  Experiment:  pod-delete
Status:
  Experiment Status:
    Fail Step:                 N/A
    Phase:                     Completed
    Probe Success Percentage:  100
    Verdict:                   Pass
  History:
    Failed Runs:   0
    Passed Runs:   1
    Stopped Runs:  0
Events:
  Type    Reason  Age   From                     Message
  ----    ------  ----  ----                     -------
  Normal  Pass    104s  pod-delete-e443qx-lxzfx  experiment: pod-delete, Result: Pass
```
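
Most of that output is bookkeeping; the line that matters is the verdict. A minimal sketch for pulling it out in a script follows. The function name `chaos_verdict` is hypothetical, and it assumes the `Verdict:` line appears as in the output above.

```shell
# Hypothetical helper: reads `kubectl describe chaosresult` output on stdin
# and prints the value on the "Verdict:" line (for example Pass or Fail).
# Returns non-zero if no verdict line is present.
chaos_verdict() {
  awk '$1 == "Verdict:" { print $2; found = 1; exit }
       END { if (!found) exit 1 }'
}
```

For example, `kubectl describe chaosresult moreapps-chaos-pod-delete -n more-apps | chaos_verdict` prints just the verdict. Judging by the field names in the output above, `kubectl get chaosresult moreapps-chaos-pod-delete -n more-apps -o jsonpath='{.status.experimentStatus.verdict}'` should return the same value without text parsing, but treat that field path as an assumption and check it against your Litmus version.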

You can see the pass or fail output from your testing as you run the chaos engine definitions.

Congratulations on your first (and hopefully not last) chaos engineering test! Now you have a powerful tool to use and help your environment grow.

### Final thoughts

You might be thinking, "I can't run this manually every time I want to run chaos. How far can I take this, and how can I set it up for the long term?"

Litmus' best part (aside from the Chaos Hub) is its [scheduler][11] function. You can use it to define the times and dates at which experiments run, either on a repeating schedule or sporadically. This is a great tool for detail-oriented admins who have been working with Kubernetes for a while and are ready to create some chaos. I suggest staying up to date on Litmus and how to use this tool for regular chaos engineering. Happy pod hunting!

--------------------------------------------------------------------------------

via: https://opensource.com/article/21/6/kubernetes-litmus-chaos

Author: [Jessica Cherry][a]
Topic selection: [lujun9972][b]
Translator: [translator ID](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国 (Linux China)](https://linux.cn/)

[a]: https://opensource.com/users/cherrybomb
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/science_experiment_beaker_lab.png?itok=plKWRhlU (Science lab with beakers)
[2]: https://github.com/litmuschaos/litmus
[3]: https://opensource.com/article/21/5/11-years-kubernetes-and-chaos
[4]: https://opensource.com/article/21/5/get-your-steady-state-chaos-grafana-and-prometheus
[5]: https://minikube.sigs.k8s.io/docs/start/
[6]: https://litmuschaos.io/
[7]: https://docs.litmuschaos.io
[8]: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
[9]: https://hub.litmuschaos.io/
[10]: https://docs.litmuschaos.io/docs/pod-delete/
[11]: https://docs.litmuschaos.io/docs/scheduling/