Create Chaos Experiments Using the LitmusChaos Python SDK
Let's see how to create chaos experiments using the LitmusChaos Python SDK, step by step and without losing time.
Before we get started, a big announcement: LitmusChaos 2.0 is out, and litmus-python is part of it.
This post focuses on creating chaos. If you want to know more about Chaos Engineering, please follow LitmusChaos; for details on the Python SDK, see litmus-python.
From the root of the litmus-python repository, move into the developer guide directory:
cd contribute/developer-guide
Now update your attributes.yaml manifest with the name, category, and so on. Follow one rule: use _ in names, e.g. sample_category.
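The names from attributes.yaml end up as directory and module names that Python has to import, which is presumably why underscores are required instead of hyphens. A minimal sketch of a check you could run on a name before generating anything; the helper `is_valid_chaos_name` is hypothetical and not part of litmus-python:

```python
import keyword


def is_valid_chaos_name(name: str) -> bool:
    """Return True if name can be used as a Python package or module name."""
    # str.isidentifier() rejects hyphens, spaces, and leading digits;
    # keywords such as "class" are identifiers but cannot be imported.
    return name.isidentifier() and not keyword.iskeyword(name)


print(is_valid_chaos_name("sample_category"))  # True
print(is_valid_chaos_name("sample-category"))  # False
```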
Now run:
python3 generate_experiment.py -f=attributes.yaml -g="generate-type" -t="type"
- You may run both commands
- For Experiment: python3 generate_experiment.py -f=attributes.yaml -g=experiment
- For Charts : python3 generate_experiment.py -f=attributes.yaml -g=chart
Note: Replace the -g=<generate-type> placeholder with the appropriate value based on the use case:
- experiment: Chaos experiment artifacts belonging to an existing OR new experiment.
- chart: Just the chaos-chart metadata, i.e., chartserviceversion.yaml
Provide the type of chart in the `-t=<type>` flag. It supports the following values:
- category: Creates the chart metadata for the category, i.e., chartserviceversion and package manifests
- experiment: Creates the chart for the experiment, i.e., chartserviceversion, engine, rbac, and experiment manifests
- all: Creates both category and experiment charts (default type)
Provide the path of the attributes.yaml manifest in the -f flag.
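If you script the generation step, the flags described above can be assembled and validated before anything runs. This is only a sketch: `build_generate_command` is a hypothetical helper, and the allowed values are copied from the lists above rather than read from generate_experiment.py itself.

```python
import shlex

# Values taken from the flag descriptions in this post.
GENERATE_TYPES = ("experiment", "chart")
CHART_TYPES = ("category", "experiment", "all")


def build_generate_command(attributes_file, generate_type, chart_type=None):
    """Build the argument list for generate_experiment.py without running it."""
    if generate_type not in GENERATE_TYPES:
        raise ValueError(f"unknown generate type: {generate_type!r}")
    command = [
        "python3",
        "generate_experiment.py",
        f"-f={attributes_file}",
        f"-g={generate_type}",
    ]
    if chart_type is not None:
        if chart_type not in CHART_TYPES:
            raise ValueError(f"unknown chart type: {chart_type!r}")
        command.append(f"-t={chart_type}")
    return command


for generate_type in GENERATE_TYPES:
    print(shlex.join(build_generate_command("attributes.yaml", generate_type)))
```

The returned list can be passed to subprocess.run from inside contribute/developer-guide once you are happy with it.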
Check the chaosLib/litmus/, experiments/, and pkg/ directories: the sample chaos has been generated.
Open bin/experiment/experiment.py and add this import:
import experiments.sample_category.sample_exec_chaos.experiment.sample_exec_chaos as experiment
(sample_category and sample_exec_chaos will be the names you provided in the attributes.yaml manifest; the default ones are used everywhere in this post.) Then add one more elif condition:
elif args.name == "chaos":
    experiment.Experiment(clients)
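To see how the -name flag ends up selecting an experiment, here is a self-contained sketch of the same dispatch idea. It is not the real bin/experiment/experiment.py: a dictionary stands in for the if/elif chain, and `run_sample_chaos` is a hypothetical stand-in for experiment.Experiment(clients).

```python
import argparse


def run_sample_chaos(clients):
    """Hypothetical stand-in for experiment.Experiment(clients)."""
    return f"running sample chaos with {clients}"


# Each key is a value accepted by -name; adding an experiment means adding a key,
# which plays the role of one more elif branch.
EXPERIMENTS = {"chaos": run_sample_chaos}


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("-name", required=True, help="experiment to run")
    return parser.parse_args(argv)


def dispatch(name, clients):
    """Look up the experiment registered under name and run it."""
    if name not in EXPERIMENTS:
        raise SystemExit(f"Unsupported -name {name!r}")
    return EXPERIMENTS[name](clients)


args = parse_args(["-name", "chaos"])
print(dispatch(args.name, clients="fake-clients"))
```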
- Add directories in setup.py
'chaosLib/litmus/sample_exec_chaos',
'chaosLib/litmus/sample_exec_chaos/lib',
'pkg/sample_category',
'pkg/sample_category/environment',
'pkg/sample_category/types',
'experiments/sample_category',
'experiments/sample_category/sample_exec_chaos',
'experiments/sample_category/sample_exec_chaos/experiment',
The entries above use the default names; replace them with your own category and experiment names.
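Since the eight entries follow a fixed pattern, they can be derived from the two names you chose. A small sketch, assuming the layout shown in the list above; `package_dirs` is a hypothetical helper, not something setup.py provides:

```python
def package_dirs(category: str, experiment: str) -> list:
    """Return the setup.py directory entries for one generated experiment."""
    return [
        f"chaosLib/litmus/{experiment}",
        f"chaosLib/litmus/{experiment}/lib",
        f"pkg/{category}",
        f"pkg/{category}/environment",
        f"pkg/{category}/types",
        f"experiments/{category}",
        f"experiments/{category}/{experiment}",
        f"experiments/{category}/{experiment}/experiment",
    ]


# Print the entries in the quoted, comma-terminated form used in setup.py.
for path in package_dirs("sample_category", "sample_exec_chaos"):
    print(f"'{path}',")
```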
Let's come back to the root directory, litmuschaos/litmus-python, to set up the environment:
python3 -m virtualenv chaos
source chaos/bin/activate
python3 setup.py install
You need to run python3 setup.py install every time before running python3 experiment.py -name chaos. It installs all required prerequisites and sets up the directory structure.
Now you are ready to code. Just open the chaosLib/litmus/sample_exec_chaos/lib/sample_exec_chaos.py and experiments/sample_category/sample_exec_chaos/experiment/sample_exec_chaos.py files and start writing chaos.
Create a sample Nginx deployment that can be used as the application under test (AUT):
kubectl create deployment nginx --image=nginx
- Go to pkg/sample_category, open environment.py or types.py, and add, delete, or update the required ENVs.
Note: Add the & operator at the end of the chaos command in CHAOS_INJECT_COMMAND, for example: md5sum /dev/zero &. This is needed because chaos commands run as a background process in a separate thread.
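A command such as md5sum /dev/zero never finishes on its own, so without the trailing & whatever starts it would presumably wait forever. A minimal sketch of a guard you could apply to the value before using it; `ensure_background` is a hypothetical helper and not part of litmus-python:

```python
def ensure_background(command: str) -> str:
    """Return command with a trailing & so the shell runs it in the background."""
    stripped = command.rstrip()
    if stripped.endswith("&"):
        return stripped
    return f"{stripped} &"


print(ensure_background("md5sum /dev/zero"))    # md5sum /dev/zero &
print(ensure_background("md5sum /dev/zero &"))  # md5sum /dev/zero &
```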
- Go to bin/experiment and run:
python3 experiment.py -name chaos
Before this command, always run python3 setup.py install in the root directory. Example chaos logs:
time=2021-08-13 11:43:12,392 level=INFO msg=Experiment Name: chaos
time=2021-08-13 11:43:12,392 level=INFO msg=[PreReq]: Initialise Chaos Variables for the sample-chaos experiment
time=2021-08-13 11:43:12,393 level=INFO msg=[PreReq]: Updating the chaos result of sample-chaos experiment (SOT)
time=2021-08-13 11:43:12,867 level=INFO msg=[Info]: The application information is as follows Namespace=litmus, Label=app=nginx, Ramp Time=0
time=2021-08-13 11:43:12,867 level=INFO msg=[Status]: Verify that the AUT (Application Under Test) is running (pre-chaos)
time=2021-08-13 11:43:12,867 level=INFO msg=[status]: Checking whether application containers are in ready state
time=2021-08-13 11:43:12,887 level=INFO msg=[status]: The Container status are as follows Container : nginx, Pod : nginx-66b6c48dd5-c65kl, Readiness : True
time=2021-08-13 11:43:12,887 level=INFO msg=[status]: The Container status are as follows Container : nginx, Pod : nginx-66b6c48dd5-p87pc, Readiness : True
time=2021-08-13 11:43:12,887 level=INFO msg=[status]: The Container status are as follows Container : nginx, Pod : nginx-66b6c48dd5-s5hjs, Readiness : True
time=2021-08-13 11:43:12,887 level=INFO msg=[status]: Checking whether application pods are in running state
time=2021-08-13 11:43:12,908 level=INFO msg=[status]: The status of Pods are as follows Pod : nginx-66b6c48dd5-c65kl status : Running
time=2021-08-13 11:43:12,908 level=INFO msg=[status]: The status of Pods are as follows Pod : nginx-66b6c48dd5-p87pc status : Running
time=2021-08-13 11:43:12,908 level=INFO msg=[status]: The status of Pods are as follows Pod : nginx-66b6c48dd5-s5hjs status : Running
time=2021-08-13 11:43:12,938 level=INFO msg=[Info]: chaos candidate of kind: deployment, name: nginx, namespace: litmus
time=2021-08-13 11:43:12,943 level=INFO msg=[Info]: chaos candidate of kind: deployment, name: nginx, namespace: litmus
time=2021-08-13 11:43:12,947 level=INFO msg=[Info]: chaos candidate of kind: deployment, name: nginx, namespace: litmus
time=2021-08-13 11:43:12,947 level=INFO msg=[Chaos]:Number of pods targeted: 3
time=2021-08-13 11:43:12,947 level=INFO msg=[Info]: Target pods list, ['nginx-66b6c48dd5-c65kl', 'nginx-66b6c48dd5-p87pc', 'nginx-66b6c48dd5-s5hjs']
time=2021-08-13 11:43:12,955 level=INFO msg=[Chaos]: The Target application details container : nginx, Pod : nginx-66b6c48dd5-c65kl
time=2021-08-13 11:43:12,956 level=INFO msg=[Chaos]: Waiting for: 10
time=2021-08-13 11:43:22,955 level=INFO msg=[Chaos]: Time is up for experiment: sample-chaos
time=2021-08-13 11:43:23,155 level=INFO msg=[Chaos]: The Target application details container : nginx, Pod : nginx-66b6c48dd5-p87pc
time=2021-08-13 11:43:23,164 level=INFO msg=[Chaos]: Waiting for: 10
time=2021-08-13 11:43:33,155 level=INFO msg=[Chaos]: Time is up for experiment: sample-chaos
time=2021-08-13 11:43:33,289 level=INFO msg=[Chaos]: The Target application details container : nginx, Pod : nginx-66b6c48dd5-s5hjs
time=2021-08-13 11:43:33,289 level=INFO msg=[Chaos]: Waiting for: 10
time=2021-08-13 11:43:43,289 level=INFO msg=[Chaos]: Time is up for experiment: sample-chaos
time=2021-08-13 11:43:43,405 level=INFO msg=[Confirmation]: sample-chaos chaos has been injected successfully
time=2021-08-13 11:43:43,405 level=INFO msg=[Status]: Verify that the AUT (Application Under Test) is running (post-chaos)
time=2021-08-13 11:43:43,405 level=INFO msg=[status]: Checking whether application containers are in ready state
time=2021-08-13 11:43:43,416 level=INFO msg=[status]: The Container status are as follows Container : nginx, Pod : nginx-66b6c48dd5-c65kl, Readiness : True
time=2021-08-13 11:43:43,416 level=INFO msg=[status]: The Container status are as follows Container : nginx, Pod : nginx-66b6c48dd5-p87pc, Readiness : True
time=2021-08-13 11:43:43,416 level=INFO msg=[status]: The Container status are as follows Container : nginx, Pod : nginx-66b6c48dd5-s5hjs, Readiness : True
time=2021-08-13 11:43:43,416 level=INFO msg=[status]: Checking whether application pods are in running state
time=2021-08-13 11:43:43,429 level=INFO msg=[status]: The status of Pods are as follows Pod : nginx-66b6c48dd5-c65kl status : Running
time=2021-08-13 11:43:43,429 level=INFO msg=[status]: The status of Pods are as follows Pod : nginx-66b6c48dd5-p87pc status : Running
time=2021-08-13 11:43:43,429 level=INFO msg=[status]: The status of Pods are as follows Pod : nginx-66b6c48dd5-s5hjs status : Running
time=2021-08-13 11:43:43,429 level=INFO msg=[The End]: Updating the chaos result of sample-chaos experiment (EOT)
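The log lines above follow a regular time=... level=... msg=... layout, so they are easy to post-process. A small sketch that parses a line and measures the time between the SOT and EOT entries; the regular expression is inferred from the sample output above, not from any documented log format:

```python
import re
from datetime import datetime

# Layout inferred from the sample logs: time=<timestamp> level=<LEVEL> msg=<text>
LOG_LINE = re.compile(r"^time=(?P<time>.+?) level=(?P<level>\w+) msg=(?P<msg>.*)$")


def parse_log_line(line):
    """Split one log line into time (datetime), level, and msg; None if no match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%Y-%m-%d %H:%M:%S,%f")
    return record


start = parse_log_line(
    "time=2021-08-13 11:43:12,393 level=INFO msg=[PreReq]: "
    "Updating the chaos result of sample-chaos experiment (SOT)"
)
end = parse_log_line(
    "time=2021-08-13 11:43:43,429 level=INFO msg=[The End]: "
    "Updating the chaos result of sample-chaos experiment (EOT)"
)
print((end["time"] - start["time"]).total_seconds())  # 31.036
```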
- Now make sure that you have created all the required charts in the experiments/sample_category/sample_exec_chaos/charts directory.
- Go to the root directory, litmuschaos/litmus-python, build a Docker image, and push it:
docker build -t your-user-name/py-runner:ci .
docker push your-user-name/py-runner:ci
Refer to the python-sdk docs for more details.
Run the experiment.yml with the desired values in the ENV and the appropriate chaosServiceAccount, using a custom dev image (say, oumkale/litmus-python:ci) that packages the business logic instead of litmuschaos/litmus-python.
Create a custom image built with the code validated by the previous steps.
Launch the Chaos-Operator:
kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v1.13.8.yaml
- Set up the RBAC necessary for the execution of this experiment by applying the generated rbac.yaml:
kubectl apply -f rbac.yaml
Modify the ChaosExperiment manifest (experiment.yaml) with the right defaults (env and other attributes, as applicable) and create this CR on the cluster, pointing .spec.definition.image to the custom image just built.
Modify the ChaosEngine manifest (engine.yaml) with the right app details and run properties, and create this CR to launch the chaos pods.
Verify the experiment status via ChaosResult.
Refer to the Litmus docs for more details on performing each step in this procedure.
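Pointing .spec.definition.image at your own image is a one-field change. The sketch below shows it on a plain dictionary, assuming you have already loaded experiment.yaml with a YAML parser of your choice; the manifest is trimmed to the fields named in this post, and `with_custom_image` is a hypothetical helper:

```python
import copy


def with_custom_image(experiment_manifest: dict, image: str) -> dict:
    """Return a copy of the manifest with .spec.definition.image replaced."""
    updated = copy.deepcopy(experiment_manifest)
    updated.setdefault("spec", {}).setdefault("definition", {})["image"] = image
    return updated


# Trimmed-down stand-in for a loaded ChaosExperiment manifest.
manifest = {
    "kind": "ChaosExperiment",
    "spec": {"definition": {"image": "litmuschaos/litmus-python"}},
}

patched = with_custom_image(manifest, "your-user-name/py-runner:ci")
print(patched["spec"]["definition"]["image"])
```

Working on a deep copy keeps the originally loaded manifest untouched, which makes it easy to compare the two before applying anything to the cluster.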
Run all the CRs and the operator in a single namespace.
N.B.: Also add the & operator at the end of any chaos command that is passed as an ENV.
Now list the pods in the namespace and check that the engine pod is running along with the runner, for example:
nginx-chaos-runner 1/1 Running 0 35s
pod-cpu-hog-exec-2szc9z-rjjkr 1/1 Running 0 33s
Then see the logs of the engine pod.
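If you want to automate that check, the pod listing can be parsed as whitespace-separated columns. A sketch under the assumption that the first three columns are name, ready count, and status, as in the sample output above; both helper names are hypothetical:

```python
def parse_pod_lines(output: str) -> list:
    """Parse pod listing text into dicts with name, ready, and status."""
    pods = []
    for line in output.strip().splitlines():
        fields = line.split()
        # Skip blank or short lines and an optional header row.
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        name, ready, status = fields[:3]
        pods.append({"name": name, "ready": ready, "status": status})
    return pods


def all_running(pods: list) -> bool:
    """True only if there is at least one pod and every pod is Running."""
    return bool(pods) and all(pod["status"] == "Running" for pod in pods)


sample = """
nginx-chaos-runner 1/1 Running 0 35s
pod-cpu-hog-exec-2szc9z-rjjkr 1/1 Running 0 33s
"""
pods = parse_pod_lines(sample)
print([pod["name"] for pod in pods], all_running(pods))
```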
Follow this Link for portal setup details.
You may also select the second experiment from my hub.
Use the upload YAML option (for a workflow example: Link). N.B.: you need to upload an Argo workflow here.
After clicking Next twice, you will land on this page, where you can edit the YAML directly or update it through the UI by clicking Name, which takes you to the next screen.
- Update the engine:
spec.chaosServiceAccount: litmus-admin
After scheduling the workflow, you will land on the workflow dashboard.
Here, logs, chaos results, and details about the experiment are provided. With that, you have finished creating chaos and testing it on the application.
Note: You can see the details in the UI, but another way is to query the following CRs:
kubectl get chaosresult -n litmus
kubectl get chaosengine -n litmus
kubectl get chaosexperiment -n litmus
kubectl get workflow -n litmus
Steps to Include the Chaos Charts/Experiments into the ChartHub:
- Send a PR to the litmus-python repo with the modified experiment files, rbac, test deployment & README.
- Send a PR to the chaos-charts repo with the modified experiment CR, experiment chartserviceversion, rbac, (category-level) chaos chart chartserviceversion & package.yaml (if applicable).
- Contact us on slack for any queries or doubts.
Join the LitmusChaos community in the #litmus channel on the Kubernetes Slack (https://slack.k8s.io/)!