Create Chaos Experiments Using the LitmusChaos Python SDK

Hello! LitmusChaos now supports Python experiments.

Let’s see, step by step and without wasting time, how to create chaos using the Python SDK.

Before we get started, a big announcement: LitmusChaos 2.0 is out, and litmus-python is also a part of it.

This post focuses on creating chaos. If you want to know more about Chaos Engineering, follow LitmusChaos; for details about the Python SDK, see litmus-python.

Steps to create Chaos:

  • git clone https://github.com/litmuschaos/litmus-python.git

  • cd litmus-python/contribute/developer-guide

  • Now update the attributes.yaml manifest with your details, such as the name and category. Use _ in names, e.g. sample_category.
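The underscore rule matters because these names become Python package and module names (for example, the import path used later in bin/experiment/experiment.py), and a hyphen is not valid inside a Python import path. A minimal pre-check, assuming you want names limited to lowercase letters, digits, and underscores (stricter than the post's "use _" rule); `is_valid_attribute_name` is a hypothetical helper, not part of litmus-python:

```python
import keyword
import re

# Hypothetical helper (not part of litmus-python): a name must start with a
# lowercase letter or underscore and contain only lowercase letters, digits,
# and underscores, so it works as a Python package/module name.
_NAME_PATTERN = re.compile(r"[a-z_][a-z0-9_]*")


def is_valid_attribute_name(name: str) -> bool:
    """Return True if `name` from attributes.yaml is safe to use in an import path."""
    # Python keywords such as "import" match the pattern but cannot be module names.
    return bool(_NAME_PATTERN.fullmatch(name)) and not keyword.iskeyword(name)


if __name__ == "__main__":
    for candidate in ["sample_category", "sample-category", "sample_exec_chaos", "1chaos"]:
        print(candidate, is_valid_attribute_name(candidate))
```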

  • Now run: python3 generate_experiment.py -f=attributes.yaml -g=<generate-type> -t=<type>

    • You may run both commands:
    • For experiments: python3 generate_experiment.py -f=attributes.yaml -g=experiment
    • For charts: python3 generate_experiment.py -f=attributes.yaml -g=chart
Note: Replace the -g=<generate-type> placeholder with the appropriate value for your use case:

 - experiment: Chaos experiment artifacts belonging to an existing OR new experiment.
 - chart: Just the chaos-chart metadata, i.e., chartserviceversion.yaml

Provide the type of chart in the `-t=<type>` flag. It supports the following values:
 - category: creates the chart metadata for the category, i.e., chartserviceversion and package manifests
 - experiment: creates the chart for the experiment, i.e., chartserviceversion, engine, rbac, and experiment manifests
 - all: creates both the category and experiment charts (default type)
Provide the path of the attributes.yaml manifest in the -f flag.
  • Check the chaosLib/litmus/, experiments/, and pkg/ directories: the sample chaos code has been generated there.
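If you regenerate often, the two invocations above can be scripted. A small sketch that only assembles the documented command lines (`-f`, `-g`, `-t`) and can run them with `subprocess`; `generator_commands` and `run_generator` are hypothetical helpers, and the sketch assumes it is run from contribute/developer-guide:

```python
import shlex
import subprocess
import sys

# Values documented for the -t flag; -t applies to chart generation.
VALID_CHART_TYPES = ("category", "experiment", "all")


def generator_commands(attributes_path="attributes.yaml", chart_type="all"):
    """Build the argv lists for generating experiment artifacts and then charts."""
    if chart_type not in VALID_CHART_TYPES:
        raise ValueError(f"unsupported chart type: {chart_type!r}")
    base = [sys.executable, "generate_experiment.py", f"-f={attributes_path}"]
    return [
        base + ["-g=experiment"],
        base + ["-g=chart", f"-t={chart_type}"],
    ]


def run_generator(attributes_path="attributes.yaml", chart_type="all"):
    # check=True stops at the first failing command instead of continuing silently.
    for argv in generator_commands(attributes_path, chart_type):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Only print the commands; call run_generator() to actually execute them.
    for argv in generator_commands():
        print(shlex.join(argv))
```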

  • Open bin/experiment/experiment.py and import the generated experiment:

import experiments.sample_category.sample_exec_chaos.experiment.sample_exec_chaos as experiment

(sample_category and sample_exec_chaos are the names you provided in the attributes.yaml manifest; the defaults are used throughout this post.) Then add one more elif condition:

elif args.name == "chaos":
    experiment.Experiment(clients)
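The if/elif chain in experiment.py is essentially a lookup from the -name argument to an experiment function. A self-contained sketch of that dispatch pattern using a dictionary instead of elif; `sample_exec_chaos_experiment` is a hypothetical stand-in for `experiment.Experiment`, and `clients` is simply passed through as in the original:

```python
import argparse


def sample_exec_chaos_experiment(clients):
    # Hypothetical stand-in for experiment.Experiment(clients) from the generated module.
    return f"running sample_exec_chaos with {clients}"


# Map each -name value to the function that runs that experiment.
EXPERIMENTS = {
    "chaos": sample_exec_chaos_experiment,
}


def main(argv=None, clients=None):
    parser = argparse.ArgumentParser()
    # Single-dash long option, matching `python3 experiment.py -name chaos`.
    parser.add_argument("-name", required=True, help="experiment to run")
    args = parser.parse_args(argv)
    if args.name not in EXPERIMENTS:
        # parser.error prints a usage message and exits with status 2.
        parser.error(f"unsupported experiment: {args.name}")
    return EXPERIMENTS[args.name](clients)
```

Adding a new experiment then means one new dictionary entry rather than another elif branch; the behavior for known names is the same.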
  • Add the generated directories to setup.py:
'chaosLib/litmus/sample_exec_chaos',
'chaosLib/litmus/sample_exec_chaos/lib',
'pkg/sample_category',
'pkg/sample_category/environment',
'pkg/sample_category/types',
'experiments/sample_category',
'experiments/sample_category/sample_exec_chaos',
'experiments/sample_category/sample_exec_chaos/experiment',

I've used the default names here.
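Since these eight entries follow a fixed pattern, they can be derived from the category and experiment names in attributes.yaml. A sketch assuming the directory layout listed above; `package_dirs` is a hypothetical helper you could call from setup.py instead of typing each path by hand:

```python
def package_dirs(category: str, experiment: str) -> list:
    """Return the setup.py directory entries for one generated experiment.

    Hypothetical helper; mirrors the layout produced for sample_category /
    sample_exec_chaos in this walkthrough.
    """
    return [
        f"chaosLib/litmus/{experiment}",
        f"chaosLib/litmus/{experiment}/lib",
        f"pkg/{category}",
        f"pkg/{category}/environment",
        f"pkg/{category}/types",
        f"experiments/{category}",
        f"experiments/{category}/{experiment}",
        f"experiments/{category}/{experiment}/experiment",
    ]


if __name__ == "__main__":
    for entry in package_dirs("sample_category", "sample_exec_chaos"):
        print(entry)
```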

  • Let’s come back to the root directory, litmuschaos/litmus-python, to set up the environment.

  • python3 -m virtualenv chaos

  • source chaos/bin/activate

  • python3 setup.py install (this installs all required prerequisites and sets up the directory structure; you need to rerun it every time before running python3 experiment.py -name chaos)
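Because setup.py install writes into whichever interpreter runs it, it is worth confirming that the chaos virtualenv is active first. A small standard-library sketch; it treats either `sys.real_prefix` (set by older virtualenv releases) or a `sys.prefix` that differs from `sys.base_prefix` (venv and newer virtualenv) as being inside a virtual environment:

```python
import sys


def in_virtualenv() -> bool:
    """Return True if the running interpreter belongs to a virtual environment."""
    # Legacy virtualenv (< 20) sets sys.real_prefix; venv-style environments
    # (including virtualenv >= 20) make sys.prefix differ from sys.base_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("inside a virtualenv:", in_virtualenv())
```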

  • Now you're ready to code: open chaosLib/litmus/sample_exec_chaos/lib/sample_exec_chaos.py and experiments/sample_category/sample_exec_chaos/experiment/sample_exec_chaos.py and start writing your chaos logic.

  • Create a sample Nginx deployment that can be used as the application under test (AUT).

kubectl create deployment nginx --image=nginx
  • Go to pkg/sample_category, open environment.py or types.py, and add, delete, or update the required env.
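What goes into these files is essentially reading ENV values with sensible defaults. The sketch below is not the generated code; it is a hypothetical illustration with a dataclass. APP_NAMESPACE, APP_LABEL, TOTAL_CHAOS_DURATION, CHAOS_INTERVAL, and RAMP_TIME follow common Litmus naming, CHAOS_INJECT_COMMAND is the variable mentioned in this post, and the defaults mirror the example run; check your generated environment.py and types.py for the actual names and defaults:

```python
import os
from dataclasses import dataclass


@dataclass
class ExperimentDetails:
    """Hypothetical container for the experiment's tunables (illustrative only)."""
    app_namespace: str
    app_label: str
    total_chaos_duration: int
    chaos_interval: int
    ramp_time: int
    chaos_inject_command: str


def load_experiment_details(env=None) -> ExperimentDetails:
    # Read from the given mapping (os.environ by default) so it is easy to test.
    env = os.environ if env is None else env
    return ExperimentDetails(
        app_namespace=env.get("APP_NAMESPACE", "litmus"),
        app_label=env.get("APP_LABEL", "app=nginx"),
        total_chaos_duration=int(env.get("TOTAL_CHAOS_DURATION", "30")),
        chaos_interval=int(env.get("CHAOS_INTERVAL", "10")),
        ramp_time=int(env.get("RAMP_TIME", "0")),
        chaos_inject_command=env.get("CHAOS_INJECT_COMMAND", "md5sum /dev/zero &"),
    )
```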

Note:
Append the & operator to the end of chaos commands such as CHAOS_INJECT_COMMAND, e.g. md5sum /dev/zero &. This is required because the chaos commands run as a background process in a separate thread.
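A trailing & makes the shell return immediately while the command keeps running in the background, which is why a forgotten & can block the experiment. A hypothetical helper (not part of litmus-python) that normalises a command before you put it in CHAOS_INJECT_COMMAND:

```python
def ensure_background(command: str) -> str:
    """Append ' &' to a shell command unless it already ends with a single '&'."""
    stripped = command.rstrip()
    if not stripped:
        raise ValueError("empty chaos command")
    # A command ending in '&&' is an incomplete AND-list, not a background job.
    if stripped.endswith("&&"):
        raise ValueError(f"command ends with '&&': {command!r}")
    if stripped.endswith("&"):
        return stripped
    return stripped + " &"
```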

  • Go to bin/experiment and run python3 experiment.py -name chaos. Always run python3 setup.py install in the root directory before this command. Example chaos logs:
time=2021-08-13 11:43:12,392 level=INFO  msg=Experiment Name: chaos
time=2021-08-13 11:43:12,392 level=INFO  msg=[PreReq]: Initialise Chaos Variables for the sample-chaos experiment
time=2021-08-13 11:43:12,393 level=INFO  msg=[PreReq]: Updating the chaos result of sample-chaos experiment (SOT)
time=2021-08-13 11:43:12,867 level=INFO  msg=[Info]: The application information is as follows Namespace=litmus, Label=app=nginx, Ramp Time=0
time=2021-08-13 11:43:12,867 level=INFO  msg=[Status]: Verify that the AUT (Application Under Test) is running (pre-chaos)
time=2021-08-13 11:43:12,867 level=INFO  msg=[status]: Checking whether application containers are in ready state
time=2021-08-13 11:43:12,887 level=INFO  msg=[status]: The Container status are as follows Container : nginx, Pod : nginx-66b6c48dd5-c65kl, Readiness : True
time=2021-08-13 11:43:12,887 level=INFO  msg=[status]: The Container status are as follows Container : nginx, Pod : nginx-66b6c48dd5-p87pc, Readiness : True
time=2021-08-13 11:43:12,887 level=INFO  msg=[status]: The Container status are as follows Container : nginx, Pod : nginx-66b6c48dd5-s5hjs, Readiness : True
time=2021-08-13 11:43:12,887 level=INFO  msg=[status]: Checking whether application pods are in running state
time=2021-08-13 11:43:12,908 level=INFO  msg=[status]: The status of Pods are as follows Pod : nginx-66b6c48dd5-c65kl status : Running
time=2021-08-13 11:43:12,908 level=INFO  msg=[status]: The status of Pods are as follows Pod : nginx-66b6c48dd5-p87pc status : Running
time=2021-08-13 11:43:12,908 level=INFO  msg=[status]: The status of Pods are as follows Pod : nginx-66b6c48dd5-s5hjs status : Running
time=2021-08-13 11:43:12,938 level=INFO  msg=[Info]: chaos candidate of kind: deployment, name: nginx, namespace: litmus
time=2021-08-13 11:43:12,943 level=INFO  msg=[Info]: chaos candidate of kind: deployment, name: nginx, namespace: litmus
time=2021-08-13 11:43:12,947 level=INFO  msg=[Info]: chaos candidate of kind: deployment, name: nginx, namespace: litmus
time=2021-08-13 11:43:12,947 level=INFO  msg=[Chaos]:Number of pods targeted: 3
time=2021-08-13 11:43:12,947 level=INFO  msg=[Info]: Target pods list, ['nginx-66b6c48dd5-c65kl', 'nginx-66b6c48dd5-p87pc', 'nginx-66b6c48dd5-s5hjs']
time=2021-08-13 11:43:12,955 level=INFO  msg=[Chaos]: The Target application details container : nginx, Pod : nginx-66b6c48dd5-c65kl
time=2021-08-13 11:43:12,956 level=INFO  msg=[Chaos]: Waiting for: 10
time=2021-08-13 11:43:22,955 level=INFO  msg=[Chaos]: Time is up for experiment: sample-chaos
time=2021-08-13 11:43:23,155 level=INFO  msg=[Chaos]: The Target application details container : nginx, Pod : nginx-66b6c48dd5-p87pc
time=2021-08-13 11:43:23,164 level=INFO  msg=[Chaos]: Waiting for: 10
time=2021-08-13 11:43:33,155 level=INFO  msg=[Chaos]: Time is up for experiment: sample-chaos
time=2021-08-13 11:43:33,289 level=INFO  msg=[Chaos]: The Target application details container : nginx, Pod : nginx-66b6c48dd5-s5hjs
time=2021-08-13 11:43:33,289 level=INFO  msg=[Chaos]: Waiting for: 10
time=2021-08-13 11:43:43,289 level=INFO  msg=[Chaos]: Time is up for experiment: sample-chaos
time=2021-08-13 11:43:43,405 level=INFO  msg=[Confirmation]: sample-chaos chaos has been injected successfully
time=2021-08-13 11:43:43,405 level=INFO  msg=[Status]: Verify that the AUT (Application Under Test) is running (post-chaos)
time=2021-08-13 11:43:43,405 level=INFO  msg=[status]: Checking whether application containers are in ready state
time=2021-08-13 11:43:43,416 level=INFO  msg=[status]: The Container status are as follows Container : nginx, Pod : nginx-66b6c48dd5-c65kl, Readiness : True
time=2021-08-13 11:43:43,416 level=INFO  msg=[status]: The Container status are as follows Container : nginx, Pod : nginx-66b6c48dd5-p87pc, Readiness : True
time=2021-08-13 11:43:43,416 level=INFO  msg=[status]: The Container status are as follows Container : nginx, Pod : nginx-66b6c48dd5-s5hjs, Readiness : True
time=2021-08-13 11:43:43,416 level=INFO  msg=[status]: Checking whether application pods are in running state
time=2021-08-13 11:43:43,429 level=INFO  msg=[status]: The status of Pods are as follows Pod : nginx-66b6c48dd5-c65kl status : Running
time=2021-08-13 11:43:43,429 level=INFO  msg=[status]: The status of Pods are as follows Pod : nginx-66b6c48dd5-p87pc status : Running
time=2021-08-13 11:43:43,429 level=INFO  msg=[status]: The status of Pods are as follows Pod : nginx-66b6c48dd5-s5hjs status : Running
time=2021-08-13 11:43:43,429 level=INFO  msg=[The End]: Updating the chaos result of sample-chaos experiment (EOT)
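The log shows the sequence followed in this run: the three target pods are selected, then chaos is injected into one pod at a time, with a wait of the chaos interval (10 seconds here) before moving on to the next pod. A stripped-down, cluster-free sketch of that serial loop; `inject` and `revert` are hypothetical callbacks (for example, running your inject and kill commands inside the pod), and `sleep` is injectable so the logic can be tested without waiting:

```python
import logging
import time
from typing import Callable, Iterable, List

log = logging.getLogger("sample-chaos")


def inject_serially(
    pods: Iterable[str],
    inject: Callable[[str], None],
    revert: Callable[[str], None],
    chaos_interval: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Inject chaos into each pod in turn; return the pods that were targeted."""
    targeted = []
    for pod in pods:
        log.info("[Chaos]: The Target application details Pod : %s", pod)
        inject(pod)
        try:
            log.info("[Chaos]: Waiting for: %s", chaos_interval)
            sleep(chaos_interval)
        finally:
            # Always try to undo the fault, even if the wait is interrupted.
            revert(pod)
        targeted.append(pod)
    return targeted
```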
  • Now make sure you have created all the required charts in the experiments/sample_category/sample_exec_chaos/charts directory.
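A quick way to confirm the generator produced the experiment chart manifests listed earlier (chartserviceversion, engine, rbac, and experiment) is a loose file-name check. The exact generated file names are an assumption, so this only looks for each kind as a substring of some .yaml file name; adjust `EXPECTED_KINDS` if your files differ:

```python
from pathlib import Path
from typing import Iterable, List

# Manifest kinds the experiment chart should contain (assumed to appear in file names).
EXPECTED_KINDS = ("chartserviceversion", "engine", "rbac", "experiment")


def missing_chart_kinds(file_names: Iterable[str]) -> List[str]:
    """Return the expected kinds with no matching .yaml file name."""
    yaml_names = [n.lower() for n in file_names if n.lower().endswith(".yaml")]
    return [kind for kind in EXPECTED_KINDS if not any(kind in n for n in yaml_names)]


def missing_chart_files(charts_dir) -> List[str]:
    """Apply missing_chart_kinds to the regular files inside charts_dir."""
    return missing_chart_kinds(p.name for p in Path(charts_dir).iterdir() if p.is_file())
```

Usage: missing_chart_files("experiments/sample_category/sample_exec_chaos/charts") returns an empty list when nothing appears to be missing.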

After testing locally, let’s move toward production.

  • Go to the root directory of litmuschaos/litmus-python, build a Docker image with docker build -t your-user-name/py-runner:ci . and push it with docker push your-user-name/py-runner:ci

There are two ways to test it:

1. Using a custom setup

Refer to the python-sdk docs for more details.

Run experiment.yaml with the desired ENV values and an appropriate chaosServiceAccount, using a custom dev image that packages the business logic (say, oumkale/litmus-python:ci) instead of litmuschaos/litmus-python.

  • Create a custom image built with the code validated by the previous steps.

  • Launch the Chaos-Operator:

kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v1.13.8.yaml
  • Set up the RBAC necessary for the execution of this experiment by applying the generated rbac.yaml:
kubectl apply -f rbac.yaml
  • Modify the ChaosExperiment manifest (experiment.yaml) with the right defaults (env and other attributes, as applicable), point .spec.definition.image to the custom image you just built, and create this CR on the cluster.

  • Modify the ChaosEngine manifest (engine.yaml) with the right app details and run properties, then create this CR to launch the chaos pods.

  • Verify the experiment status via ChaosResult.
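Pointing .spec.definition.image at your custom image and adjusting ENV defaults are small edits to the loaded manifest. A sketch that works on the ChaosExperiment as a plain Python dict (for example, after loading experiment.yaml with a YAML library such as PyYAML, which is an assumption here); `set_experiment_image` and `set_experiment_env` are hypothetical helpers, and the sketch assumes env entries live under .spec.definition.env as name/value pairs:

```python
import copy


def set_experiment_image(manifest: dict, image: str) -> dict:
    """Return a copy of a ChaosExperiment manifest with .spec.definition.image replaced."""
    updated = copy.deepcopy(manifest)
    updated["spec"]["definition"]["image"] = image
    return updated


def set_experiment_env(manifest: dict, name: str, value: str) -> dict:
    """Return a copy with env var `name` set in .spec.definition.env (added if absent)."""
    updated = copy.deepcopy(manifest)
    env = updated["spec"]["definition"].setdefault("env", [])
    for entry in env:
        if entry.get("name") == name:
            entry["value"] = value
            break
    else:
        env.append({"name": name, "value": value})
    return updated
```

Returning copies keeps the original manifest untouched, so you can diff the edited version against the generated one before applying it.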

Refer to the Litmus docs for more details on performing each step of this procedure.

Run all the CRs and the operator in a single namespace.

Example experiment.yaml: Link
Example engine.yaml: Link
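For reference, the parts of engine.yaml you typically edit are the app details, the service account, and the experiment list. A sketch that builds a minimal ChaosEngine (litmuschaos.io/v1alpha1) as a Python dict using values from this walkthrough (namespace litmus, label app=nginx, service account litmus-admin). `chaos_engine` and `runner_pod_name` are hypothetical helpers; the engine name nginx-chaos is an assumption based on the runner pod name in the listing that follows, and the "<engine-name>-runner" naming is inferred from that listing. Compare against the generated engine.yaml rather than treating this as authoritative:

```python
from typing import Dict, Optional


def chaos_engine(
    engine_name: str,
    experiment_name: str,
    namespace: str = "litmus",
    app_label: str = "app=nginx",
    app_kind: str = "deployment",
    service_account: str = "litmus-admin",
    env: Optional[Dict[str, object]] = None,
) -> dict:
    """Build a minimal ChaosEngine manifest as a dict (hypothetical helper)."""
    env_list = [{"name": k, "value": str(v)} for k, v in (env or {}).items()]
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {"name": engine_name, "namespace": namespace},
        "spec": {
            "engineState": "active",
            "appinfo": {"appns": namespace, "applabel": app_label, "appkind": app_kind},
            "chaosServiceAccount": service_account,
            "experiments": [
                {"name": experiment_name, "spec": {"components": {"env": env_list}}}
            ],
        },
    }


def runner_pod_name(engine: dict) -> str:
    # Assumed convention, matching nginx-chaos-runner in the pod listing: "<engine-name>-runner".
    return engine["metadata"]["name"] + "-runner"
```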

N.B.: If a chaos command is passed in as an ENV, also append the & operator at its end.

Now list the pods in the namespace and check that the engine's runner pod is running:

nginx-chaos-runner                      1/1     Running   0          35s
pod-cpu-hog-exec-2szc9z-rjjkr           1/1     Running   0          33s

See the logs of the engine pod.
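To script the "is the runner up?" check, you can parse the default kubectl get pods table (NAME, READY, STATUS, RESTARTS, AGE columns) into a name-to-status mapping. A sketch with hypothetical helpers `pod_statuses` and `runner_is_running`; in practice, kubectl get pods -n litmus -o json is the sturdier source, since the table layout is meant for humans:

```python
def pod_statuses(kubectl_output: str) -> dict:
    """Map pod name -> STATUS from `kubectl get pods` default table output."""
    statuses = {}
    for line in kubectl_output.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a NAME READY STATUS ... row.
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        statuses[fields[0]] = fields[2]
    return statuses


def runner_is_running(kubectl_output: str, engine_name: str) -> bool:
    """True if the pod named '<engine_name>-runner' reports STATUS Running."""
    return pod_statuses(kubectl_output).get(f"{engine_name}-runner") == "Running"
```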

2. Using the Litmus Portal

Follow this Link for portal setup details.

  • Open the portal and go to Workflows -> Schedule a workflow.

  • Select your agent.

  • You can also choose the second option and select an experiment from My Hub.

  • Or use Upload YAML (for a workflow example, see: Link). N.B.: the file you upload here must be an Argo workflow.

  • Click Next until you reach the page where you can edit the YAML directly, or update it through the UI by clicking the Name, which opens the next screen.

    • Update the engine's spec.chaosServiceAccount to litmus-admin.
  • Now tune all the ENV values and click Finish.

  • After scheduling the workflow, you will land on the workflow dashboard.

  • Open the workflow to see its details.

Here the logs, chaos results, and experiment details are provided. With that, you have finished creating the chaos experiment and testing it on the application.

Note: Besides viewing the details in the UI, you can also inspect the following CRs:

kubectl get chaosresult -n litmus
kubectl get chaosengine -n litmus
kubectl get chaosexperiment -n litmus
kubectl get workflow -n litmus
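The pass/fail outcome lives in the ChaosResult's status. A sketch that extracts it from kubectl get chaosresult <name> -n litmus -o json output; the .status.experimentStatus.verdict path (with values such as Pass, Fail, or Awaited) is what Litmus ChaosResults use, but double-check it against your version. `chaos_verdict` and `fetch_verdict` are hypothetical helpers:

```python
import json
import subprocess


def chaos_verdict(chaosresult_json: str) -> str:
    """Return .status.experimentStatus.verdict from a ChaosResult JSON document."""
    result = json.loads(chaosresult_json)
    return result.get("status", {}).get("experimentStatus", {}).get("verdict", "Unknown")


def fetch_verdict(name: str, namespace: str = "litmus") -> str:
    """Run kubectl against the current cluster context and return the verdict."""
    completed = subprocess.run(
        ["kubectl", "get", "chaosresult", name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return chaos_verdict(completed.stdout)
```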

Congratulations, you have successfully created and injected chaos!

Now it's time to raise a PR!

Steps to Include the Chaos Charts/Experiments into the ChartHub:

  • Send a PR to the litmus-python repo with the modified experiment files, rbac, test deployment & README.
  • Send a PR to the chaos-charts repo with the modified experiment CR, experiment chartserviceversion, rbac, (category-level) chaos chart chartserviceversion & package.yaml (if applicable).
  • Contact us on Slack for any queries or doubts.

Are you an SRE or a Kubernetes enthusiast? Does Chaos Engineering excite you?

Join the LitmusChaos community by joining the #litmus channel on the Kubernetes Slack (https://slack.k8s.io/)!
