Create a Managed Cassandra Database (Amazon Keyspaces) Using AWS CDK

What is Cassandra?

Apache Cassandra is an open-source, distributed NoSQL database that is popular because of its linear scalability, proven fault tolerance, and high performance.

Query Language - CQL

Cassandra provides the Cassandra Query Language (CQL), an SQL-like language for creating and updating the database schema and for accessing data. CQL syntax is very similar to SQL, but there are notable limitations, such as no joins and only limited support for aggregation.
Learn more about CQL: https://www.guru99.com/cassandra-query-language-cql-insert-update-delete-read-data.html

How does Cassandra store its data?

  • Keyspace: defines how a dataset is replicated, for example, in which datacenters and how many copies. Keyspaces contain tables.
  • Table: defines the typed schema for a collection of partitions. New columns can be added to a Cassandra table with zero downtime. Tables contain partitions, which contain rows, which contain columns.
  • Partition: defines the mandatory part of the primary key that every row in Cassandra must have. All performant queries supply the partition key.
  • Row: contains a collection of columns identified by a unique primary key made up of the partition key and, optionally, additional clustering keys.
  • Column: a single typed datum that belongs to a row.
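To make the keyspace > table > partition > row > column hierarchy above more concrete, here is a toy in-memory model in Python. It is only an illustration under simplifying assumptions (no replication, no persistence, no type checking), not how Cassandra actually lays out data on disk, and the class names and the `user_events` example are hypothetical. It does mirror two real behaviors: rows inside a partition are kept sorted by clustering key, and writing a row whose primary key already exists overwrites it (an upsert) instead of failing.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Row:
    clustering_key: Tuple[Any, ...]  # identifies the row inside its partition
    columns: Dict[str, Any]          # regular (non-key) column values


@dataclass
class Partition:
    partition_key: Tuple[Any, ...]
    rows: List[Row] = field(default_factory=list)


@dataclass
class Table:
    name: str
    partition_key_columns: List[str]
    clustering_key_columns: List[str]
    partitions: Dict[Tuple[Any, ...], Partition] = field(default_factory=dict)

    def insert(self, values: Dict[str, Any]) -> None:
        """Upsert a row; primary key = partition key + clustering key."""
        key_columns = self.partition_key_columns + self.clustering_key_columns
        pk = tuple(values[c] for c in self.partition_key_columns)
        ck = tuple(values[c] for c in self.clustering_key_columns)
        regular = {k: v for k, v in values.items() if k not in key_columns}
        partition = self.partitions.setdefault(pk, Partition(pk))
        for row in partition.rows:
            if row.clustering_key == ck:
                row.columns.update(regular)  # same primary key: overwrite, no error
                return
        partition.rows.append(Row(ck, regular))
        partition.rows.sort(key=lambda r: r.clustering_key)  # keep clustering order

    def select(self, *partition_key: Any) -> List[Row]:
        """Fetch a single partition; the full partition key is required."""
        partition: Optional[Partition] = self.partitions.get(tuple(partition_key))
        return list(partition.rows) if partition is not None else []


@dataclass
class Keyspace:
    name: str
    tables: Dict[str, Table] = field(default_factory=dict)


ks = Keyspace("demo_keyspace")
events = Table("user_events", ["user_id"], ["event_time"])
ks.tables[events.name] = events
events.insert({"user_id": "u1", "event_time": 2, "action": "logout"})
events.insert({"user_id": "u1", "event_time": 1, "action": "login"})
events.insert({"user_id": "u2", "event_time": 1, "action": "login"})
print([r.clustering_key for r in events.select("u1")])  # [(1,), (2,)]
```

Looking up `"u1"` touches exactly one partition, which is why performant queries supply the partition key; finding every row with `action == "login"` would require scanning all partitions.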

Why was Amazon Keyspaces introduced?

Amazon Keyspaces is a managed, Apache Cassandra-compatible database service. With Amazon Keyspaces, you can run your Cassandra workloads on AWS using the same Cassandra application code and developer tools. It's a serverless service, so it removes the burden of provisioning, patching, and maintaining servers, and you pay only for what you use. Amazon Keyspaces can also automatically scale tables up and down in response to application traffic.

Other benefits of Amazon Keyspaces:

  • Virtually unlimited throughput and storage
  • Data is encrypted by default
  • Enables you to back up your table data continuously using point-in-time recovery

Why AWS CDK?

AWS CDK lets you define and provision cloud infrastructure on AWS quickly, in your favorite programming language. I decided to use Python to provision the infrastructure for this tutorial, and it's super easy.

Open up your favorite IDE, and let's write some code to provision the Amazon Keyspaces resources.

1) Create the CDK project in Python

cdk init app --language python

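cdk init generates the project's folder structure for you, including app.py (the entry point of the CDK app) and cdk.json (which tells the CDK CLI how to run the app).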

Note: cdk init uses the name of the project folder to name various elements of the project, including classes, subfolders, and files.

I ran the cdk init command in an empty folder called cassandra_cdk.

2) Activate the virtual environment

# On Linux or macOS
source .venv/bin/activate

# On Windows
.venv\Scripts\activate.bat

The .venv folder is your Python virtual environment directory.

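3) Add the required dependencies to setup.py (with CDK v2, the single aws-cdk-lib package already includes the Amazon Keyspaces constructs), then run the following command to install all dependencies: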

python -m pip install -r requirements.txt

4) Define the keyspace and table resources in your stack code

5) Make sure your AWS credentials are set.

Option 1:

Set credentials in the AWS credentials profile file on your local system, located at:

~/.aws/credentials on Linux, macOS, or Unix

C:\Users\USERNAME\.aws\credentials on Windows

[default]
aws_access_key_id = your_access_key_id
aws_secret_access_key = your_secret_access_key

Option 2:

Set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.

To set these variables on Linux, macOS, or Unix, use:

export AWS_ACCESS_KEY_ID=your_access_key_id
export AWS_SECRET_ACCESS_KEY=your_secret_access_key

To set these variables on Windows, use:

set AWS_ACCESS_KEY_ID=your_access_key_id
set AWS_SECRET_ACCESS_KEY=your_secret_access_key

Option 3:

On an EC2 instance, attach an IAM role to the instance (through an instance profile) so that applications running on it can use that role's credentials.
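To double-check which of these options is actually in effect before running CDK commands, here is a small, stdlib-only Python helper. It is a hypothetical convenience, not part of the CDK or the AWS CLI: it only inspects local configuration and does not validate the keys with AWS. It checks environment variables first because the AWS SDKs and CLI give them precedence over the shared credentials file, and it honors the AWS_PROFILE and AWS_SHARED_CREDENTIALS_FILE overrides.

```python
import configparser
import os
from pathlib import Path
from typing import Optional


def find_credentials_source(profile: Optional[str] = None,
                            credentials_file: Optional[Path] = None) -> str:
    """Report which local credential option, if any, is configured (hypothetical helper)."""
    # Option 2: environment variables take precedence over the credentials file.
    if os.environ.get("AWS_ACCESS_KEY_ID") and os.environ.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"

    # Option 1: shared credentials file (~/.aws/credentials unless overridden).
    profile = profile or os.environ.get("AWS_PROFILE", "default")
    default_path = os.environ.get("AWS_SHARED_CREDENTIALS_FILE", "~/.aws/credentials")
    path = Path(credentials_file or default_path).expanduser()
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently skipped
    if (parser.has_option(profile, "aws_access_key_id")
            and parser.has_option(profile, "aws_secret_access_key")):
        return f"credentials file {path} [{profile}]"

    # Option 3 (an EC2 instance role) cannot be detected from local files.
    return "no local credentials found (an EC2 instance role may still provide them)"


if __name__ == "__main__":
    print(find_credentials_source())
```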

6) Bootstrap your AWS environment. Bootstrapping provisions the initial resources that the CDK needs to perform deployments, such as an S3 bucket for storing files and IAM roles that grant the required permissions.

cdk bootstrap

7) Generate CloudFormation Templates

cdk synth
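cdk synth writes the synthesized template into the cdk.out directory as <StackName>.template.json (and, for a single-stack app, also prints it). If you want to check it programmatically, a small stdlib-only sketch like the one below lists every resource in that file. The function name is hypothetical, and the stack name assumes the CassandraCdkStack naming that cdk init derives from the cassandra_cdk folder.

```python
import json
from pathlib import Path
from typing import Dict


def list_resources(cdk_out: Path, stack_name: str) -> Dict[str, str]:
    """Map each logical ID in a synthesized template to its CloudFormation type."""
    template_path = Path(cdk_out) / f"{stack_name}.template.json"
    template = json.loads(template_path.read_text())
    return {logical_id: res["Type"] for logical_id, res in template.get("Resources", {}).items()}


if __name__ == "__main__":
    out_dir = Path("cdk.out")
    # The template only exists after `cdk synth` has run in this folder.
    if (out_dir / "CassandraCdkStack.template.json").exists():
        for logical_id, resource_type in list_resources(out_dir, "CassandraCdkStack").items():
            print(f"{logical_id}: {resource_type}")
```

For the stack in this tutorial, you would expect to see one AWS::Cassandra::Keyspace and one AWS::Cassandra::Table entry, possibly alongside CDK metadata.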

8) List the stacks defined in the app

cdk list

9) Provision AWS resources

cdk deploy

In the AWS Management Console, you can now see that the keyspace has been provisioned and that the table defined in the CDK code has been created.

10) Set up a Cassandra client and run CQL queries to test it

Option 1: Use the CQL editor in the AWS Management Console.
Option 2: Use cqlsh.

cqlsh is bundled with Cassandra. To make things easier, I will use the cassandra:3.11.7 Docker image to run CQL queries through the cqlsh utility.

Pull the Docker image:

docker pull cassandra:3.11.7

Create a container from the image and start a bash shell in it, so you can run commands inside the container:

docker run -it <Image id> /bin/bash

Amazon Keyspaces only accepts secure connections using Transport Layer Security (TLS). To connect over TLS, download the Starfield digital certificate:

curl https://certs.secureserver.net/repository/sf-class2-root.crt -O

Note down the path of the certificate file, then export it in the SSL_CERTFILE environment variable, which cqlsh reads:

export SSL_CERTFILE=<cert file path>

Before connecting, generate Amazon Keyspaces credentials (a service-specific username and password) for your IAM user. Follow this guide to generate them:
How to Generate AWS Keyspace credentials for your IAM User

Then choose the service endpoint for the Region where you deployed the keyspace from the list of Amazon Keyspaces service endpoints:
List of service endpoints for Keyspace

Run the following command to connect to the database:

cqlsh <keyspace service endpoint> 9142 -u "<generated-keyspace-username>" -p "<generated-keyspace-password>" --ssl

After you execute this command, you will be dropped into the cqlsh shell, where you can run queries against your Amazon Keyspaces (Cassandra) database.
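If you prefer to test from Python instead of cqlsh, the same TLS plus service-specific-credentials connection can be sketched with the DataStax cassandra-driver package (pip install cassandra-driver). This goes beyond the original tutorial, which only uses the console and cqlsh: the helper names are hypothetical, the endpoint helper assumes the cassandra.<region>.amazonaws.com naming of the Amazon Keyspaces service endpoints, and sf-class2-root.crt is the certificate downloaded above. The driver is imported inside connect() so that the other helpers work without it installed.

```python
import ssl
from typing import Any


def keyspaces_endpoint(region: str) -> str:
    """Service endpoint for a Region (assumes the cassandra.<region>.amazonaws.com pattern)."""
    return f"cassandra.{region}.amazonaws.com"


def build_ssl_context(cert_path: str) -> ssl.SSLContext:
    """TLS context that verifies Amazon Keyspaces against the downloaded root certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # hostname checking on by default
    context.load_verify_locations(cert_path)
    context.verify_mode = ssl.CERT_REQUIRED
    return context


def connect(region: str, username: str, password: str, cert_path: str) -> Any:
    """Open a session using service-specific credentials (hypothetical helper)."""
    # Imported lazily so the helpers above can be used without the driver.
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    endpoint = keyspaces_endpoint(region)
    cluster = Cluster(
        [endpoint],
        port=9142,  # Amazon Keyspaces TLS port, as in the cqlsh command
        ssl_context=build_ssl_context(cert_path),
        ssl_options={"server_hostname": endpoint},  # name checked against the certificate
        auth_provider=PlainTextAuthProvider(username=username, password=password),
    )
    return cluster.connect()
```

For example, connect("us-east-1", "<generated-keyspace-username>", "<generated-keyspace-password>", "sf-class2-root.crt") followed by session.execute("SELECT keyspace_name FROM system_schema.keyspaces") should list your keyspaces. Amazon Keyspaces requires LOCAL_QUORUM consistency for writes, while the driver defaults to LOCAL_ONE, so set the consistency level explicitly before inserting data.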

To destroy the resources when you are done, run:

cdk destroy
