What is Azure Cosmos DB?
Modern applications require a robust database system that is highly available and responsive. It also needs to adapt to changes in demand and scale in real time as the application grows. To deliver those capabilities, organizations tend to host their applications in a datacenter close to their region. Luckily, there is an Azure resource that can help you achieve those goals, with more than just the functionality you need.
In this article, I'll give a high-level introduction to Cosmos DB and some of the features it offers. I might turn this topic into a series, as I did for Azure App Service, where I published new posts about App Service every week (now every two weeks). If you're here for the first time, be sure to check back next week for more exciting stuff.
Azure Cosmos DB is Microsoft's NoSQL, multi-model, managed database-as-a-service offering, designed to be highly available and elastic. With its wide range of configuration options (consistency levels, language support, API support) and its flexibility, it really shines when you need to model data in a form other than a relational database. It comes with comprehensive SLAs covering its core features: latency, throughput, availability, and consistency. If you are not familiar with the term 'managed', it means Microsoft handles all the updates, patching, and management for you, so you can focus on using the features and getting the most out of them.
With the free tier, you can use Cosmos DB at no charge for the first 400 RU/s and 5 GB of storage.
When planning a database solution for an application, organizations look for responsiveness, availability, and low latency to deliver an optimal experience for end users. If you are somewhat familiar with Azure, you probably know Azure SQL Database, the managed Microsoft SQL Server offering, which works for most organizations. In some cases, though, you need to tune the database to your needs and take advantage of additional capabilities that Cosmos DB offers.
As with any other Azure resource that exposes an endpoint, your Cosmos DB account gets an Azure-assigned DNS name that you use to communicate with the database. The Cosmos DB design is pretty self-explanatory once you play around with the resource and test it yourself. Let's take a look at what the Cosmos DB design looks like.
The Cosmos DB resource in Azure is called a database account, and it holds one or more databases. A database is the unit of management for a set of Cosmos containers, and you can have multiple databases in an account based on your requirements. When you create the account, you choose one of several APIs: SQL, Cassandra, MongoDB, Table, or Gremlin. Cosmos DB handles the translation and communication behind each API for you, so you get a similar experience whichever type of database you work with.
Cosmos DB containers are the units inside a database that govern scalability for both throughput and storage. Containers are horizontally partitioned and then replicated across multiple regions. The items inside a container are automatically distributed based on the partition key.
An item is a single entry inside a container. Depending on the API used, an item represents a document in a collection, a row in a table, or a node or edge in a graph.
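To make the account, database, container, and item hierarchy concrete, here is a minimal in-memory Python model. It is a sketch, not the Azure SDK: the class and method names are hypothetical, the partition key lookup only handles top-level properties, and the endpoint format assumes a SQL API account.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class Container:
    """Holds items; an item is identified by its partition key value plus its id."""
    name: str
    partition_key_path: str  # e.g. "/customerId" (hypothetical)
    items: Dict[Tuple[Any, str], dict] = field(default_factory=dict)

    def upsert(self, item: dict) -> None:
        # Simplification: only top-level paths such as "/customerId" are supported.
        pk_value = item[self.partition_key_path.lstrip("/")]
        # The same id may exist in different logical partitions, so key by both.
        self.items[(pk_value, item["id"])] = item

    def read(self, pk_value: Any, item_id: str) -> dict:
        return self.items[(pk_value, item_id)]


@dataclass
class Database:
    """A database is the unit of management for a set of containers."""
    name: str
    containers: Dict[str, Container] = field(default_factory=dict)

    def create_container(self, name: str, partition_key_path: str) -> Container:
        container = Container(name, partition_key_path)
        self.containers[name] = container
        return container


@dataclass
class DatabaseAccount:
    """The Azure resource: holds databases and exposes an Azure-assigned DNS name."""
    name: str
    databases: Dict[str, Database] = field(default_factory=dict)

    @property
    def endpoint(self) -> str:
        # Endpoint format of a SQL API account.
        return f"https://{self.name}.documents.azure.com:443/"

    def create_database(self, name: str) -> Database:
        db = Database(name)
        self.databases[name] = db
        return db


account = DatabaseAccount("contoso-demo")  # hypothetical account name
orders = account.create_database("shop").create_container("orders", "/customerId")
orders.upsert({"id": "order-1", "customerId": "c42", "total": 19.99})
print(account.endpoint)                        # https://contoso-demo.documents.azure.com:443/
print(orders.read("c42", "order-1")["total"])  # 19.99
```

Keying items by the pair of partition key value and id mirrors the rule that an id only has to be unique within its logical partition.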
Let's look at some common terminology used in the Cosmos DB to get more familiar with the resource.
The cost of a database operation is expressed in Request Units (RUs). An RU is a performance currency that abstracts the CPU, memory, and IOPS an operation consumes. For example, a point read (fetching a single item by its ID and partition key) of a 1 KB item costs 1 RU. Every operation is charged in RUs, including queries, reads, inserts, upserts, and deletes.
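Because everything is priced in RUs, capacity planning reduces to simple arithmetic. The sketch below assumes fixed per-operation costs: only the 1 RU for a 1 KB point read comes from the text above, while the insert and query figures are hypothetical placeholders you would replace with measured charges (Cosmos DB reports each request's charge in the `x-ms-request-charge` response header).

```python
# RU cost per operation. Only the 1 KB point read (1 RU) is from the text;
# the other values are hypothetical placeholders.
OPERATION_COST_RU = {
    "point_read_1kb": 1.0,
    "insert_1kb": 5.0,   # hypothetical
    "query": 10.0,       # hypothetical
}


def max_ops_per_second(provisioned_ru_s: float, operation: str) -> int:
    """How many of a single operation fit into the provisioned RU/s each second."""
    return int(provisioned_ru_s // OPERATION_COST_RU[operation])


def ru_per_second_for_mix(ops_per_second: dict) -> float:
    """Total RU/s needed to sustain a workload mix, e.g. {"point_read_1kb": 300}."""
    return sum(OPERATION_COST_RU[op] * rate for op, rate in ops_per_second.items())


print(max_ops_per_second(400, "point_read_1kb"))  # 400
workload = {"point_read_1kb": 300, "insert_1kb": 20}
print(ru_per_second_for_mix(workload))            # 400.0 (300 * 1 + 20 * 5)
```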
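The partition key determines how items are logically distributed. All items that share the same partition key value form one logical partition, and an item's ID only has to be unique within its logical partition, so the partition key value and the ID together identify an item. Note that you can't change the partition key once the container is created; to change it, you have to recreate the container and move the data into it.
Cosmos DB uses two types of partitions: logical and physical. A logical partition is the set of items that share a partition key value, while physical partitions are managed internally by Cosmos DB, and each one hosts one or more logical partitions.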
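The following Python sketch shows how items group into logical partitions by partition key value and how each logical partition lands on exactly one physical partition. It is only an illustration: Cosmos DB hashes the partition key internally and assigns hash ranges to physical partitions, and MD5 with a modulo is a stand-in assumption, not the service's actual scheme.

```python
import hashlib
from collections import defaultdict


def physical_partition_for(pk_value: str, physical_partition_count: int) -> int:
    """Map a partition key value to a physical partition index.

    Assumption: MD5 plus modulo stands in for Cosmos DB's internal hashing
    and hash-range assignment, which this sketch does not reproduce.
    """
    digest = hashlib.md5(pk_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % physical_partition_count


def distribute(items, partition_key, physical_partition_count):
    """Return {physical index: {partition key value: [item ids]}}."""
    layout = defaultdict(lambda: defaultdict(list))
    for item in items:
        pk_value = item[partition_key]
        physical = physical_partition_for(pk_value, physical_partition_count)
        # Every item with the same key value joins the same logical partition.
        layout[physical][pk_value].append(item["id"])
    return layout


items = [
    {"id": "1", "customerId": "alice"},
    {"id": "2", "customerId": "bob"},
    {"id": "3", "customerId": "alice"},
    {"id": "4", "customerId": "carol"},
]
for physical, logical_partitions in sorted(distribute(items, "customerId", 2).items()):
    for pk_value, ids in logical_partitions.items():
        print(f"physical {physical}: logical partition {pk_value!r} -> items {ids}")
```

Because placement depends only on the key value, a logical partition is never split across physical partitions, which is one reason the choice of partition key matters so much.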
Throughput defines how many request units per second (RU/s) are provisioned for a database or container. Cosmos DB currently supports two provisioned throughput modes: manual and autoscale. If your workload consumes more throughput than you provisioned, requests are rate limited (Cosmos DB returns HTTP status 429). You can either increase the throughput manually or use autoscale, which adjusts the RU/s between 10% of a maximum you set and that maximum to accommodate the changing demands of the application.
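A rough way to picture rate limiting is a per-second RU budget: requests that would exceed it are rejected, as a 429 would be. The sketch below is a simplified model with explicit timestamps so it is deterministic; the class and function names are hypothetical, and treating autoscale as already scaled up to its maximum is a simplifying assumption.

```python
def autoscale_range(max_ru_s: int) -> tuple:
    """Autoscale varies throughput between 10% of the configured maximum and the maximum."""
    return (max_ru_s // 10, max_ru_s)


class ThroughputBudget:
    """Simplified one-second RU budget; a rejected request models an HTTP 429."""

    def __init__(self, provisioned_ru_s: float):
        self.provisioned_ru_s = provisioned_ru_s
        self.window_start = None
        self.consumed = 0.0

    def try_consume(self, cost_ru: float, now: float) -> bool:
        # Start a fresh budget at the first request or when a second has elapsed.
        if self.window_start is None or now - self.window_start >= 1.0:
            self.window_start, self.consumed = now, 0.0
        if self.consumed + cost_ru > self.provisioned_ru_s:
            return False  # would exceed provisioned RU/s: rate limited
        self.consumed += cost_ru
        return True


def run_burst(budget: ThroughputBudget, request_count: int, cost_ru: float, now: float = 0.0):
    """Send request_count requests within one second; return (accepted, throttled)."""
    accepted = sum(budget.try_consume(cost_ru, now) for _ in range(request_count))
    return accepted, request_count - accepted


print(run_burst(ThroughputBudget(400), 500, 1.0))   # (400, 100)
low, high = autoscale_range(4000)
print(low, high)                                    # 400 4000
print(run_burst(ThroughputBudget(high), 500, 1.0))  # (500, 0)
```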
Cosmos DB supports a wide variety of APIs. It's important to note that you need to decide which API to use for your data model before you deploy the resource in Azure, because the API is chosen when the account is created. The currently available options are Table, SQL, Cassandra, Gremlin, and MongoDB, with more coming in the near future.
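Since the API is fixed at creation time, it shows up in the creation request itself. The Python sketch below builds (but does not run) an Azure CLI `az cosmosdb create` command for each API; the helper and the account and resource group names are hypothetical, and the `--kind` and `--capabilities` values reflect my understanding of the CLI, so check them against the current `az cosmosdb create` reference before use.

```python
# Hypothetical mapping from the chosen API to `az cosmosdb create` arguments.
API_TO_CLI_ARGS = {
    "SQL": ["--kind", "GlobalDocumentDB"],
    "MongoDB": ["--kind", "MongoDB"],
    "Cassandra": ["--capabilities", "EnableCassandra"],
    "Gremlin": ["--capabilities", "EnableGremlin"],
    "Table": ["--capabilities", "EnableTable"],
}


def create_account_command(account_name: str, resource_group: str, api: str) -> list:
    """Build the CLI argument list for creating an account with the given API."""
    if api not in API_TO_CLI_ARGS:
        raise ValueError(f"unsupported API {api!r}; choose one of {sorted(API_TO_CLI_ARGS)}")
    return ["az", "cosmosdb", "create",
            "--name", account_name,
            "--resource-group", resource_group,
            *API_TO_CLI_ARGS[api]]


print(" ".join(create_account_command("contoso-graph", "demo-rg", "Gremlin")))
# az cosmosdb create --name contoso-graph --resource-group demo-rg --capabilities EnableGremlin
```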
With the increasing demand for applications to be highly responsive and always online, database instances need to be accessible with low latency. Cosmos DB serves reads (and, with multi-region writes enabled, writes) from replicas in the region closest to your users.
Cosmos DB provides high availability in two ways. Within each region, your data is kept as four replicas, and across regions, the account is replicated to whichever Azure regions you choose. You can add or remove regions at any time using the Azure portal. By default, only one region accepts writes; if you enable multi-region writes, every region can accept both reads and writes.
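The replication layout can be summarized in a few lines of Python. This is a conceptual model only (the class name and the latency figures are hypothetical): it counts replicas at four per region, shows which regions accept writes with and without multi-region writes, and picks the lowest-latency region for reads, which is roughly what configuring preferred regions in a client achieves.

```python
from dataclasses import dataclass
from typing import Dict, List

REPLICAS_PER_REGION = 4  # Cosmos DB keeps four replicas of each partition within a region


@dataclass
class AccountReplication:
    regions: List[str]  # failover-priority order; the first is the default write region
    multi_region_writes: bool = False

    def total_replicas(self) -> int:
        return len(self.regions) * REPLICAS_PER_REGION

    def writable_regions(self) -> List[str]:
        # Single-write accounts accept writes in one region only;
        # multi-region-write accounts accept writes in every region.
        return list(self.regions) if self.multi_region_writes else self.regions[:1]

    def nearest_region(self, latency_ms: Dict[str, float]) -> str:
        # Serve reads from the replicated region with the lowest measured latency.
        return min(self.regions, key=lambda r: latency_ms.get(r, float("inf")))


acct = AccountReplication(["East US", "West Europe", "Southeast Asia"])
print(acct.total_replicas())    # 12
print(acct.writable_regions())  # ['East US']
# Hypothetical client-side latencies, e.g. measured from a user in Europe.
print(acct.nearest_region({"East US": 80, "West Europe": 12, "Southeast Asia": 190}))  # West Europe
```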
Databases that use replicas for high availability, low latency, or both have to decide how consistent the data they serve is, which involves tradeoffs between read consistency, latency, availability, and throughput. Most distributed databases offer only strong and eventual consistency; Cosmos DB, however, offers five (5) consistency levels, listed below from strongest to weakest:
- Strong
- Bounded staleness
- Session
- Consistent prefix
- Eventual
Each level makes a different tradeoff between consistency, availability, and performance. Strong consistency guarantees that reads return the most recent write but can reduce availability, whereas eventual consistency offers high availability and better performance, but the data may not be consistent across all regions at the same time.
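One way to compare the five levels is to ask how stale a read is allowed to be. The Python sketch below is a simplified model, not the service's implementation: it numbers writes 1, 2, 3, and so on, and returns the oldest version each level still permits a read to return. `max_lag` stands for the K of bounded staleness (the time bound is ignored), and `own_last_write` stands for the newest write made in the reader's own session.

```python
from enum import IntEnum


class Consistency(IntEnum):
    """The five levels; a larger value means a stronger guarantee."""
    EVENTUAL = 1
    CONSISTENT_PREFIX = 2
    SESSION = 3
    BOUNDED_STALENESS = 4
    STRONG = 5


def oldest_readable_version(level, latest, own_last_write=0, max_lag=0):
    """Oldest write version a read may return (0 means 'nothing written yet')."""
    if level == Consistency.STRONG:
        return latest                    # always the most recent write
    if level == Consistency.BOUNDED_STALENESS:
        return max(0, latest - max_lag)  # at most K versions behind
    if level == Consistency.SESSION:
        return own_last_write            # read-your-own-writes within the session
    return 0                             # no freshness guarantee


def preserves_write_order(level):
    """Every level except EVENTUAL guarantees reads never see writes out of order."""
    return level >= Consistency.CONSISTENT_PREFIX


for level in sorted(Consistency, reverse=True):
    oldest = oldest_readable_version(level, latest=10, own_last_write=7, max_lag=3)
    print(f"{level.name:<18} oldest={oldest} ordered={preserves_write_order(level)}")
```

Reading the output from top to bottom shows the tradeoff described above: each step down relaxes freshness or ordering, which is what lets the weaker levels serve reads from any nearby replica with lower latency.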
I hope this gave you a high-level introduction to Cosmos DB, what it entails, and some of the terminology used with the resource. Use Cosmos DB if your application needs to be highly available, respond quickly anywhere in the world, stay always online, and scale throughput elastically with virtually no limit.
If you get some value out of this article, follow me on this Cloud Journey on Twitter.