30
SQL, NoSQL and beyond
SQL stands for Structure Query Language. You use it to create, update and delete data in a relational database where available data is organized into related tables.
While SQL databases are the most common ones, NoSQL databases can be a great alternative and make more sense for your next project, but be careful.
It's not an exhaustive guide but an introduction with simple words. I want to focus on using the suitable model according to your project's context.
Relational databases contain tables that consist of rows and columns. You can see columns (also called "fields") as generic labels for elements in rows and rows as individual records with values:
id | first_name | last_name | age |
---|---|---|---|
1001001 | Kobe | Bryant | 37 |
We have a database called "NBA" that contains all games, players, and teams. The amount of information is pretty massive.
We need a model that scales, a database where we can store, update and cross-check data. To achieve that, we must structure the information. Otherwise, we'll have many duplicates.
Tables, also called "relations" in the relational model, allow for categorizing data. For example, our NBA database could have five tables: player
, team
, game
, player_stats
, and team_stats
.
All data in the same table have the same columns, and in records (rows), the data must be normalized. In other words, the data follow the same schema.
Besides, there are relations between tables. In our example, a player belongs to a team, and each game is a sports meeting between two teams. Each player generates various statistics (player_stats
) during a game.
People often use the word cardinality to describe those relationships. The most common terms are One-to-One, One-to-Many, and Many-to-Many.
One or several fields can be connected to one or several fields of another table. That is why the relational model is handy to cross data with queries. It's easy to merge data with the same columns.
The relational model has been the most popular model for decades, and it's still widely used, but it's not the only approach.
Not all applications need a relational database.
key | value |
---|---|
1001001 | { "first_name": "Kobe", "last_name": "Bryant", "age": 37 } |
N.B.: This key-value database is not the only type of NoSQL system, but that should give you a hint.
The need for horizontal scalability is essential nowadays. Websites and applications rarely need full server capacity all the time. There are peaks.
NoSQL systems allow for distributing the database (~ pool), which is extremely hard with relational systems that usually scale-up (vertically, so you need to increase the server capacity).
While modern architectures love NoSQL databases, it does not come without technical challenges and some maintenance issues.
Information is often stored in a "JSON-like" format or as a key-value pair of serialized data, without any pre-defined schema, which is handy.
However, it's sometimes harder to maintain, and it's not uncommon to write more and more complex queries in the successive iterations of the project. It can lead to worse performances, which is unfortunate because you expect high performance and low latency from a NoSQL system.
Besides, it's not uncommon to see non-standard usages, with relational queries in a non-relational system.
As we saw, you don't need to respect a pre-predefined schema with NoSQL. If your application requires many structural changes, so it's a dynamic model, then NoSQL makes sense.
If your priority is Low latency and Availability over Consistency, it makes sense too. They call it "eventual consistency", because it takes time to replicate the data on all servers, and each server returns its copy of the data even if it does not have the latest version.
When you read about SQL and NoSQL, you often stumble upon two acronyms: ACID and BASE. ACID stands for "atomicity, consistency, isolation, and durability". BASE stands for "Basically Available, soft state and eventual consistency".
The BASE approach according to Brewer forfeits the ACID properties of consistency and isolation in favor of "availability, graceful degradation, and performance"
ACID | BASE |
---|---|
Strong consistency | Weak consistency – stale data OK |
Isolation | Availability first |
Focus on "commit" | Best effort |
Nested transactions | Approximate answers OK |
Availability? | Aggressive (optimistic) |
Conservative (pessimistic) | Simpler! |
Difficult evolution (e. g. schema) | Faster, Easier evolution |
Do not believe that "NoSQL is for large scale and SQL is for low scale". That would be a dangerous oversimplification.
SQL does make sense for low-scale projects, but the large scale does not necessarily mean NoSQL, especially when you need to run relational queries. Amazon and other giants of the cloud have solutions for that with low-latency read replicas.
Beyond that, it's not uncommon to combine the two approaches with a relational database for CRUD operations (create, read, update, delete), a NoSQL system as an object cache like Redis, and another NoSQL as a search engine like Elasticsearch.
Photo by Possessed Photography on Unsplash
30