SQL, NoSQL and beyond

SQL stands for Structure Query Language. You use it to create, update and delete data in a relational database where available data is organized into related tables.

While SQL databases are the most common ones, NoSQL databases can be a great alternative and make more sense for your next project, but be careful.

Disclaimer

It's not an exhaustive guide but an introduction with simple words. I want to focus on using the suitable model according to your project's context.

The relational model

Relational databases contain tables that consist of rows and columns. You can see columns (also called "fields") as generic labels for elements in rows and rows as individual records with values:

id first_name last_name age
1001001 Kobe Bryant 37

We have a database called "NBA" that contains all games, players, and teams. The amount of information is pretty massive.

We need a model that scales, a database where we can store, update and cross-check data. To achieve that, we must structure the information. Otherwise, we'll have many duplicates.

Tables, also called "relations" in the relational model, allow for categorizing data. For example, our NBA database could have five tables: player, team, game, player_stats, and team_stats.

All data in the same table have the same columns, and in records (rows), the data must be normalized. In other words, the data follow the same schema.

Besides, there are relations between tables. In our example, a player belongs to a team, and each game is a sports meeting between two teams. Each player generates various statistics (player_stats) during a game.

People often use the word cardinality to describe those relationships. The most common terms are One-to-One, One-to-Many, and Many-to-Many.

One or several fields can be connected to one or several fields of another table. That is why the relational model is handy to cross data with queries. It's easy to merge data with the same columns.

The NoSQL model

The relational model has been the most popular model for decades, and it's still widely used, but it's not the only approach.

Not all applications need a relational database.

key value
1001001 { "first_name": "Kobe", "last_name": "Bryant", "age": 37 }

N.B.: This key-value database is not the only type of NoSQL system, but that should give you a hint.

The need for horizontal scalability is essential nowadays. Websites and applications rarely need full server capacity all the time. There are peaks.

NoSQL systems allow for distributing the database (~ pool), which is extremely hard with relational systems that usually scale-up (vertically, so you need to increase the server capacity).

Be careful with NoSQL

While modern architectures love NoSQL databases, it does not come without technical challenges and some maintenance issues.

Information is often stored in a "JSON-like" format or as a key-value pair of serialized data, without any pre-defined schema, which is handy.

However, it's sometimes harder to maintain, and it's not uncommon to write more and more complex queries in the successive iterations of the project. It can lead to worse performances, which is unfortunate because you expect high performance and low latency from a NoSQL system.

Besides, it's not uncommon to see non-standard usages, with relational queries in a non-relational system.

So when to use SQL and NoSQL?

As we saw, you don't need to respect a pre-predefined schema with NoSQL. If your application requires many structural changes, so it's a dynamic model, then NoSQL makes sense.

If your priority is Low latency and Availability over Consistency, it makes sense too. They call it "eventual consistency", because it takes time to replicate the data on all servers, and each server returns its copy of the data even if it does not have the latest version.

When you read about SQL and NoSQL, you often stumble upon two acronyms: ACID and BASE. ACID stands for "atomicity, consistency, isolation, and durability". BASE stands for "Basically Available, soft state and eventual consistency".

The BASE approach according to Brewer forfeits the ACID properties of consistency and isolation in favor of "availability, graceful degradation, and performance"

ACID BASE
Strong consistency Weak consistency – stale data OK
Isolation Availability first
Focus on "commit" Best effort
Nested transactions Approximate answers OK
Availability? Aggressive (optimistic)
Conservative (pessimistic) Simpler!
Difficult evolution (e. g. schema) Faster, Easier evolution

Conclusion

Do not believe that "NoSQL is for large scale and SQL is for low scale". That would be a dangerous oversimplification.

SQL does make sense for low-scale projects, but the large scale does not necessarily mean NoSQL, especially when you need to run relational queries. Amazon and other giants of the cloud have solutions for that with low-latency read replicas.

Beyond that, it's not uncommon to combine the two approaches with a relational database for CRUD operations (create, read, update, delete), a NoSQL system as an object cache like Redis, and another NoSQL as a search engine like Elasticsearch.

30