Relational data models

What

The relational data model was invented in 1969 at the IBM research department, with the lead researcher Edgar R. Codd. They created the 12 rules (13) to manifest the requirements for a data management system.

Rule 1: The information rule:
All information in a relational data base is represented explicitly at the logical level and in exactly one way – by values in tables.

Why

Relational data models are important as they help us with

  • standardisation of data models
  • flexibility in adding and altering tables
  • data integrity
  • structured query language can be used
  • simplicity - the tabular format
  • intuitive organisation - spreadsheets

When

  • small data volumes
  • secondary indices needed
  • Analytics needed
  • Aggregations needed
  • ACID transactions needed
  • Flexibility for writing queries: In relational data models we are modelling data and not queries.
  • easy to change, when business requirements change

Structure

Why

  • free database from unwanted insertions, updates and deletion dependencies
  • reduce the need of refactoring the database as new types of data are introduced
  • make the data more informative for users
  • neutral to quer statistics. We don't want to design for particular queries

Normalisation is needed to reduce data redundancy and to increase data integrity. The overall goal here is to have updates only in one place.

1NF - First Normal Form

  • Atomic values: each cell contains unique and single values
  • Be able to add data without altering tables
  • Separate different relations into different tables
  • Keep relations between tables together with foreign keys

2NF - Second Normal Form

  • 1NF is reached
  • All columns in a table rely on the primary key

3NF - Third Normal Form

  • 2NF is reached
  • No transitive dependencies

There are up to 6 normal forms. But most of the time we will need only up to 3.

When everything is normalised, we need Joins to gather the data. The more Joins you need, the slower the query will be. Denormalisation must be done to increase performance.

Tables

Fact table

Consists of the measurements, metrics of facts of business process. EVENTS THAT HAVE ACTUALLY HAPPENED - like a transaction

Dimension table

A structure that categorises facts and measures in order to enable users to answer business questions.
Dimensions can be for example people, products, place or time.

Schemas

Star Schema

  • a physical model resembling a star shape
  • a fact table is at its center, surrounded by dimension tables
  • the simplified version of the snowflake scheme

Advantages

denormalised | simplifies queries | fast aggregations

Disadvantages

downsides of denormalisation | data integrity | decrease query flexibility | many to many relationship |

Snowflake Schema

logical arrangement of tables in a multidimensional database represented by centralised fact tables with are connected to multiple dimension.

Advantages

In the 1NF and 2NF it is more normalised than the STAR Schema

Sketchnote

To have all the above information in one view, I made a sketchnote.

If you need a higher resolution please use this page

Useful links

17