What is The Difference between 2NF and 3NF?

What is Normalization?

Normalization in a database is the process of organizing the data to reduce redundancy. The main idea is to segment a larger table into smaller ones and connect them through a relation.

But why should an end-user like you or me be concerned about Data Normalization?

To answer that question, we first need to understand what could happen if our data is not normalized.

Why is Database Normalization important?

Let’s assume a company stores all its data, such as employee details, personal information, etc., in a single table. This data is accessible to end-users such as developers, database admins (DBAs), etc. But what happens when multiple end-users interact with the database at the same time?

For instance, imagine a DBA is updating the database, and during the process, a developer, completely unaware, performs another operation on the database. These users now have a different view of the database altogether.

Such data mismatch when multiple users interact with the database simultaneously can be termed as an anomaly.

This is where Data Normalization steps in. It helps in avoiding data inconsistencies and provides a more organized way to store your data. So, whether you’re a database administrator or just an end-user, you need to be aware of how your data is getting stored and what it means for the company.

Now that we have a clear idea of what can happen without Data Normalization, let’s look at the various normal forms available for Database Management (DBMS).

Normal Forms

First Normal Form (1NF)

A relation is said to be in First Normal Form (1NF) if it does not contain any multi-valued or composite attribute. In other words, every single attribute in a table has to hold an atomic value; otherwise, it defies the rules of the First Normal Form.

Let’s look at an example:

First Name Last Name Movies rented
Sam Holland The Notebook, A walk to remember
Joe Dunphy Harry Potter and the Goblet of Fire
Harry Williams Interstellar, Inception, Gravity

The last column in the table - ‘Movies rented’ is holding several values. The 1NF version of this table will look like this:

First Name Last Name Movies rented
Sam Holland The Notebook
Sam Holland A walk to remember
Joe Dunphy Harry Potter and the Goblet of Fire
Harry Williams Interstellar
Harry Williams Inception
Harry Williams Gravity

Now you may wonder, how does Data Normalization reduce redundancy if the above conversion doubled the number of existing rows?

This is because we have taken a trivial scenario where we considered only a single table. In the next section, we will discuss the second normal form, and it’ll make more sense as to how Data Normalization reduces the overall redundancy.

Second Normal Form (2NF)

A relation is in Second Normal Form (2NF) if:

  1. it is in First Normal Form (1NF) and,
  2. it has no partial dependency

If a non-key attribute can be determined from a proper subset of the candidate key, then the relation is said to have a** partial dependency**.

Let’s see an example to understand this better -

Subject Taught Teacher ID Teacher Age
Mathematics 181 37
Social Sciences 11 29
English 181 37
Physics 27 45
Chemistry 27 45

The above table follows 1NF as each attribute holds a single value. However, ‘Teacher Age,’ which is a non-key attribute (as it cannot be used as an identifier - two people can have the same age), is dependent on ‘Teacher ID,’ which is a proper subset of the candidate key. Therefore, this table exhibits partial dependency and does not follow the Second Normal Form.

The 2NF conversion of the adobe table will look like this:

Table 1:

Teacher ID Teacher Age
181 37
11 29
27 45

Table 2:

Subject Taught Teacher ID
Mathematics 181
Social Sciences 11
English 181
Physics 27
Chemistry 27

Now, we no longer need to store Teacher Age every time we add in a new course. This breakdown reduces the overall redundancy when you are dealing with a large number of rows.

Let’s have a look at the next normal form in the chain.

Third Normal Form (3NF)

A relation is in Third Normal Form (3NF) if -

  1. it is in 2NF and,
  2. it has no **transitive-dependency **for all the non-key attributes.

If A->B and B-> C are two functional dependencies, then A->C is a transitive dependency. If a table has such indirect dependencies, then it does not follow the Third Normal Form.

Alternatively, a relation with a functional dependency of A->B is in 3NF if one of these conditions is true -

  1. A is the superkey
  2. B is a prime attribute, i.e., B is a part of the candidate key

Consider the following table -

Employee ID Employee Name Employee State Employee Country Employee ZIP
1267 Sam Holland California USA 421005
4582 Joe Dunphy Texas USA 560051
2362 Harry Williams Florida USA 690087
1260 Alexa Stewart Alaska USA 798423

Primary Key: Employee ID

Non-key attributes: Employee Name, Employee State, Employee Country, Employee ZIP

  1. ‘Employee ZIP’ is dependent on ‘Employee ID.’
  2. Employee State’ and ‘Employee Country’ are dependent on ‘Employee ZIP.’

Thus by the definition of transitive dependency, ‘Employee State’ and ‘Employee Country’ depend on ‘Employee ID.’ This table is, therefore, not in 3NF. We need to break down the table into two, and the final conversion looks like this -

Table 1:

Employee ID Employee Name Employee ZIP
1267 Sam Holland 421005
4582 Joe Dunphy 560051
2362 Harry Williams 690087
1260 Alexa Stewart 798423

Table 2:

Employee ZIP Employee State Employee Country
421005 California USA
560051 Texas USA
690087 Florida USA
798423 Alaska USA

The Difference between Second Normal Form (2NF) and Third Normal Form (3NF)

The overview of the three normal forms tells us one thing for sure - that each normal form is stricter than its predecessor. For instance, in 2NF, non-prime attributes are not dependent on prime (or key) attributes, but a non-prime attribute can depend on another non-prime attribute. 3NF eliminates this possibility as non-prime attributes are only dependent on the super key of the relation.

Moreover, 2NF tackles partial dependency, whereas 3NF focuses on avoiding transitive dependency. With 2NF, we saw that the repeating groups were eliminated from the table, whereas 3NF reduced the redundancy altogether. Thus, 3NF is a stronger normalization form.

A direct comparison between 2NF and 3NF is somewhat misleading as it is not an apples-to-apples comparison. 3NF is a more sophisticated case of 2NF, and thus, it wouldn’t be fair to compare these normal forms. The choice of normalization depends on your data and end goal. If you aim to reduce the main redundant data, choose 2NF. However, if you are looking to ensure referential integrity, 3NF is a better choice.

33