What is The Difference between 2NF and 3NF?

What is Normalization?

Normalization in a database is the process of organizing the data to reduce redundancy. The main idea is to segment a larger table into smaller ones and connect them through a relation.

But why should an end-user like you or me be concerned about Data Normalization?

To answer that question, we first need to understand what could happen if our data is not normalized.

Why is Database Normalization important?

Let’s assume a company stores all its data, such as employee details, personal information, etc., in a single table. This data is accessible to end-users such as developers, database admins (DBAs), etc. But what happens when multiple end-users interact with the database at the same time?

For instance, imagine a DBA is updating the database, and during the process, a developer, completely unaware, performs another operation on the database. These users now have a different view of the database altogether.

Such data mismatch when multiple users interact with the database simultaneously can be termed as an anomaly.

This is where Data Normalization steps in. It helps in avoiding data inconsistencies and provides a more organized way to store your data. So, whether you’re a database administrator or just an end-user, you need to be aware of how your data is getting stored and what it means for the company.

Now that we have a clear idea of what can happen without Data Normalization, let’s look at the various normal forms available for Database Management (DBMS).

Normal Forms

First Normal Form (1NF)

A relation is said to be in First Normal Form (1NF) if it does not contain any multi-valued or composite attribute. In other words, every single attribute in a table has to hold an atomic value; otherwise, it defies the rules of the First Normal Form.

Let’s look at an example:

First Name	Last Name	Movies rented
Sam	Holland	The Notebook, A walk to remember
Joe	Dunphy	Harry Potter and the Goblet of Fire
Harry	Williams	Interstellar, Inception, Gravity

The last column in the table - ‘Movies rented’ is holding several values. The 1NF version of this table will look like this:

First Name	Last Name	Movies rented
Sam	Holland	The Notebook
Sam	Holland	A walk to remember
Joe	Dunphy	Harry Potter and the Goblet of Fire
Harry	Williams	Interstellar
Harry	Williams	Inception
Harry	Williams	Gravity

Now you may wonder, how does Data Normalization reduce redundancy if the above conversion doubled the number of existing rows?

This is because we have taken a trivial scenario where we considered only a single table. In the next section, we will discuss the second normal form, and it’ll make more sense as to how Data Normalization reduces the overall redundancy.

Second Normal Form (2NF)

A relation is in Second Normal Form (2NF) if:

it is in First Normal Form (1NF) and,
it has no partial dependency

If a non-key attribute can be determined from a proper subset of the candidate key, then the relation is said to have a** partial dependency**.

Let’s see an example to understand this better -

Subject Taught	Teacher ID	Teacher Age
Mathematics	181	37
Social Sciences	11	29
English	181	37
Physics	27	45
Chemistry	27	45

The above table follows 1NF as each attribute holds a single value. However, ‘Teacher Age,’ which is a non-key attribute (as it cannot be used as an identifier - two people can have the same age), is dependent on ‘Teacher ID,’ which is a proper subset of the candidate key. Therefore, this table exhibits partial dependency and does not follow the Second Normal Form.

The 2NF conversion of the adobe table will look like this:

Table 1:

Teacher ID	Teacher Age
181	37
11	29
27	45

Table 2:

Subject Taught	Teacher ID
Mathematics	181
Social Sciences	11
English	181
Physics	27
Chemistry	27

Now, we no longer need to store Teacher Age every time we add in a new course. This breakdown reduces the overall redundancy when you are dealing with a large number of rows.

Let’s have a look at the next normal form in the chain.

Third Normal Form (3NF)

A relation is in Third Normal Form (3NF) if -

it is in 2NF and,
it has no **transitive-dependency **for all the non-key attributes.

If A->B and B-> C are two functional dependencies, then A->C is a transitive dependency. If a table has such indirect dependencies, then it does not follow the Third Normal Form.

Alternatively, a relation with a functional dependency of A->B is in 3NF if one of these conditions is true -

A is the superkey
B is a prime attribute, i.e., B is a part of the candidate key

Consider the following table -

Employee ID	Employee Name	Employee State	Employee Country	Employee ZIP
1267	Sam Holland	California	USA	421005
4582	Joe Dunphy	Texas	USA	560051
2362	Harry Williams	Florida	USA	690087
1260	Alexa Stewart	Alaska	USA	798423

Primary Key: Employee ID

Non-key attributes: Employee Name, Employee State, Employee Country, Employee ZIP

‘Employee ZIP’ is dependent on ‘Employee ID.’
Employee State’ and ‘Employee Country’ are dependent on ‘Employee ZIP.’

Thus by the definition of transitive dependency, ‘Employee State’ and ‘Employee Country’ depend on ‘Employee ID.’ This table is, therefore, not in 3NF. We need to break down the table into two, and the final conversion looks like this -

Table 1:

Employee ID	Employee Name	Employee ZIP
1267	Sam Holland	421005
4582	Joe Dunphy	560051
2362	Harry Williams	690087
1260	Alexa Stewart	798423

Table 2:

Employee ZIP	Employee State	Employee Country
421005	California	USA
560051	Texas	USA
690087	Florida	USA
798423	Alaska	USA

The Difference between Second Normal Form (2NF) and Third Normal Form (3NF)

The overview of the three normal forms tells us one thing for sure - that each normal form is stricter than its predecessor. For instance, in 2NF, non-prime attributes are not dependent on prime (or key) attributes, but a non-prime attribute can depend on another non-prime attribute. 3NF eliminates this possibility as non-prime attributes are only dependent on the super key of the relation.

Moreover, 2NF tackles partial dependency, whereas 3NF focuses on avoiding transitive dependency. With 2NF, we saw that the repeating groups were eliminated from the table, whereas 3NF reduced the redundancy altogether. Thus, 3NF is a stronger normalization form.

A direct comparison between 2NF and 3NF is somewhat misleading as it is not an apples-to-apples comparison. 3NF is a more sophisticated case of 2NF, and thus, it wouldn’t be fair to compare these normal forms. The choice of normalization depends on your data and end goal. If you aim to reduce the main redundant data, choose 2NF. However, if you are looking to ensure referential integrity, 3NF is a better choice.

Blog