Database normalization.
Flavio Francisco
Posted on March 20, 2023
Some facts
The words normalization and normal from refer to the structure of a database.
Normalization was developed by IBM researcher E.F. Codd in the 1970s.
Normalization increases clarity in organizing data in databases.
Scenario
Let's say that you are hired to develop a new eCommerce website for some company, than your client shows you the following table:
As you can see the table above has duplicated data and to better organize and optmize our data to be consumed by our new application we do the data modelling called: database normalization.
The 3 most used database normalization are:
First Normal Form (1NF);
Second Normal Form (2NF) and
Third Normal Form (3NF).
Also we have:
Boyce-Codd Normal Form (BCNF);
Fourth Normal Form (4NF) and
Fifth Normal Form (5NF).
I will explain only the 1NF, 2NF and the 3NF because they are the most common used.
The other forms will be explained in different articles.
First Normal Form (1NF)
To be considered a 1NF table:
- Each table must have unique columns;
- Using row order to convey information is not permitted;
- Mixing data types within a primary key is not permitted;
- Having a table without a primary key is not permitted and
- Repeating groups are not permitted.
As you can see the 'Product Types' columns now is one column, but we can improve our data using the 2NF and that I will explain below.
Second Normal Form (2NF)
To be considered normalized in the 2NF our table must be already normalized as 1NF and each field of a table that is not the primary key is determined by that primary key.
But what is a primary key (PK)? Basically is a code the identifies a unique row in a table.
Take a look at the table above now. We have: Id, Client Name, Phone Number, Product Name, Quantity, Value and Product Type. Since the Id is the PK of the client we can apply the 2NF in the following way:
- Does the phone number can be unique for a client? (for simplicity let's say: yes)
- Does a product can be unique for a client? (no)
- Does the quantity of an ordered product can be unique by a client? (no)
- Does the value belongs to the client? (no)
- Does the product type belongs to the client? (no).
Based on the answers above we can group the information in a such way that each data will be represented by an unique primary key.
NOTE: a primary key (PK) is a unique value for each row in a table. The PK is used to identify a unique record (row) in a database.
This value cannot be null and can be either a unique column or a set of column that represents an unique combination in a table.
We may have also the foreign Key (FK) is a column or group of columns in a relational database that represents the data in a linked table. A foreign key is just the reference from an table in another table.
Client
Product
But what does happened with the information of the orders? Now we will apply the third normal form (3NF).
Third Normal Form (3NF)
Every non-key attribute in a table should depend on the key, the whole key, and nothing but the key.
An entity is said to be in the third normal form when, it satisfies the criteria to be in the second normal form (2NF) and there is no transitive functional dependency. What this mean? If the table A dependent on table B and the table B is dependent in on the table C then C is transitively dependent on A and B.
So we need to identify the tables were the non-primary-keys field depend on something other than the primary key.
Our candidate for that is the Product table and the field: Product Type. Why? Because a product can only belong to a specific product type and product type can belong to multiple products and also the relations between clients and orders.
We will than have the following entities to have our database in a 3NF.
Client
Product
Product Type
Order
NOTE: I just decided to keep this database model simple. Of course we can include more columns as: creation time; total value; price before taxes and etc, but for explanation I would like to keep it simple.
To conclude our database model will look like the picture below:
Translating our data model we can say that:
1 client can have either 0 or many orders (0 to N);
1 product can be either in 0 or many orders (0 to N);
1 product can have 1 product type (1 to 1) and
1 product type can be assigned to many products (1 to N).
Normalization of a database offers several advantages, including the elimination of redundant information and the maintenance of data consistency. By avoiding the repetition of data, you can optimize storage space and reduce the likelihood of errors. Additionally, by using a unique identifier like the product's ID (primary key) to refer to data, changes to one piece of information, such as a product name, will not affect other related data. This can help ensure the accuracy and reliability of your database over time.
Well this is my first article on the internet and I hope that you enjoyed it.
Posted on March 20, 2023
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
November 30, 2024