Waleed Abdullah
Posted on February 22, 2024
I wanted to apply the Feynman technique in teaching other people about the normalization concept in databases.
Normalization is the process of converting your databases in such a way that it reduces data redundancy, prevents anomalies in the data, and ensures data integrity.
Normalization is needed for:
- Storage optimization
- Data integrity by eliminating redundancy
- Storage Optimization
- Keeping consistent by using foreign keys.
We will first go over some definitions that we need to know to understand the concept of normalization.
Anomalies
Anomalies are inconsistencies in the data. These happen when the data is redundant. There are 3 types of anomalies that can occur in relational databases.
-
Insertion Anomaly: Say, you have 2 tables one is
Student(StudentID, StudentName, TeacherID, TeacherName)
and the other isTeacher(TeacherID, TeacherName, StudentID, StudentName)
, If you insert data in one table, you will also have to insert it in the other, if you add a new record in the Student table, you will have to add it in the teacher table as well. The studentName and teacherName are repeated and are redundant information in this case. These are insertion anomalies. - Update anomaly: Let's take the previous example on the Teacher and Student table, if we update the student name of a student in the Student table, we will have to update the same in the Teacher table as well, this is an update anomaly as the student's name does not need to be in the teacher table.
- Deletion anomaly: Now imagine that there was only one table keeping all this information, if we want to delete a student from the table and the teacher only had one student which was that student which we want to delete, we would lose the data of the teacher as well this is a deletion anomaly. Normalization helps us remove this redundancy from the data, by using foreign keys and referencing them in other tables.
Primary Key
It is a unique identifier for the records in a database.
For example, an "email" could be a primary key, or an "ID" is the most common primary key used.
Candidate Key
A minimal set of attributes that can be used to uniquely identify records in a table.
For example, In an "Employees" table, both "Employee ID" and "Social Security Number" can serve as candidate keys, as either could uniquely identify an employee.
Super key
A set of attributes that can be used together to uniquely identify a record in a table.
For example, in a sales table "CustomerID" and "ProductID" can be used to uniquely identify a record together.
Normalization
I will discuss the fundamental normal forms which are 1NF (first normal form), 2NF, 3NF, and BCNF (Boyce-Codd Normal Form).
1NF (First Normal Form)
The first normal form states that every field in the database should have atomic (singular) values.
For example, we want to store the phone numbers of a customer, If a customer has 2 phone numbers then we will insert another record for that customer instead of putting the 2 phone numbers in the same field.
The DBMS initializes databases in a way that they always use atomic values.
2NF (Second Normal Form)
For a table to be in the second normal form:
- It should be in 1NF.
- It should not have any partial dependencies. A partial dependency occurs when we have a field in a database such that it is dependent on a proper subset of the primary key. To remove this dependency we create a separate table or move the data to a table where it is dependent on the entire primary key.
3NF (Third Normal Form)
For a table to be in the third normal form:
- It should be in 1NF and 2NF.
- It should not have any transitive dependencies. A transitive dependency occurs when we have a table (A, B, C) with A as a primary key and, A -> B,C and B -> C, in other words, if a column can be determined by another column that is not part of the primary key, then we have a transitive dependency. To resolve this we create a separate table for that attribute.
BCNF (Boyce-Codd Normal Form)
Boyce-Codd normal form is a higher form of 3NF and it states that for a table to be in BCNF:
- The table should be in 3NF
- For all functional dependencies
X -> Y
X is a super key
Usually, we don't need to convert to BCNF and 3NF is enough to ensure optimal storage and remove all the anomalies.
There are also 2 other normal forms 4NF and 5NF, but I won't be covering them here.
Thank you.
Did you know ? every relation with two attributes is always in BCNF
Posted on February 22, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.