SQL Database Normalization with Easy Examples
Tommy
Posted on February 18, 2024
In database design, there are three main ways to organize data: first normal form (1NF), second normal form (2NF), and third normal form (3NF). While there are more advanced forms like Boyce and Codd Normal Form (BCNF or 3.5NF) and fourth normal form (4NF), we'll focus on the basics: 1NF, 2NF, and 3NF.
Why Normalization is Important?
Normalization is crucial because it prevents anomalies and eliminates redundant data. Anomalies include Update Anomalies, Insertion Anomalies, and Deletion Anomalies.
Update Anomalies: These occur when updating data inconsistently, leading to discrepancies or errors.
Insertion Anomalies: When adding new data, insertion anomalies arise due to the necessity of adding incomplete information because certain attributes depend on other attributes not present.
Deletion Anomalies: Deleting data can inadvertently remove unrelated information due to dependencies within the database.
First Normal Form (1NF):
In 1NF, each column in a table contains atomic values, meaning that each value is indivisible. For example, in a table storing student information, the Phone Numbers column should not contain multiple phone numbers separated by commas.
Instead, each phone number should be in its own row.
Second Normal Form (2NF):
2NF builds on 1NF by ensuring that every non-key attribute is fully dependent on the primary key. In other words, each column must depend on the entire primary key, not just a part of it. For instance, in a table that stores product order details, if the primary key is OrderID, the Gender column should depend on OrderID.
In this case, the Gender column is not fully dependent on the primary key; hence we remove it. This violates the 2NF because the Gender information is not directly related to the primary key.
Third Normal Form (3NF):
3NF further refines the structure by removing transitive dependencies. This means that non-key attributes should not depend on other non-key attributes. In other words, if columns A, B, and C are in a table R, and A determines B, and B determines C, then C is transitively dependent on A.
In this example, the Category column depends solely on the Product Name column creating a transitive dependency and violating the 3NF. Hence, we move them out to separate tables.
Got something? Feel free to use the comment section.
Posted on February 18, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.