Database Normalization: A Simple Tale of Ignoring 1NF
Tran Manh Hung
Posted on October 16, 2023
Let's dive into a quick chat about database normalization. If you've heard of it, awesome. If not, do not worry. This little story may shed light on why it's important.
Database Normalization in a Nutshell
Picture this: you're trying to organize a big box of mixed-up Lego bricks. You'd probably sort them by color, size, or type, right? Database normalization is like that but for data! It's a way to sort and store your data neatly to avoid mix-ups and make sure everything is easy to find. For those who fancy a textbook definition, Wikipedia tells us:
Database normalization is the process of organizing a relational database to reduce data redundancy and improve data integrity.
In simpler terms: it's a set of rules (or "normal forms") that helps keep your data organized and your database running smoothly. However, the more normalized your data, the more work it can take to pull specific pieces back together when you need them.
Bypassing 1NF: A Shortcut Gone Wrong
This article is a real story I encountered in the past on one of my projects, where we tried to be clever and decided to ignore the first normal form (1NF) to "simplify" our database.
Just to clarify, 1NF says:
Every field should contain a single value, no sets of values or nested records allowed.
In other words: "Don't put tables inside of tables." But, of course, we did just that—tucked a table into another and found ourselves in a mess of complications and headaches.
The Story
Our journey commenced with the seemingly straightforward task of crafting a mechanism to store varied types of content – encompassing charts, custom HTML content, and checklists. The initial thought process nudged us toward the creation of a single table, structured as follows:
create table [dbo].[ContentSelection]
(
    [ContentPartSelectionID] int identity (1, 1) not null,
    [ConfigurationJSON] nvarchar(max) null,
    [ProfileTabID] int not null,
    [ContentType] int not null
);
A singular, majestic table, ContentSelection, was meant to graciously host various content types: charts, HTML blobs, or checklists, all neatly serialized and tucked into the ConfigurationJSON column. Oh, and a polite ContentType column to nudge us about what was packed inside each JSON, making deserialization easy thanks to .NET wizards like System.Text.Json and Newtonsoft.
Problem 1: Navigating the Puzzle of Configuration Updates
Oh, the joys of updating configurations! Especially when dealing with JSON, updating turned out to be a notable quest in our database management journey. Even seemingly simple tasks, like tweaking a chart's Title attribute, uncovered some pretty significant hurdles embedded in this approach.
Simple Title Update in Code: A Walk in the Park
In the world of coding, this task seemed like a walk in the park. Deserialize, apply the update, and save the JSON again; it could all be done in a few simple lines:
var data = JsonConvert.DeserializeObject<ChartConfiguration>(chart.ConfigurationJSON);
data.Title = "New Title";
chart.ConfigurationJSON = JsonConvert.SerializeObject(data);
// And the chart is saved!
This snippet of C# code allows developers to change the Title without diving deep into the entire JSON string. However, as the application grew, the code was all over the place, and this deserialization and serialization were repeated in many places. This is a code smell; we should have done it in one place. Sometimes, we just took the whole string and saved it as it was. I just trusted (hoped) that the client was sending the configuration in the correct state. Yeah, but that's a different story.
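In hindsight, funneling all of this through a single helper would have contained the smell. Here is a minimal sketch of what that could look like (the ContentConfigurationSerializer name and shape are mine, not what we actually had):
using Newtonsoft.Json;

// Hypothetical helper: one place that owns (de)serialization of
// ConfigurationJSON, so the rest of the codebase never touches raw JSON.
public static class ContentConfigurationSerializer
{
    private static readonly JsonSerializerSettings Settings = new JsonSerializerSettings
    {
        NullValueHandling = NullValueHandling.Ignore
    };

    public static T Deserialize<T>(string configurationJson) =>
        JsonConvert.DeserializeObject<T>(configurationJson, Settings);

    public static string Serialize<T>(T configuration) =>
        JsonConvert.SerializeObject(configuration, Settings);
}
With that in place, the title update above becomes a round trip through one choke point: deserialize with ContentConfigurationSerializer.Deserialize<ChartConfiguration>, mutate, and write back with Serialize.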
Simple but Not So Simple: The Database Update
Updating the configuration JSON manually, directly in the database, gave us a few options, each with its own set of complex puzzles:
- Whole JSON String Update:
UPDATE [dbo].[ContentSelection]
SET [ConfigurationJSON] = '{"Title": "New Title", "Type": "Chart", ...}'
WHERE [ContentPartSelectionID] = 1
Surprisingly, updating the entire JSON string, while seemingly straightforward, demands a meticulous understanding of the existing data structure to prevent any accidental data adventures (or misadventures).
- Making Friends with SQL Server’s JSON_MODIFY Function:
UPDATE [dbo].[ContentSelection]
SET [ConfigurationJSON] = JSON_MODIFY([ConfigurationJSON], '$.Title', 'New Title')
WHERE [ContentPartSelectionID] = 1
The JSON_MODIFY function seemed like a neat and tidy update approach. Yet it introduced a new set of complexities, particularly for those developers who might not be best buddies with SQL Server's JSON manipulation functions.
HTML Property Challenges: An Additional Twist
And then, things got particularly interesting when dealing with properties that encapsulate entire HTML strings. Direct updates weren't quite in the cards due to the necessity of escaping quotes to keep our JSON string in good shape. Thus, what seems like a straightforward update morphs into a careful dance of ensuring every character is correctly escaped and validated, amplifying the possibilities for error and inefficiency in database management.
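To make the escaping pain concrete, here is a small illustration (the HTML value is made up): in code, the serializer escapes the quotes for us, while a hand-written T-SQL literal forces us to escape the JSON's double quotes and double up SQL's single quotes at the same time.
using Newtonsoft.Json;

var html = "<div class=\"banner\"><a href='/home'>It's home</a></div>";

// In code, the serializer handles all JSON escaping for us:
var json = JsonConvert.SerializeObject(new { Html = html });
// Result: {"Html":"<div class=\"banner\"><a href='/home'>It's home</a></div>"}

// The equivalent hand-written T-SQL literal needs JSON-escaped double
// quotes AND doubled single quotes just to survive the SQL string:
// SET [ConfigurationJSON] = '{"Html": "<div class=\"banner\"><a href=''/home''>It''s home</a></div>"}'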
Problem 2: Leaping Over Initial Data Insertion Hurdles
Inaugurating a new production instance entailed inserting a wide variety of content (charts, checklists, and the like) into the ContentSelection table. What seemed like a straightforward task was, in reality, a hidden obstacle course.
The Balance Between Quantity and Quality
Inserting many rows into the ContentSelection table meant wrestling with large strings of JSON configurations. Each piece of content, whether a chart or a checklist, demanded a unique, error-free JSON string to guarantee smooth sailing in functionality and display.
Example JSON:
{
  "Title": "Sample Chart",
  "Type": "Chart",
  "Configuration": {
    "Type": "Bar",
    "Data": {
      "Labels": ["January", "February", "March"],
      "Datasets": [{"Label": "Monthly Sales", "Data": [30, 25, 45]}]
    }
  }
}
Even the tiniest misstep, like a missing comma or a mislabeled key, could introduce risks on several fronts:
- Data Integrity: A chance of mischievous JSON disrupting data retrieval or manipulation.
- User Experience: The risk of misconfigurations causing a little chaos in content display or UI interactions.
- Developer Efficiency: The potential time-drain of debugging and correcting JSON strings.
The Sneaky "What-If" of Mistakes
While seemingly trivial, an error in the JSON string can secretly trigger a chain reaction of operational disruptions, impacting developers and end-users alike. Though subtle, this issue beautifully underscores the crucial nature of precise data management and hints at the complex challenges that emerge when diverting from tried-and-true database normalization practices.
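One guard we could have leaned on (a sketch with made-up names, not code we shipped) is validating every configuration string before it reaches the table, so a malformed JSON fails loudly at insert time rather than quietly at render time:
using Newtonsoft.Json.Linq;

// Hypothetical guard: parse the JSON before inserting the row.
// JToken.Parse throws JsonReaderException on a missing comma,
// an unquoted key, or any other malformed input.
public static class ConfigurationGuard
{
    public static bool TryValidate(string configurationJson, out string error)
    {
        try
        {
            JToken.Parse(configurationJson);
            error = null;
            return true;
        }
        catch (Newtonsoft.Json.JsonReaderException ex)
        {
            error = ex.Message;
            return false;
        }
    }
}
Calling ConfigurationGuard.TryValidate before each insert turns a silent UI breakage into an immediate, debuggable failure.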
Problem 3: Data Retrieval
While seemingly simple, extracting data from the ContentSelection table unveiled its complexity in deserializing JSON strings. This complexity, particularly when handling larger objects like comprehensive dashboards requiring multiple content elements, swells even more in cloud environments like Azure, where every computational second counts, both in performance and in pennies.
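For flavor, this is roughly the shape the retrieval path takes in such a design (a sketch assuming model classes ChartConfiguration, CustomContentConfiguration, and ChecklistConfiguration; only the chart mapping, ContentType = 1, is confirmed by our migration script below): every single row must be dispatched on ContentType before its JSON can become a usable object.
using System;
using Newtonsoft.Json;

// Every row needs a type switch plus a full deserialization before it
// is usable. Multiply this by every content part on a dashboard, on
// every page load, and the CPU bill adds up.
public static object MaterializeContent(int contentType, string configurationJson)
{
    switch (contentType)
    {
        case 1: return JsonConvert.DeserializeObject<ChartConfiguration>(configurationJson);
        case 2: return JsonConvert.DeserializeObject<CustomContentConfiguration>(configurationJson);
        case 3: return JsonConvert.DeserializeObject<ChecklistConfiguration>(configurationJson);
        default: throw new NotSupportedException($"Unknown ContentType {contentType}");
    }
}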
Problem 3.5: Search Operations, Not So Light on Serialized Columns
Search operations within serialized columns are not just a stroll in the tech park. Every search has to dig through JSON strings, demanding not a splash but a veritable ocean of additional computational power, especially compared with plain columnar searches in a properly normalized setup. And as those JSON objects swell in size and complexity, search operations mutate into an increasingly obstinate beast, voraciously consuming resources and casually nudging our end-users' response times into the slow lane.
And nvarchar(max)? Oh, what a whimsical challenge for standard SQL Server full-text search: a perfect storm of complexity all bundled up with a neat little bow. Thus the serialized columns sashay through, daring us to keep pace without tripping over the nuances of adept database management. May the odds be ever in our favor!
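To show what that means in practice, here is a hedged sketch (illustrative, not our production code) of the kind of search this design pushes you toward: load every row of a content type and filter in memory, because the database cannot cheaply look inside the JSON for you.
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json;

// Finding all charts whose title contains a term means deserializing
// every chart row and filtering client-side: a full scan, every time.
public static List<ChartConfiguration> SearchChartsByTitle(
    IEnumerable<string> allChartJsonRows, string term)
{
    return allChartJsonRows
        .Select(json => JsonConvert.DeserializeObject<ChartConfiguration>(json))
        .Where(chart => chart.Title != null && chart.Title.Contains(term))
        .ToList();
}
With a normalized Title column, the same search is a simple indexed WHERE clause.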
Problem 4: The Need for a Schema Update
We had multiple model classes within the source code with this comment.
// Careful with renaming any of the properties here!!! - will break existing content parts
That's it. That explains it all.
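One partial defense (again a sketch, not something we had in place) is pinning the serialized names with attributes, so renaming a C# property no longer silently breaks every stored content part:
using Newtonsoft.Json;

// Pinning the wire name: the C# property can be renamed freely while
// existing ConfigurationJSON blobs keep deserializing correctly.
public class ChartConfiguration
{
    [JsonProperty("Title")]
    public string DisplayTitle { get; set; }
}
It helps, but it only shifts the problem: the attribute strings still have to stay in sync with whatever JSON is already sitting in production rows.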
The Solution: A Nod to 1NF
To alleviate the problems mentioned above, we pivoted towards adhering to the First Normal Form (1NF). It entailed restructuring our database, primarily by creating distinct tables for each content type, thus eliminating the need for JSON serialization within the database.
-- Original table, omitting the JSON column
create table [dbo].[ContentSelection]
(
    [ContentPartSelectionID] int identity (1, 1) not null,
    [ProfileTabID] int not null,
    [ContentType] int not null
);
-- Additional tables
create table [dbo].[ChartConfiguration]
(
    [ContentPartSelectionID] int not null,
    [Title] nvarchar(max) null,
    [ChartType] int not null, -- e.g., 1 = Bar, 2 = Line, 3 = Pie
    [Values] nvarchar(max) not null,
    [XName] nvarchar(100) not null,
    [YName] nvarchar(100) not null
);
create table [dbo].[CustomContentConfiguration]
(
    [CustomContentConfigurationID] int identity (1, 1) not null,
    [ContentPartSelectionID] int not null,
    [Title] nvarchar(max) null,
    [Html] nvarchar(max) null
);
... -- you get the idea
While effectively eliminating the need for extensive JSON operations within SQL, this solution required a meticulous migration process, ensuring existing customer data was transferred accurately and securely to the new structure.
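The payoff was immediate. The title update from Problem 1 collapses into an ordinary parameterized statement; a minimal sketch, assuming Microsoft.Data.SqlClient:
using Microsoft.Data.SqlClient;

// The Problem 1 title update after normalization: a plain, typed,
// parameterized UPDATE, with no JSON parsing anywhere in sight.
public static void UpdateChartTitle(string connectionString, int contentPartSelectionId, string newTitle)
{
    using var connection = new SqlConnection(connectionString);
    using var command = new SqlCommand(
        "UPDATE [dbo].[ChartConfiguration] SET [Title] = @Title WHERE [ContentPartSelectionID] = @Id",
        connection);
    command.Parameters.AddWithValue("@Title", newTitle);
    command.Parameters.AddWithValue("@Id", contentPartSelectionId);
    connection.Open();
    command.ExecuteNonQuery();
}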
Navigating Through Data Migration
The transition demanded the crafting of migration scripts for each existing customer instance.
insert into ChartConfiguration (ContentPartSelectionID, Title, [Values], ChartType, XName, YName)
select
    ContentPartSelectionID,
    Json_Value(ConfigurationJSON, '$.title'),
    Json_Value(ConfigurationJSON, '$.values'),
    Json_Value(ConfigurationJSON, '$.ChartType'),
    Json_Value(ConfigurationJSON, '$.xName'),
    Json_Value(ConfigurationJSON, '$.yName')
from ContentSelection
where ContentType = 1
Interestingly, this migration was multi-faceted and far from straightforward. Specific scripts were required for each content type. Moreover, each migration script had to include checks to ensure data integrity, particularly validating JSON strings and handling instances where the Json_Value function misbehaved for no apparent reason.
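A sketch of the kind of pre-flight check that pairs well with such a script (hypothetical names; the real checks lived in per-customer T-SQL): deserialize every chart row up front and report the broken ones, so Json_Value never meets malformed input mid-migration.
using System.Collections.Generic;
using Newtonsoft.Json;

// Hypothetical pre-migration check: try to deserialize every chart row
// and collect the IDs whose JSON would not survive the migration script.
public static List<int> FindBrokenChartRows(IEnumerable<(int Id, string Json)> chartRows)
{
    var broken = new List<int>();
    foreach (var (id, json) in chartRows)
    {
        try
        {
            JsonConvert.DeserializeObject<ChartConfiguration>(json);
        }
        catch (JsonException)
        {
            broken.Add(id);
        }
    }
    return broken;
}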
To safeguard against potential mishaps during the migration, the process was executed in two strategic phases:
- Data Migration: Transfer data without altering the original structure.
- Schema Update: Only after verifying the migrated data did we modify the schema, like deleting the JSON column.
This dual-phase approach provided a safety net, ensuring that if complications arose at any point, a revert to the previous database state was always within reach. While resource-intensive, this strategy underscored the importance of meticulous planning and safeguarding data during migration, especially when restructuring to adhere to normalization principles like 1NF.
Conclusion
This story concludes with a happy ending. We successfully migrated all our customers to the new structure and can now update the configuration of each content type without any problems. However, this story is not about the happy ending. It's about the issues we faced and how we solved them.
Take this story as a reminder to follow best practices as much as possible, even if you think you can do better.
One last note
Yes, I called them problems, not "challenges" or "opportunities". Because they were problems.