This is an undergraduate computer science project I had a chance to work on with a classmate of mine: Garry (@garrygrewal)
Preface
This article aims to be a mental walkthrough of applying ER modeling to SQL databases via a project example. Although I try to include links to every technical term, this is not a comprehensive beginner's guide to ER modeling. I do plan on posting such a guide, so please let me know via comment or DM if my explanations helped you understand ER models better.
What is HypeTracker?
What is this "hype" and how do I get more of it?
HypeTracker is a data aggregator application that takes data from social media platforms such as Reddit and Twitter and displays the number of times a sneaker has been mentioned over a period of time. This data is valuable because the price of an aftermarket sneaker can be affected by the attention, or "hype", it receives at a given point in time. Because the application collects this data and displays it in graphical form, users can make informed decisions about whether to purchase a sneaker by comparing its perceived value, as reflected in social media attention, against its price history.
Designing a Database Using ER Modeling
This project was largely data-centric, so we wanted to design up front how we would store and access data, to avoid issues such as incorrect data relationships or duplicate entries. Errors like these could have forced us to delete our database and start again from scratch, which can be disheartening depending on how far along the project is.
Entities & Attributes
Can I re-roll for more strength?
In HypeTracker, we used entity relationship (ER) modeling as a way to visually describe our data model before implementing it in SQL. We started by listing out our most important entities (real world objects):
Sneakers - the main focus of this application
Members - the users of our application
Rankings - stores the historical data of mentions / occurrences of a sneaker
Next, we wrote down some attributes (characteristics or information) we had in mind for each entity.
| Sneakers | Members | Rankings |
| --- | --- | --- |
| name | name | platform |
| brand | email | mentions |
| price* | password | date |

*price is the retail price at launch (not price history)
Keep in mind that this is only the initial set of items we thought of, and more attributes were added later on. However, this chart gave us a baseline for the most important attributes needed for our application and a simple view to refer back to once our data model became more complicated.
Relationships Between Entities
What are we?
In the next phase, we began to define the relationships between entities through simple scenarios of how the entities interact.
Sneakers and Members
each sneaker may be monitored by one or more members
each member may monitor one or more sneakers
Sneakers and Rankings
each sneaker can have zero or more rankings
each ranking can only contain one sneaker
Note: the Members and Rankings entities have no direct relationship between them.
Entity relationships can be described by their cardinality, which states how many instances of one entity can be associated with instances of another. For example, sneakers and members have a many-to-many relationship, because one sneaker can be watched by many members while one member may watch many sneakers. Meanwhile, sneakers and rankings have a one-to-many relationship, because one sneaker can have zero or more rankings associated with it, but each ranking can only describe one sneaker.
Translating all of that into symbols using Crow's foot notation for cardinality, this is what our ER diagram looks like at this point.
Weak Entity Sets
Apes strong together
A key point to identify at this stage is that a Rankings entity cannot exist without the Sneakers entity it describes. This creates a different type of relationship where the weak entity (Rankings) has an existence dependency on the stronger entity (Sneakers). We can represent this by changing the relationship into a double diamond, changing the weak entity into a double rectangle, and using two lines between the weak entity and the weak relationship.
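One way to picture this existence dependency once it reaches SQL is sketched below. It uses Python's built-in sqlite3 module as a stand-in for MariaDB so that the snippet runs on its own. The column types, the composite key, and the sample rows are assumptions for illustration, not the project's actual schema. The weak entity borrows its owner's key as part of its own primary key, and the foreign key rejects rankings for sneakers that do not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off by default, so switch it on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Sneakers (
    Name  TEXT PRIMARY KEY,
    Brand TEXT NOT NULL,
    Price REAL
);
-- Weak entity: its key includes the owner's key (assumed composite key).
CREATE TABLE Rankings (
    SneakerName TEXT NOT NULL,
    Platform    TEXT NOT NULL,
    Date        TEXT NOT NULL,
    Mentions    INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (SneakerName, Platform, Date),
    FOREIGN KEY (SneakerName) REFERENCES Sneakers (Name) ON DELETE CASCADE
);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO Sneakers VALUES ('Example Runner', 'ExampleBrand', 120.0)")
conn.execute("INSERT INTO Rankings VALUES ('Example Runner', 'Reddit', '2019-01-01', 42)")


def ranking_without_sneaker_is_rejected():
    """Try to insert a ranking whose sneaker does not exist."""
    try:
        conn.execute("INSERT INTO Rankings VALUES ('Ghost Shoe', 'Twitter', '2019-01-01', 7)")
    except sqlite3.IntegrityError:
        return True
    return False


rejected = ranking_without_sneaker_is_rejected()

# Deleting the owner removes its dependent rankings via ON DELETE CASCADE.
conn.execute("DELETE FROM Sneakers WHERE Name = 'Example Runner'")
remaining = conn.execute("SELECT COUNT(*) FROM Rankings").fetchone()[0]
print(rejected, remaining)
```

Whether to cascade deletes or to block them is a design choice. Cascading matches the idea that a ranking has no meaning without its sneaker, but it also makes deleting a sneaker discard its mention history.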
Many-To-Many Relationships
There's not enough room for all of us in this relationship
Unfortunately, we were not finished with this data model yet. Another glaring issue was the cardinality between the Sneakers and Members entities. Many-to-many relationships create problems in SQL. For example, how can one row in the members table store many sneakers at the same time? There are other issues and proposed solutions which you can read about in this article, but the recommended solution is to use an associative entity.
Using an associative entity, we can refactor the relationship between sneakers and members into a new Watchlist entity which keeps track of members and their sneakers.
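As a rough sketch of what the associative entity buys us, the snippet below builds a minimal Watchlist table and reads the relationship in both directions. It uses Python's built-in sqlite3 module as a stand-in for MariaDB. The Watchlist column names follow the example queries later in this article, while the reduced Sneakers and Members tables, the composite primary key, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

conn.executescript("""
CREATE TABLE Sneakers (Name  TEXT PRIMARY KEY);
CREATE TABLE Members  (Email TEXT PRIMARY KEY);
-- One row per (member, sneaker) pair; the pair itself is the key,
-- so the same sneaker cannot be added twice to one member's watchlist.
CREATE TABLE Watchlist (
    MemberEmail TEXT NOT NULL REFERENCES Members (Email),
    SneakerName TEXT NOT NULL REFERENCES Sneakers (Name),
    PRIMARY KEY (MemberEmail, SneakerName)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO Sneakers VALUES (?)", [("Shoe A",), ("Shoe B",)])
conn.executemany("INSERT INTO Members VALUES (?)",
                 [("ann@example.com",), ("bob@example.com",)])
conn.executemany("INSERT INTO Watchlist VALUES (?, ?)", [
    ("ann@example.com", "Shoe A"),
    ("ann@example.com", "Shoe B"),
    ("bob@example.com", "Shoe A"),
])


def watched_by(email):
    """All sneakers on one member's watchlist."""
    rows = conn.execute(
        "SELECT SneakerName FROM Watchlist WHERE MemberEmail = ? ORDER BY SneakerName",
        (email,))
    return [row[0] for row in rows]


def watchers_of(sneaker):
    """All members watching one sneaker."""
    rows = conn.execute(
        "SELECT MemberEmail FROM Watchlist WHERE SneakerName = ? ORDER BY MemberEmail",
        (sneaker,))
    return [row[0] for row in rows]


def duplicate_is_rejected():
    """The composite primary key blocks a repeated (member, sneaker) pair."""
    try:
        conn.execute("INSERT INTO Watchlist VALUES ('ann@example.com', 'Shoe A')")
    except sqlite3.IntegrityError:
        return True
    return False


print(watched_by("ann@example.com"), watchers_of("Shoe A"))
```

The many-to-many relationship becomes two one-to-many relationships (Members to Watchlist and Sneakers to Watchlist), which is what lets each row stay a single flat record.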
Finishing Up
That was easy
Now we were basically done! We just added the foreign key(s) as attributes and underlined the primary key(s), and this was the resulting ER diagram:
Here is the SQL file to implement this in MariaDB (v10.1.35):
Here are some example SQL queries to retrieve data for certain scenarios:
/* Get All Sneakers Watched by a Member */
SELECT S.Name, S.Price, S.Brand
FROM Sneakers S
INNER JOIN Watchlist W ON S.Name = W.SneakerName
WHERE W.MemberEmail = '$email'; /* $email is a PHP variable here */

/* Get the 5 Most Mentioned Sneakers In the Last Week */
SELECT S.Name, S.Price, S.Brand, RS.TotalMentions
FROM Sneakers S
INNER JOIN (
    SELECT R.SneakerName, SUM(R.Mentions) AS TotalMentions
    FROM Rankings R
    WHERE R.Date > DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
    GROUP BY R.SneakerName
    ORDER BY TotalMentions DESC
    LIMIT 5
) RS ON S.Name = RS.SneakerName
ORDER BY RS.TotalMentions DESC; /* the join does not guarantee the subquery's order */
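To try the two queries without a MariaDB server, here is a minimal sketch using Python's built-in sqlite3 module. Several things in it are assumptions rather than the project's code: the simplified tables (foreign keys omitted for brevity), the sample rows, and the helper names `watched_sneakers` and `top_mentioned`. SQLite has no `DATE_SUB`, so the cutoff date is computed in Python and passed in. The email is bound as a `?` parameter instead of being spliced into the SQL string, which is one common way to avoid SQL injection.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sneakers  (Name TEXT PRIMARY KEY, Brand TEXT, Price REAL);
CREATE TABLE Watchlist (MemberEmail TEXT, SneakerName TEXT,
                        PRIMARY KEY (MemberEmail, SneakerName));
CREATE TABLE Rankings  (SneakerName TEXT, Platform TEXT, Date TEXT, Mentions INTEGER,
                        PRIMARY KEY (SneakerName, Platform, Date));
""")

# Hypothetical sample data, dated relative to today.
recent = (date.today() - timedelta(days=1)).isoformat()
old = (date.today() - timedelta(days=30)).isoformat()

conn.executemany("INSERT INTO Sneakers VALUES (?, ?, ?)", [
    ("Shoe A", "BrandX", 100.0),
    ("Shoe B", "BrandY", 150.0),
])
conn.execute("INSERT INTO Watchlist VALUES (?, ?)", ("ann@example.com", "Shoe B"))
conn.executemany("INSERT INTO Rankings VALUES (?, ?, ?, ?)", [
    ("Shoe A", "Reddit", recent, 10),
    ("Shoe A", "Twitter", recent, 5),
    ("Shoe B", "Reddit", recent, 40),
    ("Shoe A", "Reddit", old, 999),  # outside the 7-day window, should be ignored
])


def watched_sneakers(email):
    """Sneakers watched by a member; the email is bound, not interpolated."""
    return conn.execute(
        """SELECT S.Name, S.Price, S.Brand
           FROM Sneakers S
           INNER JOIN Watchlist W ON S.Name = W.SneakerName
           WHERE W.MemberEmail = ?""",
        (email,)).fetchall()


def top_mentioned(limit=5, days=7):
    """Most mentioned sneakers in the last `days` days, highest first."""
    cutoff = (date.today() - timedelta(days=days)).isoformat()
    return conn.execute(
        """SELECT S.Name, S.Price, S.Brand, RS.TotalMentions
           FROM Sneakers S
           INNER JOIN (
               SELECT R.SneakerName, SUM(R.Mentions) AS TotalMentions
               FROM Rankings R
               WHERE R.Date > ?
               GROUP BY R.SneakerName
               ORDER BY TotalMentions DESC
               LIMIT ?
           ) RS ON S.Name = RS.SneakerName
           ORDER BY RS.TotalMentions DESC""",
        (cutoff, limit)).fetchall()


print(watched_sneakers("ann@example.com"))
print(top_mentioned())
```

Storing dates as ISO 8601 strings keeps the `>` comparison correct in SQLite, since those strings sort in chronological order. In MariaDB a native `DATE` column would be the more natural choice.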
Takeaways and Learnings
It's not over yet!
To recap, we designed a relational database based on our requirements, using the entity relationship model to visualize our database before implementation. I learned how to express data relationships through cardinality and how to refactor many-to-many relationships so that they work nicely in SQL. Although creating ER diagrams can be tedious, it is an important process for verifying design decisions and avoiding simple dependency or redundancy issues later on. I am continuously learning more about SQL, and this write-up details an iteration of the project after it had been implemented in PHP.
Data Normalization
That being said, we were working with a relatively simple model because we only needed a small number of entities, and we did not run into further issues that would have required more normalization techniques. I've avoided using this term in the post because it is a complicated topic on its own, and we were able to achieve a data model in Boyce-Codd Normal Form (BCNF) with just one refactoring step. If you are planning to learn more about databases, I would suggest looking at the different normal forms and normalization techniques, as well as relational algebra and relational calculus, to express your SQL queries more effectively.