Ways of Seeing: ORMS & SQL Views
Anna Rankin
Posted on November 22, 2018
This article assumes a general understanding of how databases, Ruby, and Rails work - the examples below use the ActiveRecord library to map database relations to Ruby objects in the context of a Rails application. Check out the example repository to see them in action! (Note: I have also set up a JavaScript example here if you're interested, but I haven't found a JS ORM that has first-class support for SQL views.)
Aggregation Aggravation
Web applications often display "summaries," or aggregated data, that may pull in information from and make calculations across several database tables. ORM (Object-Relational mapping) database modeling libraries are generally designed to read from one table at a time. While this pattern is useful for tons of use cases (basic CRUD for the win), we run the risk of complex application code and/or expensive DB querying when we try to aggregate data. SQL views can potentially help us cut down on the number of queries we make while pushing our data aggregation logic down into the database.
For example - imagine you've been hired to create an application that helps veterinarians communicate with their clients. A workflow like this (pared-down) example this might be familiar to you if you've created a server endpoint:
user = User.includes(:pets).find(1)
render json: {
user: {
id: user.id,
name: "#{user.first_name} #{user.last_name}"
},
pets: user.pets.map do |pet|
{
id: pet.id,
name: pet.name
}
end
}
The code above is pretty straightforward: it loads up a user, loads up their pets, and serializes the data in the format the front end expects. We're making two queries, but they're simple and (hopefully) fast — that's a reasonable expectation.
Fast forward a few weeks, and we've got a new requirement. Users can see which pets they have registered with you, but they also want to know "at-a-glance" when their next appointment is! So, you update the endpoint:
user = User
.includes(:pets)
.find(1)
# Pre-load the user's pets' appointments that are scheduled for the future
upcoming_appointments = user
.appointments
.order(date: :asc)
.where('date >= ?', Time.current)
render json: {
user: {
id: user.id,
name: "#{user.first_name} #{user.last_name}"
},
pets: user.pets.map do |pet|
# Use Array#find to return the first of this pet's appointments
next_appointment_date = upcoming_appointments.find do |appt|
appt.pet_id == pet.id
end
{
id: pet.id,
name: pet.name,
next_appointment_date: next_appointment_date
}
end
}
Honestly, this still isn't too bad. We've added another query — this time for appointments — with some ordering and filtering logic. We're also adding in some looping logic in our serialization step to pull out the first appointment for each of the owner's pets. It's a bit untidy, but hey, it works.
Personally, I don't like encoding all of this behavior in the application code. It feels messy to me, and it's far too easy for something nefarious to sneak in. For example, imagine we'd done this instead:
render json: {
user: {
id: user.id,
name: "#{user.first_name} #{user.last_name}"
},
pets: user.pets.map do |pet|
next_appointment_date = user
.appointments
.order(date: :asc)
.find_by('date >= ?', Time.current)
&.date
{
id: pet.id,
name: pet.name,
next_appointment_date: next_appointment_date
}
end
}
🙀Oh no! Someone snuck a query into our serialization step, and while this might not ring any alarm bells at first (when you're dealing with one or two pets per user), what happens when the local animal shelter becomes one of your clients and registers over a hundred pets? Over a hundred queries per page load.
This is obviously a contrived example, but I've run into more complex cases where N+1 queries were hidden away in service objects and separate files. I personally like to push this kind of logic down a level — into the database - when it starts getting more complicated or when I need it elsewhere in my application. That's where SQL views come in!
What is a SQL view?
My mental model of a SQL view at its most basic is "a saved query in your database that acts like a table." There's a lot more to it than that, but this simple understanding can get you a long way. For example, if I executed the following statement in my database:
CREATE VIEW silly_users AS
SELECT
id,
first_name,
first_name || ' Mc' || first_name || 'erson' AS silly_name
FROM users;
More on the ||
concatenation operator
I can query results from this view using the same syntax I would for a table:
# SELECT * FROM silly_users;
id | first_name | silly_name
-------+------------+------------------------
1 | Melissa | Melissa McMelissaerson
2 | Colleen | Colleen McColleenerson
3 | Vince | Vince McVinceerson
4 | David | David McDaviderson
5 | Dennis | Dennis McDenniserson
...etc
Because this view acts pretty much like a table, it can play really nicely with an ORM. We can create a view-backed model!
silly_user = SillyUser.find(1)
silly_user.silly_name // => 'Melissa McMelissaerson'
If you're using Ruby, I highly recommend the Scenic gem by ThoughtBot, which brings view versioning and other helpful features to projects that use the ActiveRecord ORM.
A New Perspective
Let's try writing a database view to grab the data we want instead of querying for it in our application code (more here on using DISTINCT ON
with GROUP BY
):
-- Grabs one record per pet, returning the earliest future appointment date
CREATE VIEW pets_with_upcoming_appointments AS
SELECT DISTINCT ON (pets.id)
pets.id AS id,
users.id AS user_id,
pets.name AS name,
MIN(appointments.date) AS next_appointment_date
FROM users
INNER JOIN pets
ON user_id = users.id
LEFT JOIN appointments
ON pets.id = pet_id
AND appointments.date >= CURRENT_DATE
GROUP BY (
users.id,
pets.id,
pets.name
);
Great! Now we can read from this view:
# SELECT * FROM pets_with_upcoming_appointments;
user_id | pet_id | pet_name | next_appointment_date
---------+--------+----------+-----------------------
1 | 1 | Flannery | 2018-11-22 13:00:00
2 | 2 | Porkchop | 2018-11-22 16:30:00
2 | 3 | Gravy | 2018-12-01 09:00:00
3 | 4 | Magnus |
4 | 5 | Huey | 2018-12-15 10:45:00
4 | 6 | Duey | 2018-12-15 10:45:00
4 | 7 | Louie | 2018-12-15 10:45:00
# SELECT * FROM pets_with_upcoming_appointments WHERE user_id = 1;
user_id | pet_id | pet_name | next_appointment_date
---------+--------+----------+-----------------------
1 | 1 | Flannery | 2018-11-22 13:00:00
Now that we've got the view set up, we could create a migration using the Scenic gem mentioned earlier, then hook it up to a database-backed ORM model:
class PetWithUpcomingAppointment < ActiveRecord::Base
self.table_name = 'pets_with_upcoming_appointments'
end
Since our view has a user_id
field, it's trivial to hook up a relation in our User model:
class User < ActiveRecord::Base
has_many :pets
# wooooot
has_many :pet_with_upcoming_appointments
has_many :appointments, through: :pets
end
Now, we can clean up our data-fetching application code:
user = User
.includes(:pet_with_upcoming_appointments)
.find(params[:id])
render json: {
user: {
id: user.id,
name: "#{user.first_name} #{user.last_name}"
},
pets: user.pet_with_upcoming_appointments.map do |pet|
{
id: pet.id,
name: pet.name,
next_appointment_date: pet.next_appointment_date
}
end
}
Not bad! We're back down to two queries and no ordering/date comparison logic in the controller 💪 Even better, we can re-use this model in other parts of our application without duplicating the querying logic. This really starts to shine when you start performing more complex aggregation and pulling in data from different tables.
Pitfalls of SQL views
While I'm clearly a fan, there are some things you should watch out for when you start thinking about views in your application:
Overzealous viewification
This is another facet of the "I have a hammer, everything's a nail!" problem. When you first start playing around with them, you can find yourself pushing too much logic down into the database (where it's abstracted away and harder to find). You wind up having to run a lot of migrations because every time you need to change what your application serves up, you need to update or create a new view. 😬 Before you turn whatever you're working on into a view, ask yourself if the benefit you're gaining is worth the trade-off.
Too many levels of cascading views
Views are generally made up of results from tables, but you can also create a view that pulls in data from another view (A META VIEW!?). This can seem like a good idea ("I'll define what an 'active' user is in this view and then pull that in everywhere I need a user!"), but in practice, I've seen it make updating views at all levels more difficult. You can also drive yourself up a wall trying to figure out how a change in one view can affect another one that pulls in data from several levels away 😵 This is one that's really only a problem in tandem with the first point.
Propagated inefficiency
If you're familiar with profiling your SQL queries and using EXPLAIN
/EXPLAIN ANALYZE
to find issues (or if that LEFT JOIN appointments
in the view above made you squirm 😂), this is a good time to put those skills to use! If indexes, hash joins, and sequential scans make your head spin, you might end up with seemingly simple queries running very slowly once you start to get more records in your database. Inefficient queries can end up supporting a ton of your app's functionality if you're not careful, leading to sluggish performance overall. At the least, check out the impact introducing a view has on the speed of the process that's using it.
What now?
If you're interested, here are some resources I'd recommend (and the links I've referenced throughout this post) to learn more! Live long, prosper, and fear not the database 🖖
Posted on November 22, 2018
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.