Dgraph is the most exciting tech product I've ever used ❤️🔥. Here's Why:
Ben Woodward
Posted on October 27, 2021
This might sound like hyperbole, but I genuinely believe that Dgraph is the most exciting dev product I've seen in my career. I'm constantly blown away by the seemingly small but crazy powerful features I keep discovering, but the most exciting (meta)feature of Dgraph I want to talk about is that it has completely revolutionised the whole software development cycle, especially for startups trying to ship fast. In this post I want to focus on two things: query optimisation and schema design.
Byebye Query Optimisation Hell
For as long as I've been a web developer, one of the unavoidable bottlenecks of the app development lifecycle has been performance optimisation: you build your MVP, you ship it, your users test it, and then... it errors or, worse, crashes. In my experience, database queries are usually at the heart of these problems.
I'm currently working on a product that serves interactive transcripts. Originally we built a graphql API in Elixir, with resolvers that queried a Postgres database. Due to product requirements, Transcripts, Paragraphs, Sentences and Words each required their own tables, with Words belonging to Sentences, which belong to Paragraphs, which belong to a Transcript. This worked fine for short transcripts but, as you've probably guessed, led to a very predictable N+1 query problem (querying for a Transcript generated a query for each Paragraph, which in turn generated a query for each Sentence, and so on down to the Words). After we uploaded a 1hr transcript, any request for that transcript's page crashed the server due to memory overload.
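To make the shape of the problem concrete, a request for a transcript against that original API looked roughly like this (a sketch from memory; the transcript query field and the text field are illustrative rather than our exact schema):

query {
  transcript(id: "123") {
    paragraphs {        # resolver fires one DB query per transcript
      sentences {       # ...then one query per paragraph
        words {         # ...then one query per sentence
          text
        }
      }
    }
  }
}

Every level of nesting multiplies the number of queries, which is exactly what blew up on long transcripts.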
The best technical solution would have been to spend some time optimising the queries, but the fastest solution (which was the priority) was just to paginate the queries for batches of paragraphs. However, this 'solution' still involved having to write a crapload of boilerplate.
I have one failed startup under my belt that in retrospect failed because I was trapped in this boilerplate hell and ran out of runway. I didn't want to repeat that mistake so I started looking for solutions and happened upon Dgraph.
Initially I was sceptical, because my assumption was that graph databases come with a slew of technical overhead (needing to learn a complex query language, needing to get my head around abstract graph database concepts). However, to my surprise and delight, it's actually possible to use Dgraph as a graphql API generator that happens to be backed by a graph database. By that I mean you can use Dgraph as a graphql API without needing to worry about any of the details of how it stores and persists the data.
That's because graphql is Dgraph's native query language, which means getting started with Dgraph is as simple as:
- Download and run the standalone Docker image to get Dgraph running on your machine
- Create a very familiar-looking schema.graphql file with your graphql types listed (see the sketch below)
- Add the schema to your Dgraph instance via a simple CLI command
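To give a feel for how little that schema file needs to contain, here's a minimal sketch for our transcripts domain (illustrative field names, not our production schema):

type Transcript {
  id: ID!
  title: String!
  paragraphs: [Paragraph]
}

type Paragraph {
  id: ID!
  sentences: [Sentence]
}

type Sentence {
  id: ID!
  words: [Word]
}

type Word {
  id: ID!
  text: String!
}

From those type definitions alone, Dgraph generates the queries and mutations for each type (queryTranscript, addTranscript and so on), so there are no resolvers to write at all.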
And voilà, you have an API that you can start querying from your JS app. Oh, and by the way, you no longer have to worry about N+1 queries, and it can handle terabytes of data and billions of data points. What. The. Actual. 🤯 TERABYTES?? BILLIONS??¹
Dgraph basically gives you Google-level performance with next to zero config beyond your graphql types file. Maybe you're sceptical and wondering how a company you haven't heard of is able to achieve Google-level performance? Well, here's why: Dgraph was literally built by one of Google's senior Search Infrastructure engineers to "scale to serve Google’s Knowledge Graph via Search".
This is next-level game-changing tech and there's just no way I am going back to relational databases now. The future has arrived!
The End of Your Schema Design Woes
Necessary disclaimer: this isn't really a dig at Rails in particular, because the same issue exists in all MVC web frameworks backed by relational databases.
Another unavoidable technical challenge in shipping web apps is schema design. For a long time, the most common approach to building web apps has been to use an MVC framework and an ORM that generates your SQL queries for you. For most of my professional career I've worked as a Ruby on Rails dev, and more often than not, the area of the codebase where the most dev hours have been burnt is the ActiveRecord ORM layer between the database and the app. It works fantastically well when you have a simple schema, like a blog where you need to query a post with its comments, but as soon as you have a moderately sophisticated hierarchy in your schema, it becomes less obvious whether ActiveRecord is a benefit or a cost. I've seen so many Rails projects where developers have given up trying to optimise the multi-table ORM queries and have opted to just figure out the right SQL statements and paste those in. Either way, by this point a lot of dev hours have been sunk into something that doesn't have much to do with your business logic, i.e. the thing your users are paying you for. For a startup, wasted time means shortened runway.
Most experienced web developers have learnt the hard way that it pays to be careful and think ahead when designing your schema. As a startup, this part of the development cycle feels like wading through waist-high molasses.
With Dgraph, this very real 'cost' in the development cycle has seemingly been magic'd away 🪄!
To give an example, for a while we were experimenting with the idea of providing a flashcards feature in our app. The feature would allow users to save flashcards into decks, and then nest decks into other decks.
What I wanted was for users to be able to pick any deck at any level in their tree of nested flashcard decks and review cards: not just the cards from the selected deck, but cards gathered recursively from every deck nested under the currently selected one.
We quickly realised that if the decks were nested too deeply, this would result in a dangerously expensive query that would likely crash the server. So this is just part of the solution we came up with:
def deck_tree_depth_stats(parent_id, deck_id) do
  Deck
  |> recursive_ctes(true)
  # "roots" CTE: walk upwards from the selected deck's parent to the root,
  # so we know how deep the selected deck sits in the overall tree
  |> with_cte("roots",
    as:
      fragment(
        "(
          SELECT id, parent_id, 1 AS depth
          FROM decks
          WHERE id = ?
          UNION ALL
          SELECT decks.id, decks.parent_id, roots.depth + 1 AS depth
          FROM decks
          JOIN roots ON roots.parent_id = decks.id
        )",
        type(^parent_id, :id)
      )
  )
  # "branches" CTE: walk downwards through every deck nested under the selected deck
  |> with_cte("branches",
    as:
      fragment(
        "(
          SELECT id, 1 AS depth
          FROM decks
          WHERE parent_id = ?
          UNION ALL
          SELECT decks.id, branches.depth + 1 AS depth
          FROM decks
          JOIN branches ON decks.parent_id = branches.id
        )",
        type(^deck_id, :id)
      )
  )
  |> join(:inner, [d], r in "roots")
  |> join(:inner, [d, r], b in "branches")
  |> select([d, r, b], %{
    parent_depth: max(r.depth),
    node_depth: max(r.depth) + 1,
    tree_depth: max(r.depth) + 1 + max(b.depth)
  })
  |> Repo.one()
end
In retrospect this was a terrible use of our time. However, it's easy to end up writing complex code you wish you didn't have to, because it's genuinely hard to avoid painting yourself into a corner with schema design decisions. Design your schema the wrong way and you're forced to come up with complex workarounds for the performance problems the structure itself creates.
When you're a small startup trying to ship fast, you don't have the luxury of spending 15hrs reading through dry SQL docs trying to figure out how to get a query like this working properly. It's a direct threat to the success of the startup, because any time you aren't iterating on features that directly create value for your customers is precious time wasted.
We've since scrapped this feature for a number of reasons, but looking back, it's an excellent example of why I think Dgraph is 🔥NOS🔥 for startups.
In Dgraph, implementing this feature is just this graphql query:
query {
  queryDeck(slug: "korean-first-500-words") {
    flashcards {
      content
    }
    decks {
      flashcards {
        content
      }
      decks {
        flashcards {
          content
        }
      }
    }
  }
}
That's it. 🤯
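For context, the schema behind that query doesn't need to be clever either. Something along these lines is enough (a sketch; the name field is illustrative rather than our exact schema):

type Deck {
  id: ID!
  slug: String!
  name: String!
  flashcards: [Flashcard]
  decks: [Deck]        # a deck can nest other decks to any depth
}

type Flashcard {
  id: ID!
  content: String!
}

The nesting that cost us a recursive CTE in Postgres is just a Deck edge pointing back at Deck.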
Hopefully the NOS-for-startups analogy is clear by now. With Dgraph, our iteration speed as a startup is through the roof. Not just that, the whole process has become fun again: we're able to focus on shipping features and iterate quickly without being in a constant state of stress about whether we're gonna be painted into a corner by our schema.
We're no longer held hostage by our database; we're free to design the schema that's easiest to understand rather than the one that's least likely to blow up our queries.
What makes the NOS analogy even more apt is that Dgraph feels like such a powerful edge over potential competitors that I almost don't want to tell people about it. It's a secret weapon.
Click here to learn more about Dgraph
¹. Dgraph achieves this through its distributed nature, so while your laptop may struggle with a multi-terabyte Dgraph graph, this performance is available for a flat fee via Dgraph Cloud.