Firestore is Stunted, So... What is the Perfect Database in 2022?
Jonathan Gamble
Posted on March 11, 2022
Firestore
To be fair, I really meant only Firestore, not Firebase the platform. OK, Firestore is not dead. It is quite popular. But it should be dead, for the reasons I listed in my post almost a year ago. The team has spent the last 14 months building every single SDK, plugin, and monitoring add-on for Firebase possible. This includes the controversial Firebase 9 interface (although I do like the word "faster").
But you know what hasn't been updated... not even a little?
FIRESTORE!!!!
- We still can't do any kind of basic aggregation without writing an error-prone Firebase function that doesn't take into account our current data.
- We still can't do a basic full-text search. Google... 🤨
- MOST IMPORTANTLY, we still can't count our search results. Sure, we can create automated collection counts by incrementing a random table value (yuck!). Try indexing every possible where clause in just one collection.
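To put a rough number on that last point, here is a minimal sketch of how many counter documents you would have to maintain to cover every possible where clause. The field names and value counts are invented for illustration, and it only considers equality filters, so range and array queries would make it worse.

```typescript
// Hypothetical filterable fields on one collection, with the number of
// distinct values each can take. These numbers are made up for illustration.
const filterableFields: Record<string, number> = {
  category: 20,
  author: 1000,
  published: 2,
  language: 10,
};

// Each field is either left out of the query or pinned to one of its values,
// so a field with v values contributes (v + 1) choices. Equality filters only.
function countersNeeded(fields: Record<string, number>): number {
  let total = 1;
  for (const name of Object.keys(fields)) {
    total *= fields[name] + 1;
  }
  return total;
}

console.log(countersNeeded(filterableFields));
```

Every one of those counters would also need its own increment and decrement logic on each write, which is where the "yuck" comes from.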
I have created a backend solution and a frontend solution for most use cases. I spent hours writing a backend package, but these days I prefer the frontend solutions.
Why I love Firestore
Firestore is so freakin easy to develop with. I love that it is cloud hosted. I don't love the vendor lock-in. I love that it has the easiest-to-use client-side API for JavaScript development. I love that I can secure these things with Firestore Rules. I love how easily I can add the subscribe function to get real-time data. I love how Firestore can auto-scale for a few users or many. I don't love having to manage a second database, or the lack of growth.
The Perfect Database
I want the perfect database. It has these features:
- Scalable
- Relational (SQL or Graph)
- Realtime Data Support
- A JS/TS Client Side API (or GraphQL)
- Middleware Security
- Cloud Hosted DBaaS Option
- No Vendor Lock-In
- ACL with Login Methods
I have been researching the perfect database, and it does not quite exist... yet. However, there are some options that come close, and some future prospects to watch out for.
noSQL
noSQL databases in general are great at scaling. You know what they're not good at? Everything else... Firestore recommends you use Algolia with it in order to add search functionality. Besides the obvious problem of a Google product requiring you to use a non-Google product to function correctly, you would be storing your data in one noSQL database and managing a second noSQL database just for searching. Huh? This makes no sense, but it is the world we live in.
MongoDB simulates joins by aggregating data. However, you can't actually do joins. Firestore just says, hey, write your own damn code, which is probably going to be problematic, to handle aggregations. Oh, and there are limits on that too! So, I can't just duplicate 1,000,000 user profiles on posts, as the function would time out. Yeah, you've got to think outside of the box to hack a way of doing this with even more joins and complications.
Oh, did I mention the follower feed problem? If you've been following my posts for a while, you know my many theories about how this CAN'T really be done outside of theory... at least not in a way that scales in both the users and followers collections. I have 10 posts on the subject. Here is the latest. Now, try aggregating data, keeping it up-to-date, and adding functionality later to existing data that grows with Firestore or MongoDB. Try this with plain-ol-redis. 😂
All data is relational. You simply cannot create a realistic scalable database solution with ONLY a noSQL database... unless...
a graph database is built on top of that noSQL database.
noSQL databases need to die as main databases. MongoDB, I do give you mucho credit for trying... key word "try" here.
Hello my lovely and remarkable... Graph Database 🥰... with one problem...
There is a huge misconception about Graph Databases. I have literally talked with a PhD computer scientist working at Twitter who didn't understand why anyone who is not querying analytical data would use a Graph Database.
I got news for you buddy. All data is relational! Graphs relate data better than SQL. Graph Databases are better than SQL.
It used to be true that Graph Databases were not scalable. That is no longer true. It used to be true that Graph Databases were slower than SQL. That is also no longer true.
Every database can be awesome; it just depends on what is underneath. A great database scales by distributing the data, uses sharding, maybe multi-tenancy, and can even be deployed to grow... all automatically. It also has all the needed features without having to spin up a second database. That second database is usually the problem.
Graph Databases are evolving. There are really two classic types: triple stores and property graphs. Triple stores usually store RDF, that subject-predicate-object thing queried with SPARQL, and property graphs have edges and nodes. Despite what some articles say, they both can have index-free adjacency. This is the real key to the future of a graph database.
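If the two models sound abstract, here is a minimal sketch of the same three facts stored both ways. The types and the `toPropertyGraph` helper are my own illustration, not any database's API, and the conversion rule is a simplifying assumption: an object that also shows up as a subject is treated as a node, everything else as a plain property value.

```typescript
// Triple store: every fact is one (subject, predicate, object) statement.
type Triple = [string, string, string];

const triples: Triple[] = [
  ["alice", "name", "Alice"],
  ["bob", "name", "Bob"],
  ["alice", "follows", "bob"],
];

// Property graph: nodes carry their own properties, edges connect nodes.
interface GraphNode { id: string; props: Record<string, string>; }
interface GraphEdge { from: string; label: string; to: string; }
interface PropertyGraph { nodes: Record<string, GraphNode>; edges: GraphEdge[]; }

// Hypothetical helper: turn triples into a property graph.
function toPropertyGraph(input: Triple[]): PropertyGraph {
  const graph: PropertyGraph = { nodes: {}, edges: [] };
  // First pass: every subject becomes a node.
  for (const [subject] of input) {
    graph.nodes[subject] = { id: subject, props: {} };
  }
  // Second pass: node-to-node triples become edges, the rest become properties.
  for (const [subject, predicate, obj] of input) {
    if (graph.nodes[obj] !== undefined) {
      graph.edges.push({ from: subject, label: predicate, to: obj });
    } else {
      graph.nodes[subject].props[predicate] = obj;
    }
  }
  return graph;
}

const graph = toPropertyGraph(triples);
console.log(graph.nodes["alice"].props, graph.edges);
```

Same data, two shapes. The triple list is flat and uniform, while the property graph groups what belongs to a node on the node itself.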
Memory Location
Imagine a foreign key in a many-to-many SQL table. We often have to have a junction table (also called a pivot table) when doing joins. We join one table to another, just to get the value of the real table we want. We need two joins to relate only two pieces of data. This is the real reason Graph Databases are amazing. We store the direct place in memory to go to, and we go there. We don't make a stop in between. Relational data is actually relational in a Graph Database.
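Here is a rough in-memory analogy of that difference. All the names are hypothetical, and a real graph database stores record pointers on disk rather than JavaScript object references, so treat it as a sketch of the idea only.

```typescript
// --- Relational shape: groups plus a junction table linking users to them ---
interface GroupRow { id: number; title: string; }
interface MembershipRow { userId: number; groupId: number; }

const groups: GroupRow[] = [
  { id: 10, title: "Admins" },
  { id: 11, title: "Editors" },
];
const memberships: MembershipRow[] = [
  { userId: 1, groupId: 10 },
  { userId: 1, groupId: 11 },
];

// Two hops: user -> memberships, then memberships -> groups.
function groupsViaJunction(userId: number): string[] {
  const titles: string[] = [];
  for (const m of memberships) {
    if (m.userId !== userId) continue;
    for (const g of groups) {
      if (g.id === m.groupId) titles.push(g.title);
    }
  }
  return titles;
}

// --- Graph shape: each node holds direct references to its neighbours ---
interface GroupNode { title: string; }
interface UserNode { name: string; memberOf: GroupNode[]; }

const admins: GroupNode = { title: "Admins" };
const editors: GroupNode = { title: "Editors" };
const ada: UserNode = { name: "Ada", memberOf: [admins, editors] };

// One hop: follow the references that are already on the node.
function groupsViaReferences(user: UserNode): string[] {
  return user.memberOf.map((g) => g.title);
}

console.log(groupsViaJunction(1), groupsViaReferences(ada));
```

Both functions return the same titles. The first has to scan an intermediate table to get there, the second just walks to where the data already is.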
n + 1
There is also the n+1 problem in SQL. Fetch a list of n rows, then fetch the related data for each row one query at a time, and you have made n + 1 queries where one or two would do. SELECT and WHERE statements also require you to query more than you need. It is basically over- and under-fetching.
Granted, both of these problems can theoretically be fixed if you're really good at SQL querying, and you use advanced JSON querying techniques. However, most cases are simply faster in a graph database. They were made for this kind of SH*T. Try querying multiple deeply nested data objects. If you don't know what a nested data object is, it is because you don't know how to think in graph terms.
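For anyone who has not hit n+1 yet, here is a minimal sketch of the usual pattern. The tables are fake in-memory arrays and the `select...` helpers are hypothetical stand-ins for one database round trip each, so the only thing being measured is the number of queries.

```typescript
interface PostRow { id: number; authorId: number; title: string; }
interface AuthorRow { id: number; name: string; }

const postTable: PostRow[] = [
  { id: 1, authorId: 1, title: "First" },
  { id: 2, authorId: 2, title: "Second" },
  { id: 3, authorId: 1, title: "Third" },
];
const authorTable: AuthorRow[] = [
  { id: 1, name: "Ada" },
  { id: 2, name: "Linus" },
];

let queryCount = 0;

// Each helper below stands in for one round trip to the database.
function selectAllPosts(): PostRow[] {
  queryCount++;
  return postTable;
}
function selectAuthorById(id: number): AuthorRow[] {
  queryCount++;
  return authorTable.filter((a) => a.id === id);
}
function selectAuthorsByIds(ids: number[]): AuthorRow[] {
  queryCount++;
  return authorTable.filter((a) => ids.indexOf(a.id) !== -1);
}

// n + 1: one query for the list, then one more query per row.
function loadNaive(): number {
  queryCount = 0;
  for (const post of selectAllPosts()) {
    selectAuthorById(post.authorId);
  }
  return queryCount;
}

// Batched: one query for the list, one query for all the authors.
function loadBatched(): number {
  queryCount = 0;
  const rows = selectAllPosts();
  selectAuthorsByIds(rows.map((p) => p.authorId));
  return queryCount;
}

console.log(loadNaive(), loadBatched());
```

The batched version is the "really good at SQL" fix. A graph query gets the same effect by asking for the posts and their authors in one nested request.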
Real problem with graph databases...
The real problem with a graph database is not that it is only made for graph data. ALL DATA IS GRAPH DATA!!!
The real problem is the same problem Svelte has... maturity. Most Graph Databases do not have all the features necessary to produce a web app, like constraints, policies, triggers, full-text search, and security. In fact, the list of those that do is very, very short.
The definition of a Graph Database is also evolving. First you had graph databases, then you had graph databases built on noSQL or other KV stores, then you had graph databases built on SQL (AgensGraph), and now you have hybrids that use more than one database under the hood. The focus is no longer on edges and nodes, as a node itself can be used as an edge. The focus is on the speed you gain by storing the memory location directly, and not the location of another intermediate table. The focus is on querying.
So which databases are in the lineup, huh?
Get on with it!
In order to compete with Firestore, you have to have pre-written and customizable middleware. I am only going to list databases that have this. In Firebase, this would be the client-side JavaScript API plus Firestore Rules for security.
1. Supabase
Supabase is by far Firestore's number one competitor. Sure, it is also Firebase's competitor because it has storage buckets and login methods, but it is really Firestore's competitor because it has a client-side securable interface similar to Firebase. It is so easy and lovely to use. You may enjoy the flexibility of a schemaless database, but what you get in exchange with real relations is incredible. It uses PostgreSQL under the hood. PostgreSQL is faster for handling large sets of data, while MySQL is faster for smaller sets of data. Supabase now even supports secure subscriptions. You must get used to using Policies and Constraints, but frankly Supabase makes writing these a pleasure, seriously. They are also working on a GraphQL layer, which theoretically would mostly automatically handle the n+1 problem. There is one tiny caveat. PostgreSQL, while made for large datasets, is not made for horizontal scaling. Sure, you can scale vertically with more computing power, but you can't (easily) scale horizontally with more computers / virtual computers. They may one day support this, but it won't be easy. MySQL can.
2. Fauna
Fauna is pretty freaking cool. I admit I have not yet had the pleasure to build anything with it. It is one of those freaky hybrids. You can store data in a key-value store, but query it like a graph. The FQL Client API looks like Firebase 9, in that you need to import a lot of functions within functions. You use internal database techniques, like in Supabase, for security. The two biggest problems I see with Fauna are 1) vendor lock-in and 2) the learning curve: it does not seem as easy to create links etc. as in a graph database or SQL database.
3. Hasura
Hasura gives you several choices of SQL databases to build on, but specializes in Postgres. It also has the most advanced GraphQL engine that exists, although it is still missing some required features. You need to combine Hasura with Firebase Auth, Auth0, or some other login system, but technically the middleware is there. It suffers the same scalability problems and feature problems as Dgraph. You can also use NHost.io to automatically set up an instance of your database with a built-in login system and file storage. I have not built anything complex with Hasura yet, but I have read about missing features like nested updates. I think once you get to the complex level, the GraphQL alone won't cut it. Honestly, no GraphQL cuts it... yet.
Honorable Mention
4. Dgraph
Dgraph was chewed up, spit out, killed, brought back to life, and now split. A month ago I would not have listed Dgraph at all, even though I love Dgraph. They basically got some VC money, spent the money, fired half the staff, started producing a decent return, and split. One side got the founder and programmers and forked the open source part (now Outcaste); the other side got the board and VC money, as well as the paying cloud users. They are honestly going to be very different products in a year. I personally am spending a lot of time with both CEOs to give them my ideas, along with the compiled feedback I have seen from the user community. There is an unofficial Discord, recently started, with over 300 users. You will find both communities and their managing staff active on there. I do not care to take sides, I just believe in the original product's potential. There has been active talk of a second fork as well. Dgraph specializes in GraphQL and is written in Go for extreme speed, and is arguably both better and worse than Hasura at GraphQL. I would say Hasura, Prisma, and Dgraph are all in a fight for the best GraphQL. I wrote the j-dgraph package just so it works like Firebase, querying the GraphQL automatically through JS methods. Dgraph checks ALL my boxes, and I believe in one year that one (or even both) versions will take the #1 and maybe #2 spots on this list. This product is absolutely amazing in every way, so follow me for updates.
5. Neo4j
Neo4j is that 'most popular' graph database everyone knows. They focus too much on the analytical users, and are missing out on the Firebase users. They have advanced querying capabilities with Cypher, math functions, and triggers. While they do have basic constraints, they do not have policies. However, you could write your own with triggers. Neo4j is really a beast and competes more with SQL databases than you know. However, they're missing out. They offer a cloud platform, but expect you to host your own GraphQL. They could make this process easy. They also haven't developed subscriptions, although people keep asking for them. It supports huge amounts of data, but it does not support sharding like Dgraph. I have heard that Dgraph users switched from Neo4j because it could not support their large datasets. The enterprise version can scale for high availability, so it sort of scales. Full disclosure: I have not tested any of this, nor am I an expert by any means.
6. Planetscale with Prisma
These are really two different products. Really you could choose any cloud-hosted MySQL database and plug Prisma into it, but the Fireship guy tweeted about Planetscale (and they spend a lot of money on Google ads), so I suspect they're legit. I need to spend more time researching this. Technically there is some setup needed for Prisma. Prisma itself is in the top tier of GraphQL and has its own API too, like Firebase, but no frontend caching like pure GraphQL with URQL or Apollo. Prisma has subscription capabilities. This may be your best option... TBD.
Up and Coming
7. EdgeDB
EdgeDB looks pretty freakin awesome. It basically seems to rewrite SQL and Graph databases together to create some new-ish programming language. It takes care of all the problems GraphQL has, and seems to be built separately but on top of Postgres. It is really something unique, beautiful, and powerful. They don't have a security layer yet or a cloud hosting environment, but both are in the works. However, Postgres still suffers from the scalability problems we all know. If you like unique fetching and strong typing, also check out TypeDB. It doesn't get its own list number because there is no cloud version, middleware, etc. It is still worth checking out.
8. 8base
8base has perhaps the most beautiful UI. It is MySQL, so it is serverless and scalable. It uses GraphQL, so it has middleware. It checks all boxes except features. The GraphQL is adequate. It is not the best, not the worst. It needs nested filters, nested updates, etc. I am going to create an app soon with this thing so I can truly test its functionality. There is also something called Grafbase for producing GraphQL APIs, but I'm not sure where it stands... yet.
Who Wins?
Nobody. All options are missing something or another. I am currently building in Supabase because it is so damn easy, and they sent me a T-shirt due to my past article☝. I love Dgraph and will continue to give updates on that and Outserv (Outcaste's product). I think MySQL is the best overall database. noSQL scales well, but is not relational. Hybrid databases are built on other databases, or given names like NewSQL. Ultimately great databases are hybrids, we just want all the management done automatically. 8base and EdgeDB should definitely be on your radar. Also consider MongoDB Atlas if you don't need a follower feed, although it doesn't have a front-end client system.
If we stick with these rules:
- real time data
- client side api
- cloud hosting
- middleware
We only have: Fauna, Hasura, and Supabase...
OK, technically Dgraph too. 8base is just a baby.
Complain though I have, the Firebase Team forgot about Firestore. We should feel not only angry, but frankly disrespected. They quit listening to their users. I have high respect for all the team members, especially the ones active in the community, but low respect for whoever is choosing to keep Firestore stagnant. If they start development again, I will gladly continue spreading the good word. Hopefully, they create a cloud platform for an actual relational database instead. Either way, the product used to be great, but it is getting passed by. Firebase, do something about it!
One thing to remember
Separate your code. Build your React or Svelte app so that your Firestore or Supabase code is totally separate from your key elements. Use good DRY, SOLID, and KISS techniques. When we find that perfect database, your app will be built, and it will be easy to change your code. Otherwise, find the tradeoffs that work for you. Maybe you love noSQL databases like Cassandra. Maybe you want that Web3 database that syncs with the blockchain... did I mention that is Outserv, Dgraph's fork?! Maybe you don't mind managing a second database just for searching. That is fine too.
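Here is a minimal sketch of what I mean by separating your code. `PostRepository`, `InMemoryPostRepository`, and `addAndCount` are hypothetical names for illustration, not part of any SDK. The in-memory class is just a stand-in; a Firestore or Supabase version would implement the same interface using that vendor's client.

```typescript
interface Post { id: string; title: string; }

// The only database contract your components are allowed to know about.
interface PostRepository {
  list(): Promise<Post[]>;
  add(post: Post): Promise<void>;
}

// In-memory stand-in. A FirestorePostRepository or SupabasePostRepository
// (hypothetical) would implement the same two methods with the vendor SDK.
class InMemoryPostRepository implements PostRepository {
  private posts: Post[] = [];

  async list(): Promise<Post[]> {
    return this.posts.slice(); // return a copy so callers can't mutate state
  }

  async add(post: Post): Promise<void> {
    this.posts.push(post);
  }
}

// App code depends on the interface, never on a vendor import.
async function addAndCount(repo: PostRepository, post: Post): Promise<number> {
  await repo.add(post);
  return (await repo.list()).length;
}

addAndCount(new InMemoryPostRepository(), { id: "1", title: "Hello" })
  .then((count) => console.log(count));
```

When the perfect database shows up, switching means writing one new class instead of touching every component.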
I personally am ready for the future. This year is a new beginning.
Check out my databases from last year.
Until then, keep building.
J