ClickHouse over PostgreSQL?
Nicolas Lima
Posted on January 11, 2020
About this post
All right, fellows, I'm quite new to the data world; my entire career has been in development itself, but aren't we always trying to get better? That's why I'm taking my chances now, and the first challenge of my adventure is to decide between Postgres and ClickHouse (I'll explain in the problem section). Anyway, you can use this article as a source of knowledge, but I beg you not to skip the comments section, because the main point of this writing is to help me discover all the conceptual mistakes I have in my mind! So read this, but go check what the good coders have to say about it :).
The problem
Now, the most important part of every task: understanding the problem.
Although I need to store data for transactions and for analysis, two different things, I still have all my application info stored in one Postgres DB. I guess for an app with a small amount of data that would be nothing to worry about; I mean, Postgres was not made for OLAP, but it can handle the job. In my scenario, though, I have a big data challenge: we're saving a considerable quantity of historical data along with daily transactional info, and this is hurting our performance.
Another thing worth remembering about my problem: the end user can create his own queries, so it's a little bit difficult to predict how our DB is going to be used.
The solution
Here is where I expose myself. I thought about some workarounds, and the point is for you guys to read and tell me when and if I got it wrong; if you agree, or have extra tips or stories from your own experience, please tell me all about it! I appreciate it!
With Postgres
I need to at least try to keep Postgres in my system; after all, it seems easier compared to migrating the entire application to ClickHouse. Let's skip the obvious things, such as upgrading hardware, isolating the DB from the application, etc.
The only way I figured out to continue with Postgres is by changing our system architecture. Since it's nearly impossible to predict all the types of queries our users will create, indexing and SQL tuning will not solve 100% of the problem. That's why I'm thinking about an approach where we store historical and transactional data in two different environments. This way, when our end user creates his own query, it wouldn't need to scan an entire big table, only a small piece of it, and if he needs to access the oldest data we can redirect him to the other, slower environment.
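To make that idea a little more concrete, here is a minimal sketch of the split using native Postgres range partitioning (Postgres 10+, and the MINVALUE bound needs 11+). The table and column names are invented just for illustration, and I'm driving the DDL from Python with psycopg2 so every example in this post stays in one language.

```python
# A minimal sketch of the "two environments" idea using native Postgres
# range partitioning. Table and column names are made up for illustration.
import psycopg2

conn = psycopg2.connect("dbname=appdb user=app password=secret host=localhost")
cur = conn.cursor()

# Parent table, partitioned by transaction date.
cur.execute("""
    CREATE TABLE IF NOT EXISTS transactions (
        id          bigint  NOT NULL,
        created_at  date    NOT NULL,
        amount      numeric(12, 2),
        payload     jsonb
    ) PARTITION BY RANGE (created_at);
""")

# "Hot" partition for the current year -- the one end users usually hit.
cur.execute("""
    CREATE TABLE IF NOT EXISTS transactions_2020
        PARTITION OF transactions
        FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
""")

# "Cold" partition for everything older; it could even live on a slower
# tablespace, which is basically the second, slower environment.
cur.execute("""
    CREATE TABLE IF NOT EXISTS transactions_history
        PARTITION OF transactions
        FOR VALUES FROM (MINVALUE) TO ('2020-01-01');
""")

conn.commit()
cur.close()
conn.close()
```

The nice part is that the end user's query still hits one logical table; the planner just prunes the partitions it doesn't need, so a query filtered on recent dates never touches the cold data.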
But this will add more complexity to our application, and it goes against a hybrid OLAP/OLTP architecture, which I believe is a good bet for the future!
Any tips here?
With ClickHouse
If you don't know much about ClickHouse, please check this link. In short, ClickHouse is a columnar DBMS for OLAP, and it is great at running queries over big amounts of data. Here you can see a benchmark comparison between Postgres and ClickHouse.
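Just to give a feel for it, here is a tiny sketch of what a ClickHouse table and query look like, using the MergeTree engine through the Python clickhouse-driver package. The table and columns are placeholders, not my real schema.

```python
# A tiny ClickHouse sketch: a MergeTree table ordered by date, plus one
# aggregation over it. Names are placeholders.
from clickhouse_driver import Client  # pip install clickhouse-driver

client = Client(host="localhost")

client.execute("""
    CREATE TABLE IF NOT EXISTS transactions_history (
        id         UInt64,
        created_at Date,
        amount     Decimal(12, 2)
    ) ENGINE = MergeTree()
    ORDER BY created_at
""")

# The kind of wide scan/aggregation a columnar store is built for.
rows = client.execute("""
    SELECT toYear(created_at) AS year, sum(amount) AS total
    FROM transactions_history
    GROUP BY year
    ORDER BY year
""")
print(rows)
```

The ORDER BY in the table definition is what ClickHouse uses to physically sort the data, so scans filtered by date stay cheap.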
ClickHouse itself would easily solve our performance problem; big data is definitely not an issue for this guy. But still, it's not 100% good.
"Specialized tools are specialized: just remember the limitations! - Bad with heterogeneous hardware (Cloudflare experience) - Non-throttled recovery (source replicas flooded with replication load) - No real delete/update support, and no transactions - No secondary keys - Own protocol (no MySQL protocol support) - Limited SQL support, and the joins implementation is different. If you are migrating from MySQL or Spark, you will probably have to re-write all queries with joins" - hnmullany from Hackernews
All right, we kind of have a solution here. ClickHouse is for sure a good choice, but it will not solve everything, and that's when I started to think about combining the pros of Postgres and ClickHouse. Maybe a Foreign Data Wrapper (FDW) could save me here. This way I can use ClickHouse to handle all the massive data queries, even the ones my end user creates himself, and for operations like delete and update I can keep using Postgres features.
Using clickhousedb_fdw may be a good choice for that, but... it's not perfect either. I don't have much experience using an FDW, but I guess it would add complexity to our application, mainly when we're talking about keeping our foreign tables in sync.
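Still, for the record, here is roughly what the wiring could look like on the Postgres side. I'm hedging hard here: the OPTIONS keys (host, dbname) and whether a user mapping is required at all depend on the clickhousedb_fdw version, so treat these names as placeholders and check its docs.

```python
# A rough sketch of wiring clickhousedb_fdw from the Postgres side.
# The OPTIONS keys below are assumptions -- double-check them against the
# clickhousedb_fdw documentation for the version you install.
import psycopg2

conn = psycopg2.connect("dbname=appdb user=app password=secret host=localhost")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS clickhousedb_fdw;")

cur.execute("""
    CREATE SERVER IF NOT EXISTS clickhouse_svr
        FOREIGN DATA WRAPPER clickhousedb_fdw
        OPTIONS (host '127.0.0.1', dbname 'default');
""")

cur.execute("""
    CREATE USER MAPPING IF NOT EXISTS FOR CURRENT_USER
        SERVER clickhouse_svr;
""")

# Expose the ClickHouse history table inside Postgres: heavy analytical
# queries on it get pushed down to ClickHouse, while the regular OLTP
# tables stay native in Postgres with full update/delete support.
# (Some FDWs also want an OPTIONS (table_name '...') clause here.)
cur.execute("""
    CREATE FOREIGN TABLE IF NOT EXISTS transactions_history (
        id         bigint,
        created_at date,
        amount     numeric(12, 2)
    ) SERVER clickhouse_svr;
""")

conn.commit()
cur.close()
conn.close()
```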
That's what I have so far. What do you guys think about it?
Thanks for stopping by! Good Luck and Good Code.