My Brush with Analytics Engineering

butchland

Butch Landingin

Posted on May 14, 2022

My Brush with Analytics Engineering

TLDR; dbt-cli + VS Code + VS Code dbt-related extensions + gitpod = best online dbt development workflow!

gitpod vscode editor with dbt

Introduction

To give some background, I have been a longtime programmer, and have worked on all kinds of enterprise software systems for various verticals (aka Java enterprise systems). I have worked with all kinds of RDBMSes over the years (Oracle, MS SQL Server, Postgres, etc.) and while I am not a DBA, I considered myself somewhat proficient with SQL (including knowing how to use the native query optimizers and optimizing ORM queries).

Boy oh boy, did I have a lot to learn when confronted with the Modern Data Stack (MDS)! Instead of an ad hoc, almost mystical process of building queries to visualize the data, now we had a clear template to create a pipeline of SQL data transformations from standardization, deduplication, materialization and joins and other steps that makes it easy to create dashboards and other visualization frontends. I also learned about CTEs (Common Table Expressions) that make SQL more maintainable and window functions that make it more powerful. I loved that part of the MDS development workflow.

10 Things I hate about...

However, there were also some things I hated. One was having to use a web console to run queries (despite some cool stuff like query syntax validation and schema aware autocompletion). The other was not being able to use version control. The SQL queries were stored in an opaque storage location that was accessible only through the web console. Other stuff I didn't like was having to manually run queries to materialize views after updating the SQL and having to check the downstream views to see if they broke.

Towards a better workflow

I know now that a lot of these issues could have been dealt with by dbt, except for ONE: having to use a web console to build and run queries - even dbt cloud provides a web IDE which is a web console app to build and maintain SQL views and tables as well as run queries.

Of course I also know about the dbt cli - I love dbt cli (aka dbt-core)!!! All I needed was a good editor - and right now, VS Code is the editor-du-jour for folks like me. So I was so happy to find some VS Code extensions that made the dbt development workflow a joy for me -- specifically the vscode-dbt-power-user extension which provides lots of cool stuff too long to mention, the vscode-dbt extension which provides really useful dbt and sql snippets, and sqlfluff -- the sql formatter and linter.

Previewing SQL Results

However, there was still one piece missing that would allow me to be weaned from any sql web console -- the ability to preview sql results. In the dbt cloud web IDE, this is known as preview sql button. Whenever I used VS Code with the dbt cli, I still had to login to my cloud DWH web console and run the compiled queries in order to preview the results -- not a good developer experience IMHO.

VS Code, dbt, and BigQuery

So I decided to built it myself and here it is: the vscode-dbt-bigquery-power-user -- I know that's a mouthful, haha. It's a fork of the vscode-dbt-power-user extension that adds a preview of the sql results (as well as download the results in csv, text and json formats)
query runner preview

But there's a catch -- right now it only supports Big Query. But with some elbow grease, I plan to support all the other major cloud DWHs as well as Postgres (and maybe SQL Server?) databases. I also plan to enhance my vscode extension to add the other stuff like analyzing the dbt manifest json file and extracting the schema from the underlying tables. Plus of course incorporating any other cool stuff the original vscode-dbt-power-user will be adding (my extension is a fork of the vscode-dbt-power-user query).

What's the experience like? It's not perfect -- it doesn't provide the same level of smart autocompletion that native DWH sql cloud consoles have -- but most of the time I just do a star query to get all the column names and copy the results over to a temp file, so its not a big deal. But at least for me, its a 100 times better since I can now do with SQL all the other stuff I do with other programming language files like JavaScript and Python -- version control, diffs, linting, autoformatting etc. So, in my so humble, but of course BIASED opinion -- it's a total game changer!

VS Code, dbt and Big Query Online

But wait, there's more! I got interested in what would make dbt more effective in improving the lives of analytics engineers -- and learned about the Analytics Engineers Club which was founded by Claire Carroll (formerly of dbt labs) and and provided training to grow data analysts into effective analytics engineers. And I agree with their viewpoint that at some point, analytics engineers need to learn to use the dbt cli and be comfortable with using the terminal as well as the other software engineering practices.

But as a teaching company, they weren't really happy with having to deal with installation and configuration issues that their students had to undergo every time they enrolled in a course and sought to look for alternatives. As of late, their solution has been to provide to use a desktop VS Code editor connecting to pre-provisioned EC2 instance.

But I think I've found a something BETTER! Gitpod is the solution! It provides an VS Code editor online experience so close to my desktop experience that I pretty much would use it when I can't use my desktop VS Code editor -- and with very little friction compared to a desktop experience. No need to pre-provision instances or even install VS Code on your desktop.

So I built a project that would allow you to create a pretty good dbt work environment with a VS Code editor online -- the only thing you need is a dbt project hosted on github. If you want to try it out click here for more (note: of course it installs and leverages my vscode-dbt-bigquery-power-user extension)!

If you find the extension useful, don't forget to rate the extension or star the github repo. And if you have suggestions or experience problems with the extension or the setup, please file a issue in the project's issue tracker!

💖 💪 🙅 🚩
butchland
Butch Landingin

Posted on May 14, 2022

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related