Transitioning your career into Data Engineering
ramsjha
Posted on February 21, 2022
As a professional with substantial experience, a college graduate, or anybody else, how do I start a career in Data Engineering?
I get this question all the time.
Data Engineering is a complex field with a diverse technology landscape, and it can be difficult to decide on the right starting point. And will it stay relevant over time?
In this post, I will try to help those who want to get into Data Engineering. I once had to answer these questions myself, and now I have some ideas to share.
A few things keep new specialists from entering the field. Let’s start with the common belief that, after transitioning, all the skills you have acquired so far will be thrown away. Not really! The transition is an incremental skill-building exercise, and your existing skills will stay useful. A newcomer, meanwhile, starts with a clean slate and some software engineering awareness.
Let’s take a Mainframe professional, for example. The key skills these specialists possess are data storage and retrieval, data processing, optimisation, and visualisation. What will happen to these? The philosophy of data storage and retrieval remains the same, but the mechanism changes: text files become Avro or Parquet files, EBCDIC-encoded records give way to UTF-8 data compressed with a codec such as Snappy, file-pointer access gives way to block reads, and so on. Data processing is another key element in both professions; work that used to proceed a page or a row at a time through a file or database can be recast as block-wise processing using MapReduce. Database processing and system design skills will likewise build on what you already have.
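To make the block-processing idea a little more concrete, here is a minimal sketch in plain Python. It is not Hadoop or Spark, only an illustration of the map and reduce steps over blocks of records. The record layout, the function names and the sample values are all invented for this example.

```python
from collections import Counter
from functools import reduce
from itertools import islice


def read_blocks(records, block_size):
    """Yield successive lists of up to block_size records."""
    iterator = iter(records)
    while True:
        block = list(islice(iterator, block_size))
        if not block:
            return
        yield block


def map_block(block):
    """Map step: count records per account within a single block."""
    return Counter(record["account"] for record in block)


def merge_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def count_by_account(records, block_size=2):
    """Process records block by block, then combine the partial results."""
    partials = (map_block(block) for block in read_blocks(records, block_size))
    return reduce(merge_counts, partials, Counter())


# Hypothetical sample records, invented for illustration only.
SAMPLE_RECORDS = [
    {"account": "A", "amount": 10},
    {"account": "B", "amount": 5},
    {"account": "A", "amount": 7},
    {"account": "C", "amount": 1},
    {"account": "A", "amount": 3},
]

if __name__ == "__main__":
    print(dict(count_by_account(SAMPLE_RECORDS)))
```

In a real framework the blocks would live on different machines and the map step would run in parallel, but the shape of the logic is the same as the row-by-row loop a Mainframe developer already knows.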
As you can see, the key elements of working with data remain the same; only the approach changes with every new tool or methodology that comes onto the market. The same applies to people with a PeopleSoft, QA or ETL background: embrace the change and be resilient enough to adapt to it.
With that in mind, let’s look at the skills that are crucial for a professional Data Engineer. First of all, you need to know a programming language such as Python, Scala or Java. If your current level is low, this will require some work. Database skills are important: you should be familiar with SQL and with relational, MPP, NoSQL, and cloud data warehouse (DWH) systems. You should also know how to work with processing frameworks (MapReduce, Spark, ETL frameworks, etc.). The last core skill you will need is data-intensive design. With these at the core, you have to complement your skillset with Cloud, DevOps, scheduling, documentation, infrastructure as code, security, and so on. Above all, problem-solving skills lie at the heart of it: the tech landscape changes, but the solution approach and intuition remain the same.
Where to start is a tricky question. I recommend identifying the gaps in your knowledge and bridging them by learning the relevant concepts. A good way to practise is to build projects in a Git repository that showcases your work. Download open datasets, ingest them in real time or in batch, and apply transformations using Structured Streaming, Spark SQL or CDC. Then push the results to a BI tool, automate the workflow, and finally make the project more complex by adding assumptions of your own. Once you are comfortable with individual projects, go back to the concepts, take them to the next level, and focus on design patterns, which is where your prior experience can be leveraged.
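A first practice project along these lines can be very small. The sketch below is one possible starting point, using only the Python standard library: it batch-ingests a CSV into a staging table and applies a SQL transformation. SQLite stands in for Spark SQL here purely as an assumption to keep the example self-contained, and the dataset, table and function names are invented.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for a downloaded open dataset (invented values).
RAW_CSV = """city,reading_date,temperature_c
Pune,2022-02-01,29.5
Pune,2022-02-02,31.0
Oslo,2022-02-01,-3.0
Oslo,2022-02-02,-1.0
"""


def ingest(raw_text, connection):
    """Batch-ingest CSV rows into a staging table; return the row count."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(city TEXT, reading_date TEXT, temperature_c REAL)"
    )
    rows = [
        (row["city"], row["reading_date"], float(row["temperature_c"]))
        for row in csv.DictReader(io.StringIO(raw_text))
    ]
    connection.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    return len(rows)


def transform(connection):
    """Aggregate the staged rows with SQL: average temperature per city."""
    cursor = connection.execute(
        "SELECT city, AVG(temperature_c) FROM readings "
        "GROUP BY city ORDER BY city"
    )
    return cursor.fetchall()


def run_pipeline(raw_text):
    """Run ingest, then transform, against an in-memory database."""
    connection = sqlite3.connect(":memory:")
    try:
        ingest(raw_text, connection)
        return transform(connection)
    finally:
        connection.close()


if __name__ == "__main__":
    for city, average in run_pipeline(RAW_CSV):
        print(city, average)
```

From here, natural next steps are swapping the in-memory string for a real download, replacing SQLite with Spark, scheduling the run, and feeding the aggregated table to a BI tool.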
The journey might be daunting, and you will probably end up feeling lost or getting stuck at some point, but this is what transition always looks like. Once in the role of a Data Engineer, you will learn how to adapt to the constantly changing reality, and shaping your skillset accordingly won’t be as much of a challenge for you anymore.