Boosting Your Career in Data Engineering: Insights and Strategies
Leo Chashnikov
Posted on November 13, 2023
My name is Leonid, and I have been working as a developer for over 10 years; I'm currently at Meta (formerly Facebook). The term "Data Engineering" best describes the scope of my current responsibilities.
In this article, I’ll share tips on building a career as a Data Engineer and delve into the dos and don'ts.
**How is a Data Engineer (DE) different from a Software Engineer (SWE) and a Data Scientist (DS)?**
The typical role of a Data Engineer involves building a data processing pipeline and optimizing it. This includes designing an effective data storage schema and making sure data updates run smoothly. In some companies, Data Engineers also develop systems and frameworks that help Data Scientists deploy their models and experiment with data more easily.
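As an illustration of what smooth data updates can mean in practice, here is a minimal Python sketch of an idempotent daily load, assuming a hypothetical `events` table partitioned by a `ds` date column. SQLite stands in for a real warehouse, and all table, column, and function names are illustrative. Re-running the load for the same day replaces that day's partition inside one transaction, so retries and backfills don't create duplicate rows.

```python
import sqlite3


def load_partition(conn, ds, rows):
    """Idempotently (re)load one daily partition of the hypothetical `events` table.

    Deleting the partition and inserting the new rows inside one transaction
    means a retry or backfill for the same `ds` replaces the data instead of
    duplicating it.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM events WHERE ds = ?", (ds,))
        conn.executemany(
            "INSERT INTO events (ds, user_id, action) VALUES (?, ?, ?)",
            [(ds, user_id, action) for user_id, action in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ds TEXT, user_id INTEGER, action TEXT)")

batch = [(1, "click"), (2, "view")]
load_partition(conn, "2023-11-13", batch)
load_partition(conn, "2023-11-13", batch)  # re-run: same result, no duplicates

count = conn.execute(
    "SELECT COUNT(*) FROM events WHERE ds = ?", ("2023-11-13",)
).fetchone()[0]
print(count)  # 2
```

In a real warehouse the same idea is usually expressed as "overwrite partition" semantics rather than a manual delete and insert.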
Unlike general Software Engineering, Data Engineering requires a deep understanding of distributed systems, data formats, and data processing techniques. Many people move into Data Engineering from Software Engineering, since the programming languages are often the same and mainly the frameworks differ.
Data Scientists, on the other hand, have a deep understanding of the domain and the business, which is crucial for processing data effectively and extracting insights. They may have programming skills, but programming is not their primary focus. Data Scientists often use notebooks and scripts to test hypotheses, and this code does not always meet engineering standards or perform optimally.
In a typical collaborative setup, Data Engineers work closely with Data Scientists on the same team. Their responsibilities include deploying, and sometimes rewriting, the Data Scientists' code so that it is stable and fast and handles all potential corner cases.
**Who might find a transition to Data Engineering appealing, and why?**
Delving into Data Engineering can be particularly thrilling for developers with a keen eye for detail and a passion for optimization. It offers an opportunity to explore the intricacies of distributed systems and efficient data processing, allowing Software Engineers to enhance their skills with SQL and NoSQL databases.
Additionally, for those eager to collaborate closely with Data Scientists, moving into Data Engineering is a chance to refine and optimize analytical models together with them. This requires the ability to handle code efficiency, stability, and every conceivable corner case.
Another intriguing dimension is the connection to the business side of the company. Data Engineers must understand business needs and the domain so that data processing aligns with the company's strategy and delivers tangible value. This combination makes Data Engineering appealing to those who want to blend technical skill with a strategic interest in shaping business processes by optimizing data processing.
**How do you transition from Software Engineer to Data Engineer?**
If you're making the shift, your familiarity with tools that Data Engineers commonly use is a valuable asset. Experience with a variety of databases helps, particularly in choosing the most suitable database for a given scenario. For instance, PostgreSQL might excel at complex, feature-rich queries, while Elasticsearch may be more efficient for full-text search. A solid grasp of SQL is essential, since SQL or SQL-like query languages are the standard interface to many databases, to data processing systems such as Spark (through Spark SQL), and to query engines such as Presto.
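To show why SQL skills carry over between systems, here is a small example run against an in-memory SQLite database through Python's built-in `sqlite3` module. The `orders` table and its columns are hypothetical. The query uses only standard SQL (`GROUP BY`, `HAVING`, `ORDER BY`), so it should run with little or no change in engines such as Spark SQL or Presto, although each engine has its own dialect details.

```python
import sqlite3

# Tiny in-memory dataset; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, country TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'DE', 20.0), (2, 'DE', 35.0), (3, 'US', 50.0),
        (4, 'US', 10.0), (5, 'FR', 5.0);
    """
)

# Standard SQL: aggregate per country, keep only countries with revenue >= 40,
# and sort by revenue, highest first.
QUERY = """
    SELECT country, COUNT(*) AS num_orders, SUM(amount) AS revenue
    FROM orders
    GROUP BY country
    HAVING SUM(amount) >= 40
    ORDER BY revenue DESC
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('US', 2, 60.0), ('DE', 2, 55.0)]
```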
Embracing standard engineering practices is fundamental: writing tests (from unit tests to integration tests), adopting CI/CD, and managing infrastructure as code. These practices also set Data Engineers apart from Data Scientists, who often write one-off code with less emphasis on long-term maintenance.
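As a sketch of what testing pipeline code can look like, here is a hypothetical transform step with unit tests written using Python's built-in `unittest` module. The function name and its validation rules are assumptions made for illustration; the point is that corner cases such as missing or malformed input are pinned down by tests that a CI pipeline can run on every change.

```python
import unittest


def normalize_country(code):
    """Hypothetical transform step: clean up a raw two-letter country code.

    Returns an upper-case code, or None for missing/invalid input, so bad
    records can be counted or filtered downstream instead of crashing the job.
    """
    if code is None:
        return None
    cleaned = code.strip().upper()
    return cleaned if len(cleaned) == 2 and cleaned.isalpha() else None


class NormalizeCountryTest(unittest.TestCase):
    def test_happy_path(self):
        self.assertEqual(normalize_country(" de "), "DE")

    def test_corner_cases(self):
        self.assertIsNone(normalize_country(None))  # missing value
        self.assertIsNone(normalize_country(""))  # empty string
        self.assertIsNone(normalize_country("DEU"))  # wrong length
        self.assertIsNone(normalize_country("1A"))  # non-letter character


if __name__ == "__main__":
    # exit=False keeps the interpreter running after the tests finish.
    unittest.main(argv=["ignored"], exit=False)
```

In a CI job, tests like these would typically be run with `python -m unittest` or a runner such as pytest.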
There is no single dominant stack of data tools, but a good starting point might include:
- Apache Spark: A highly popular platform for processing data in both batch and streaming modes.
- Apache Kafka: A distributed event streaming platform that also supports on-the-fly data processing (through Kafka Streams), which can serve as an alternative to Spark in a streaming system.
- Presto or Snowflake: Systems for querying and accessing data. Snowflake stores the data itself, while Presto connects to different data sources and can combine their data in a single query.
- Apache Airflow: A widely used platform for orchestrating data pipelines and managing the dependencies between them and the data sources they process.
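Orchestrators such as Airflow model a pipeline as a directed acyclic graph (DAG) of tasks and start each task only after its upstream tasks have finished. The sketch below is not Airflow's API: it uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) with hypothetical task names to show the core scheduling idea, leaving out retries, schedules, and actual task execution.

```python
from graphlib import TopologicalSorter

# Hypothetical daily pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract_orders": set(),
    "extract_users": set(),
    "clean_orders": {"extract_orders"},
    "join_orders_users": {"clean_orders", "extract_users"},
    "publish_dashboard": {"join_orders_users"},
}


def run_pipeline(dag):
    """Return tasks in an order that respects every dependency."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    executed = []
    while sorter.is_active():
        ready = sorter.get_ready()  # tasks whose dependencies are all done
        for task in sorted(ready):  # sorted only to make the output deterministic
            executed.append(task)  # placeholder for actually running the task
        sorter.done(*ready)
    return executed


order = run_pipeline(pipeline)
print(order)
# ['extract_orders', 'extract_users', 'clean_orders', 'join_orders_users', 'publish_dashboard']
```

Tasks returned in the same `get_ready()` batch do not depend on each other, which is what allows an orchestrator to run them in parallel.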
**What’s expected at higher levels?**
1. Understanding of Business and Domain Needs
Communicating effectively with your primary clients, typically data scientists or data analysts, requires a grasp of your company's business. Knowing how the company operates and where its revenue comes from makes it easier to identify which data is critical and how it will be used.
2. Greater Planning Horizon
Reaching the senior level requires not just coding proficiency (already expected at the mid level) but also the ability to break down large projects, see the bigger picture, and navigate trade-offs. Senior engineers are often expected to act as an "arbitrator" in resolving technical disputes within the team.
3. Risk Management
As projects grow, unforeseen issues and mid-project changes to technical tasks become more common, so effective risk management becomes crucial. This might involve quick proofs of concept (PoCs) to test whether an idea is viable, then adapting the original design as needed.
4. Mentoring and Knowledge Sharing
However skilled you are, you cannot do the work of 10 people on your own. Senior developers should become mentors, sharing knowledge and delegating tasks to the people who will benefit most from them. A senior who hoards knowledge and makes themselves indispensable holds back the team's growth and deprives less experienced members of engaging work.
**How to advance further in your career?**
1. Define Your Growth Path
Identify where you want to grow and why. Some may prefer the path of an Individual Contributor, deeply engaged in coding and project planning. Others may lean towards managerial roles, focusing on removing obstacles for developers and fostering collaboration with other teams. It's essential to understand your preference and motivation.
2. Find a Mentor
Find someone a few steps ahead of you on the path you've chosen. A mentor, especially one in your own company, can give you insight into the promotion process, what is valued most, and how to present yourself effectively.
3. Explore Beyond Your Work Environment
Working at a stable company often limits your exposure to one specific technical stack. Look for opportunities outside work to try new things, whether through open-source projects or personal ones. Experimenting with new technologies, even in small projects, helps you stay familiar with emerging trends.
4. Don’t Grab Shiny New Things
While exploring new technologies is encouraged, resist the temptation to adopt every shiny new tool. Avoid "resume-driven development," where you add new libraries or rewrite projects just for the sake of using the latest technology. Moderation is prudent; a good rule of thumb is "one new tool per project."
5. Remember that “skills that got you here won’t get you there”
Recognize that moving to a new career level involves not just deepening your technical knowledge but also actively applying soft skills. Senior roles require understanding not only how to write code but also what to write and why. At this stage, communicating effectively with people becomes more important than interacting with machines.
**What to read for deeper knowledge?**
1. Designing Data-Intensive Applications, by Martin Kleppmann
A true classic that delves into the operation of various distributed systems, consensus protocols, SQL and NoSQL databases, and message brokers. While you may not need this depth of knowledge in your daily work, the book provides valuable insights into the stack upon which your systems are built.
2. High Performance Spark / Spark: The Definitive Guide
High Performance Spark is ideal if you're familiar with the basics and looking for optimization tips.
Spark: The Definitive Guide is suitable for those new to Spark.
It's worth skimming both books first and then keeping them as references to revisit when needed.
3. Fundamentals of Data Engineering
Geared more towards beginners, this book seems tailored for Data Scientists aiming to grasp the essence of Data Engineering or planning a smooth transition into the role.
4. Software Engineering at Google: Lessons Learned from Programming Over Time
An excellent resource for absorbing best practices in Software Engineering. While not every solution applies directly to "move fast" startup environments, the experienced engineers who wrote it share valuable insights into designing systems for large-scale, long-term use.
5. Algorithms and Data Structures for Massive Datasets
Explores probabilistic data structures (e.g., Bloom filters and HyperLogLog), sampling techniques, and data structures optimized for data kept in external storage such as disk. Essential reading for anyone keen on optimizing the processing of large datasets.
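To give a feel for the probabilistic structures this book covers, here is a minimal Bloom filter in Python. It answers "have I seen this item?" with no false negatives and a tunable false-positive rate, using the standard sizing formulas and double hashing over a SHA-256 digest. The class and method names are hypothetical, and a production system would normally rely on an existing library or a feature built into its data store.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items, fp_rate=0.01):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        n = expected_items
        self.size = max(1, math.ceil(-n * math.log(fp_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / n * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely never added"; True means "probably added".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


seen = BloomFilter(expected_items=1000, fp_rate=0.01)
for user_id in range(1000):
    seen.add(f"user-{user_id}")

print(seen.might_contain("user-42"))  # True: added items are never missed
```

The trade-off is that about 9,600 bits (roughly 1.2 KB) are enough to track 1,000 items at a ~1% false-positive rate, regardless of how long the items themselves are.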
**Conclusion**
In this article, I have tried to share my expertise in the field of Data Engineering, pointing out the key aspects of career development. I hope that the tips for moving into Data Engineering and using the necessary tools will be useful for those who are looking to reach their potential in this exciting field.