How to Be a DevOps Expert: Roadmap 2022

Over the last few weeks, I met several people in my mentoring sessions, some new to DevOps and some mid-career, who wanted to know what to learn in 2022. DevOps skills are in high demand, and it takes constant learning to keep up with what the market asks for.
This post shares notes that may help you. Here is some guidance based on my experience and understanding.

DevOps Roadmap

Be fundamentally strong in networking technologies

Understand concepts such as HTTP/2, QUIC and HTTP/3, Layer 4 and Layer 7 protocols, mTLS, proxies, DNS, BGP, how load balancing works, iptables, how the Internet works, IP addressing schemes, and network design. I find Julia Evans’s blog very useful, and it is my go-to place when I need to understand something in a simple way. She covers a wide variety of topics in her blog posts and zines.

Master the operating system fundamentals, particularly Linux

As most systems (VMs, containers, etc.) run Linux, it is important to know it from top to bottom. Learn scheduling, init systems and the systemd interface, cgroups and namespaces, and performance tuning, and master the command-line utilities: awk, sed, jq, yq, curl, ssh, openssl, and so on. Learn performance troubleshooting from Brendan Gregg’s blog.

CI/CD

If you are still using Jenkins, that is fine, but much of the industry has moved to cloud-native pipelines. Conceptually, not much has changed in this space, but you can look into GitHub Actions, Tekton, and similar tools. Think about how to do releases better, and understand deployment strategies such as blue-green and canary.
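
To make the canary idea concrete, here is a minimal sketch in Go of the traffic-splitting decision behind it. In practice a load balancer, ingress controller, or service mesh usually makes this decision for you; the function name RouteToCanary, the user IDs, and the 10% rollout stage are hypothetical and only for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// RouteToCanary reports whether a request from userID should go to the
// canary version. Hashing the ID keeps each user pinned to the same
// version between requests, as long as the percentage does not change.
func RouteToCanary(userID string, canaryPercent uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(userID)) // Write on a hash.Hash never returns an error
	return h.Sum32()%100 < canaryPercent
}

func main() {
	const canaryPercent = 10 // hypothetical rollout stage
	canary := 0
	for i := 0; i < 10000; i++ {
		if RouteToCanary(fmt.Sprintf("user-%d", i), canaryPercent) {
			canary++
		}
	}
	fmt.Printf("%d of 10000 simulated users routed to canary\n", canary)
}
```

Raising canaryPercent step by step while watching error rates and latency is the essence of a canary release. A blue-green deployment, by contrast, switches all traffic at once between two complete environments.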

Containerisation and Virtualisation

Apart from the popular Docker runtime, try containerd, Podman, and others. Learn how to containerise applications, how to implement container security, and how to run and orchestrate VMs in Kubernetes (see the KubeVirt project).

Container Orchestration

Kubernetes is now the de facto standard for running containers, and there is a lot of content on the Internet for learning it. Focus on configuration best practices, application design, security, and scheduling. Setting up a cluster is becoming trivial, but people will expect you to handle the day-2 operational work: setting up monitoring, logging, and CI/CD, scaling the cluster, optimising cost, and securing it.

Observability at Scale

Most engineers are aware of the Prometheus and Grafana stack or something similar. Trends suggest that many organisations are consolidating their Kubernetes clusters and observability stacks, which helps with both performance and cost. Learn the advanced configuration and architectures of Prometheus, and how to scale them. Look into technologies like Thanos, Cortex, VictoriaMetrics, Datadog, and Loki, continuous profiling tools such as Parca and periscope, and distributed tracing with Hypertrace and OpenTelemetry. Service meshes such as Istio are a popular ingredient in cloud-native recipes.

Platform team as a Product team

The platform team is becoming more like a centralised product team that focuses on its internal customers, such as developers and testers. The goal is to improve ways of working and bring some order to the teams. Try to solve the problems the developer and QA teams face. You are the enabler for other teams: instead of taking all the work into a central team, coach the dev teams to take up typical DevOps responsibilities. That way you can scale and avoid burning yourself out.

Security

In many small organisations, security used to be a second-class citizen, and product features were given more priority. But with increasingly sophisticated attacks and strict compliance requirements, companies are adopting shift-left security strategies. End-to-end encryption, strong RBAC, IAM policies, governance and auditing, and the implementation of benchmarks and standards such as NIST, CIS, and ISO 27001 are common. Container security, policy as code, cloud governance, and supply chain security are hot topics.

Which Programming Language Is Needed for DevOps?

The DevOps or SRE role now takes on the cross-cutting concerns of developers and builds tooling that improves their productivity while enforcing standards. Good software engineering practices and skills are required to craft high-quality platform components.

I can’t stress this enough. Good organisations look for solid programming experience in platform engineers. It is just as important in site reliability engineering, where you need to be fluent in programming and able to read, understand, and debug code written by others and, if necessary, fix it.

Python and Golang are the most popular choices. My suggestion is Golang: it has strong concurrency support, strict type checking, a good toolchain, and wide adoption across organisations, and many major projects are built with it, so it makes sense to learn it over Python.

A few simple things you can try:

Write a CLI in your chosen programming language.
Write a REST API that interacts with a database.
Experiment with parallelism and concurrency.
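
As a starting point for the concurrency exercise, here is a minimal worker-pool sketch in Go. It runs a check against a list of targets using a bounded number of goroutines. The function name CheckAll, the host names, and the stand-in check are all hypothetical; a real check might make an HTTP request or open a TCP connection.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// CheckAll runs check against every target using at most `workers`
// goroutines and returns the result for each target.
func CheckAll(targets []string, workers int, check func(string) bool) map[string]bool {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(map[string]bool, len(targets))
	var mu sync.Mutex // guards results, which the workers write concurrently
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range jobs {
				ok := check(t)
				mu.Lock()
				results[t] = ok
				mu.Unlock()
			}
		}()
	}
	for _, t := range targets {
		jobs <- t
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	return results
}

func main() {
	targets := []string{"api.internal", "db.internal", "bad-host"}
	// Stand-in check so the example runs without network access.
	check := func(t string) bool { return strings.HasSuffix(t, ".internal") }
	res := CheckAll(targets, 2, check)
	for _, t := range targets {
		fmt.Printf("%s healthy=%v\n", t, res[t])
	}
}
```

Wrapping CheckAll in a small command-line program, for example with the standard flag package, would cover the CLI exercise as well.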

Infrastructure as Code

Terraform is the standard in most projects. Once you understand the concept, it is easy to adapt to other tooling, as most of it is based on a DSL.

Cloud

Most cloud providers work in much the same way, so if you know one cloud well, you can easily work with the others. Focus on how to design applications using cloud-native components in a highly available, resilient, secure, and cost-effective way.

Technical Writing

You might be wondering why I am talking about technical writing when discussing DevOps. A lot of people don’t give it enough attention, but it is very important for how you communicate and work with other teams. The future of work is remote, and email, Slack or Teams, and chat are the primary channels for conveying ideas to others.

On a regular basis, you might be creating documents such as runbooks, postmortems, RFCs, architectural decision records, and software design docs, to name a few. A clear, easy-to-understand document does wonders. It saves both your time and the reader’s, and improves overall productivity.
I suggest you read this article.

Site Reliability Engineering

The boundary between DevOps and SRE is getting thin. In some organisations, the same person might be performing both roles. Understand the concepts behind SLIs, SLOs, and error budgets, and learn SRE practices. Google’s SRE culture is a good reference, but each organisation does it differently, so I don’t recommend copy-pasting someone else’s culture into your team.
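
To see how an SLO turns into an error budget, here is a small Go sketch of the arithmetic. The function names and the example numbers are hypothetical, and it assumes a plain availability SLO expressed as a fraction such as 0.999.

```go
package main

import (
	"fmt"
	"time"
)

// ErrorBudget returns how much unavailability an availability SLO
// (for example 0.999) permits over the given window.
func ErrorBudget(slo float64, window time.Duration) time.Duration {
	return time.Duration((1 - slo) * float64(window))
}

// BudgetRemaining returns the fraction of a request-based error budget
// that is still unspent. It goes negative once the budget is exhausted.
func BudgetRemaining(slo float64, total, failed int) float64 {
	allowed := (1 - slo) * float64(total)
	if allowed <= 0 {
		return 0 // a 100% SLO leaves no budget to spend
	}
	return 1 - float64(failed)/allowed
}

func main() {
	window := 30 * 24 * time.Hour // hypothetical 30-day SLO window
	fmt.Println("allowed downtime:", ErrorBudget(0.999, window).Round(time.Second))
	fmt.Printf("budget remaining: %.0f%%\n", 100*BudgetRemaining(0.99, 1000, 5))
}
```

A 99.9% SLO over 30 days leaves roughly 43 minutes of allowed downtime. Teams typically use the remaining budget to decide whether to keep shipping features or to prioritise reliability work.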

Conclusion

Personally, I am excited about the following this year. This is not a definitive list, as it keeps changing with time.

Service mesh: Istio, Cilium’s sidecarless mesh, and the offerings from Tetrate and Solo (Gloo Mesh).
Developer productivity: how to improve it? It is a mix of culture, automation, and tools.
SRE platforms: Honeycomb, Last9.
DevPortals: again linked to the motive of improving productivity and bridging the knowledge gap.
Observability: technologies such as OpenTelemetry, Hypertrace, Thanos, VictoriaMetrics, and Vector.
Security: supply chain security, code signing, and tightening cloud security.
Golang: improving my current skills.
Serverless computing and event-driven architectures.
Web3: understanding the landscape as it relates to DevOps and infrastructure.

Be curious and keep learning. Continuous, bite-sized learning is easy to do alongside your full-time job. If you still have questions, feel free to book some time with me at braincuber.com. I am more than happy to help.