Deploy on Fridays

hammerdr

Derek Hammer

Posted on October 5, 2019


Developers should aim to deploy on Fridays. Not as some Platonic ideal that only big companies (or small companies) can reach, but as an active project that the team embraces until it is reality. This post aims to outline WHY and HOW, and at the end there will be a bit of a discussion about call outs vs shaming.

Deploying on Fridays is a touchstone practice that displays the confidence, flow, reactivity, and maturity of your team and processes. Not being able to deploy on Fridays signals a deficiency in one of these areas, and each of them is a measure of a team's effectiveness no matter the size of the company it belongs to.

Confidence

Confidence to deploy is NOT confidence that your code has no bugs. It is confidence that the systems that transport new versions of the code into production are safe AND that if the code does have bugs the team has sufficient mechanisms to deal with that eventuality.

For deployment mechanisms, this should be a well-oiled machine that you regularly run and maintain. For lots of reasons we won't get into here, a shaky deployment system is problematic for your team; in this case, it means that whenever you put code into production you are always questioning whether the code or the deployment is the real problem. If instead confidence in the deployment system is high, the culprit becomes clear almost immediately.

Dealing with bugs in code is where things become nuanced. Up until this point, I've stated things in broad strokes that apply to all teams, and I do believe that. However, different teams are going to require different levels of response to bugs to maintain confidence. If you run a critical, 24/7 system with millions of customers, your processes for dealing with a bug are going to be different from those of a startup with no customers in production. That's okay. The key is that you have two things:

  1. The ability to recognize when something is wrong. This might be a manual QA process (very costly), an automated verification suite, strong observability into system performance, automated bug reporting, etc. If you are looking for a place to start here, I would strongly encourage looking at a tool like Honeycomb.io that gives you world-class observability regardless of your size.

  2. The ability to do something about it. This could be configuration to enable/disable the feature, this could be rollbacks (manual or automated), this could be dark releasing features in the first place, this could be having graduated rollouts of features that you can turn off, or it could be a policy of "live with it until Monday".
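To make the second ability a little more concrete, here is a minimal Python sketch (not a production flag system) of a feature that is dark-released and rolled out gradually. A hypothetical `FeatureFlag` class hashes each user into a stable bucket from 0 to 99, shows the feature to users whose bucket falls below `rollout_percent`, and has an `enabled` kill switch that turns it off for everyone. The class, its fields, and the flag name are illustrative assumptions; a real team might instead use a hosted flag service or its own configuration store.

```python
import hashlib


class FeatureFlag:
    """Hypothetical in-memory flag: a percentage rollout plus a kill switch."""

    def __init__(self, name: str, rollout_percent: int = 0, enabled: bool = True):
        self.name = name
        self.rollout_percent = rollout_percent  # 0..100: share of users who see the feature
        self.enabled = enabled  # kill switch: False turns the feature off for everyone

    def _bucket(self, user_id: str) -> int:
        # Hash flag name + user id so each user lands in a stable bucket from 0 to 99.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100

    def is_on_for(self, user_id: str) -> bool:
        if not self.enabled:
            return False
        return self._bucket(user_id) < self.rollout_percent


flag = FeatureFlag("new-checkout", rollout_percent=10)
users = [f"user-{i}" for i in range(1000)]
print(sum(flag.is_on_for(u) for u in users))  # about 100; the exact count depends on the hash

flag.enabled = False  # something looks wrong on Friday evening: flip the switch
print(sum(flag.is_on_for(u) for u in users))  # 0
```

Because the bucket comes from a hash rather than a random draw, a given user sees the same behavior on every request, and raising `rollout_percent` only ever adds users to the rollout.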

How you solve these two things is up to your team and situation. Confidence isn't about which tools or processes you use to deal with problems; it is about being confident that you can regularly put new features into production without being scared of the consequences of pushing that deployment button.

Flow

The next three are all super important, and each could be a whole topic of research in itself. I will touch briefly on each: why I think it is important, and why deploying on Fridays is a positive indicator of it.

Flow is the cadence at which individuals or teams produce output. Good flow is a regular, sustainable delivery of features/bugfixes/upgrades/etc. at an even distribution. This matters because whenever flow is disrupted, you always incur a startup cost. One way to think about it is this:

  1. Good flow is starting your computer in the morning, writing code continuously until the end of the day, running shutdown procedures, and starting anew the next day.

  2. Bad flow would be the computer requiring a restart every 30 minutes, forcing you to restart your servers and text editors, reload your context, etc.

  3. Great flow would be #1 but without the shutdown/startup procedures, which is why most people just suspend their laptops instead of shutting them down.

If you put a block on deployments on Fridays, you create a period of time in which productivity intentionally drops and flow is disrupted. It creates unevenness in the team's output: you will start to see people trying to stack changes on Wednesdays and Thursdays. This lopsidedness causes further disruptions in the team's flow; Thursdays and Mondays may become heavier deploy days on which the team is dealing with batched deployments, because everyone wants to get work done around the Friday disruption.

Friday deployments don't fix all flow problems (not by a long shot), but the absence of them can be a big disrupter to a team and, thus, are an indicator of how well the team is operating.

Reactivity

Sometimes you'll hear technologists talk about "MTTF" and "MTTR"; "reactivity" is what they are talking about. MTTF stands for Mean Time to Failure. Specifically, it means: once I discover a bug in production, how long do I expect it to be until I discover the next one? This metric has been used to espouse the benefits of QA, testing, pre-prod environments, etc. MTTR stands for Mean Time to Recovery: once we discover a bug, how long should we expect until it is fixed? MTTR is used to talk about systemic recovery mechanisms like rollbacks, roll forwards, and overall process efficiency.
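To make the two definitions concrete, here is a small Python sketch that computes both from an incident log. The `Incident` record and the sample timestamps are hypothetical; MTTF is computed as described above (the mean gap between one bug discovery and the next) and MTTR as the mean time from discovery to recovery.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    discovered: datetime  # when the bug was noticed in production
    resolved: datetime    # when the fix, rollback, or flag flip took effect


def mttf(incidents: list[Incident]) -> timedelta:
    """Mean gap between one bug discovery and the next (needs at least two incidents)."""
    times = sorted(i.discovered for i in incidents)
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
    return timedelta(seconds=mean(gaps))


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time from discovering a bug to recovering from it."""
    return timedelta(seconds=mean((i.resolved - i.discovered).total_seconds() for i in incidents))


# Hypothetical incident log, for illustration only.
log = [
    Incident(datetime(2019, 9, 2, 10, 0), datetime(2019, 9, 2, 10, 20)),
    Incident(datetime(2019, 9, 9, 10, 0), datetime(2019, 9, 9, 10, 10)),
    Incident(datetime(2019, 9, 16, 10, 0), datetime(2019, 9, 16, 10, 30)),
]
print(mttf(log))  # 7 days, 0:00:00
print(mttr(log))  # 0:20:00
```

With fewer than two incidents there is no gap to average, so `mttf` raises `statistics.StatisticsError`; a real dashboard would handle that case explicitly.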

There is a natural, but very wrong, correlation to draw between MTTF and the number of deployments. The thinking is that the more you deploy, the more change you introduce into the system, and thus the more issues you'll have. The opposite turns out to be true. First, we must understand that reducing the number of deployments doesn't actually introduce less change into the system. The units of change in software are commits, not deployments, so a system with 10 deployments of 1 commit each and a system with 1 deployment of 10 commits have the same number of changes. We arrive at the reverse of the natural correlation by analyzing system dynamics: we've noticed that small commits deployed often result in fewer bugs and a better MTTR. There are lots of reasons for this, but one of the easiest to understand is that it's a lot easier to grok/validate the deployment of a single CSS change than to do the same when it's 3 CSS changes, a permissions change, a database migration, and a new API endpoint.
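The commit-versus-deployment arithmetic can be illustrated with a toy Python sketch. It assumes, for simplicity, that when a deployment breaks production every commit shipped in it is a suspect (and would be reverted by rolling that deployment back); the helper names are hypothetical.

```python
def total_commits(deploys: list[list[str]]) -> int:
    """Count every commit shipped across all deployments."""
    return sum(len(batch) for batch in deploys)


def suspects_if_broken(deploys: list[list[str]], index: int) -> list[str]:
    """If deployment `index` breaks production, every commit it shipped is a suspect
    (and rolling it back reverts all of them, good and bad alike)."""
    return deploys[index]


commits = [f"c{i}" for i in range(10)]
small_batches = [[c] for c in commits]  # 10 deployments of 1 commit each
big_batch = [commits]                   # 1 deployment of 10 commits

print(total_commits(small_batches), total_commits(big_batch))  # 10 10
print(len(suspects_if_broken(small_batches, 3)))  # 1
print(len(suspects_if_broken(big_batch, 0)))      # 10
```

The total amount of change is identical; what differs is how much you have to sift through, and how much good work you throw away, when one deployment goes wrong.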

MTTF is still a good metric, but it should be used in conjunction with MTTR as a way to look at the health of the system. When you are more responsive to your system (as measured through MTTR), your systems get healthier.

Some people who advocate for not deploying on Fridays will talk about bugs popping up less often, or about the impact Friday deploys would have on their weekends. Both of these are sympathetic viewpoints, but the data shows that if you focus on MTTR instead of MTTF, you can make Friday deploys a non-event that will not affect your weekends any more than a No Friday Deploys policy would. This step looks big to some people but is easier than it first appears. Quick deployments and rollbacks fix most of MTTR and are fairly low cost.
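As an illustration of how quick rollbacks and observability combine, here is a minimal Python sketch of a guarded deploy: ship a version, read a health signal, and automatically return to the previous known-good version if the signal crosses a threshold. `Service`, `deploy_with_guard`, the `error_rate` callable, and the 1% threshold are all hypothetical stand-ins for your real deployment tooling and monitoring.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical service that remembers the versions it has run, oldest first."""
    history: list[str] = field(default_factory=lambda: ["v1"])

    @property
    def current(self) -> str:
        return self.history[-1]

    def deploy(self, version: str) -> None:
        self.history.append(version)

    def rollback(self) -> str:
        # Drop the newest version and fall back to the previous known-good one.
        if len(self.history) > 1:
            self.history.pop()
        return self.current


def deploy_with_guard(
    svc: Service,
    version: str,
    error_rate: Callable[[], float],
    threshold: float = 0.01,
) -> str:
    """Deploy `version`, check a health signal, and roll back if it looks bad.

    `error_rate` stands in for whatever observability the team has in place.
    Returns the version left running.
    """
    svc.deploy(version)
    if error_rate() > threshold:
        return svc.rollback()
    return svc.current


svc = Service()
print(deploy_with_guard(svc, "v2", error_rate=lambda: 0.002))  # v2
print(deploy_with_guard(svc, "v3", error_rate=lambda: 0.25))   # v2 (v3 was rolled back)
```

In practice the health check would wait for real traffic and compare against a baseline, but the shape is the same: recovery is a routine, automated step rather than a weekend emergency.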

Maturity

This is a loaded word, but here it refers to team maturity in the "X Maturity Model" sense. Specifically, the maturity I'm looking for is the team's recognition that software isn't done until it is in production and that developers should shepherd the commits they make all the way to users (and, honestly, beyond, but that's a different topic). Internalizing that the changes you make aren't thrown over a wall or left for someone else to deploy at some point in the future indicates a team that is moving in the right direction.

Ownership at the individual developer level, all the way to production, is the healthiest ownership model I've come across, and it is only enabled by developers being able to push their code into production any time after they believe it is ready. Forcing changes to be queued breaks that culture. I've seen this firsthand: with very good teams and developers, once a "deploy train" was introduced, this sense of ownership degraded over time. Before we turned it around, the developers were asking to hire someone dedicated solely to deployments, just to deal with the fallout of those trains.

Call outs versus shaming

Finally, what prompted me to write this was a discussion on Twitter that has been going on for several months now. There is a group of individuals who bask in the "virtue" of not deploying on Fridays. They spout this belief far and wide and seem somewhat immune to changing their minds on the topic.

I think it is fair to call these individuals out: to describe their actions as unhealthy for the developer community and to advocate for people to abandon this approach to software development.

For other people, the reality is that they do not deploy on Fridays. That doesn't make them bad people, and it doesn't mean we should shame them. Many of them are doing what they think is best. That's okay; I'm happy to discuss approaches and paths forward with anyone who wants to get to a better place. Others know they are in a bad place and are actively working to get to a better one. Awesome! Heroes, all of you.

Conclusion

Deploy on Fridays.
