Measuring performance in agile
James Wade
Posted on September 12, 2022
It seems to be a common theme that after a while of implementing agile, management wants to measure performance.
Should you use story points as a performance measurement?
Management thinker Peter Drucker is often quoted as saying that "you can't manage what you can't measure." Drucker means that you can't know whether or not you are successful unless success is defined and tracked.
Business people want estimates. They want to know how much it’s going to cost them to get a solution, and they want to know how likely it is to come in on time and on budget. And of course, quality is not negotiable.
Estimating work
In agile and Scrum teams, we assign points to stories as a way to estimate the effort involved in a piece of work.
Unfortunately, humans aren’t very good at accurately estimating how long something will take.
Story points do not equate to time exactly. They are based on effort, complexity and doubt.
Complexity is the "stuff we have to figure out." We know we can solve the problem, and we probably have a decent feel for how we'll approach it, but we still have to figure it out.
Effort is the sheer amount of stuff that needs to be done. For me, the classic example is configuring SharePoint lists: I knew exactly how to solve everything, and I knew how many there were, but it still took time to work through them.
Doubt is about the stuff we don't actually know if it can be done. We may suspect we're on the wrong track, that the technology isn't up to it, or some other factor that would cause us to churn for a while before we figure out if we can actually do the work.
Arguably it’s all effort, and by removing complexity and doubt, you’re able to give a better estimate of the effort involved in a story.
As humans, we are good at comparing things. Pointing gives a sense of relative size: we know the difference between a house and a skyscraper, and we should be less concerned about 12 floors versus 14 floors.
Mike Cohn summarises story points best: “Story points are a unit of measure for expressing the overall size of a user story, feature or other piece of work.” Story points tell us how big a story is, relative to others, either in terms of size or complexity. Mike often refers to “dog points” when helping teams understand the concept of relative sizing.
A 2-point (small) dog would be a Chihuahua. A 13-point (big) dog would be a Great Dane. With those two guides in mind, it’s fairly easy to size the other dog breeds relative to a Chihuahua or Great Dane. A Beagle, which is about twice as big as a Chihuahua, might be a 5. A Labrador, which is bigger than a Beagle but smaller than a Great Dane, might be an 8.
So how do you get developers to get more done? How do you go faster?
Getting More Done
I often hear management people say things like:
- “We need to get the developers doing more story points” or
- “How can we get developers to do more work?” or
- “Can we use story points to measure an individual’s performance?”
I get it. We want to go faster. We want to understand how to get more work done.
You might think that's simple: just put more work in and more work will come out, right? Unfortunately, of course, it's not that simple.
Nobody would seriously expect that piling more work onto the existing team will make things better. So how can we get the outcomes we want?
Let's say we want a developer to do 13 points per sprint. Of course, this can be achieved when a story involves nothing but writing code at 100% capacity. However, when a developer is busy all of the time, we find that wait times become longer. This effect is what lean manufacturing calls Mura, or unevenness, one of the sources of waste in a process.
To explain this simply, I will use the analogy of a paper shredder:
- They have a maximum limit. If you put too much in, they will jam up. If you constantly feed in the maximum or more, they will overheat or, worse, break.
- They can sustain a reasonable amount at a time, fed in at a reasonable rate, no problem.
Ultimately, in both paper shredders and developers, you need to control the flow, you need to limit the work in progress.
As we know, it's not just about writing code: code has no business value until it's in production. If it gets stuck in the process, it won't reach production, or the developer will end up waiting for business operations to be ready.
Everyone needs idle time or slack time. If there's no slack time, work will get stuck in the process. Any more than 80% capacity will result in longer wait times before things get processed.
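As a rough illustration of why that is, here is a minimal sketch, assuming a simple M/M/1 queueing model (my own example, not part of the original argument), of how average wait time grows as utilisation approaches 100%:

```python
# Illustrative only (not from the post): a simple M/M/1 queueing model showing
# how average wait time grows as utilisation approaches 100%.

def wait_multiplier(utilisation: float) -> float:
    """Average time an item spends waiting, as a multiple of the time it
    takes to actually do the work (M/M/1: rho / (1 - rho))."""
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1)")
    return utilisation / (1 - utilisation)

for pct in (0.50, 0.60, 0.70, 0.80, 0.90, 0.95):
    print(f"{pct:.0%} busy -> work waits ~{wait_multiplier(pct):.1f}x as long as it takes to do")
```

At 50% busy, work waits roughly as long as it takes to do; at 80% it waits around four times as long, and beyond that the queue grows very quickly, which is why slack matters.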
For example, let’s assume each developer can do 13 points in a two week sprint:
| Capacity | 100% | 90% | 80% | 70% | 60% | 50% |
|---|---|---|---|---|---|---|
| Points per sprint | 13 | 11.7 | 10.4 | 9.1 | 7.8 | 6.5 |
If we imagine that a two-pizza sized team (let’s say 6 people) can deliver a feature per two week sprint, and each feature is about 20-40 points of effort, and each feature is broken down into stories of 3-8 points, then you would expect the team to deliver 10-20 stories of business value into production per sprint, providing you’ve sufficiently broken down the work.
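To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch. The figures (13 points per developer, roughly 80% sustainable capacity, a 6-person team, 3-8 point stories) come from the examples above; the function and variable names are just illustrative.

```python
# Back-of-the-envelope sprint arithmetic. The figures (13 points per developer,
# ~80% sustainable capacity, a 6-person team, 3-8 point stories) come from the
# post; the code structure is just illustrative.

def sustainable_points(points_per_dev: float, capacity: float, team_size: int) -> float:
    """Points a team can realistically complete in one sprint."""
    return points_per_dev * capacity * team_size

team_points = sustainable_points(points_per_dev=13, capacity=0.8, team_size=6)
print(f"Sustainable team capacity: ~{team_points:.0f} points per sprint")

# With work broken down into 3-8 point stories, that capacity is roughly
# this many stories reaching production each sprint:
print(f"Expected stories per sprint: ~{team_points / 8:.0f} to ~{team_points / 3:.0f}")
```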
Just measuring story points and expecting things to get better isn’t going to be enough. Work needs to be broken down. When work is broken up into small, bite-sized chunks, it can get through the process quicker. For example, a 13-point story that is done but not in production provides no value to the customer, whereas a 5-point story that is complete and in production is delivering value.
If it’s not simply a case of putting more work in to get more work out, what is it?
Culture
It’s really easy to destroy the culture of an agile team with metrics. We need to be sure that what we measure encourages the right behaviour.
Using a team’s velocity as a performance measurement comes with a strong warning label:
“Scrum’s team-level velocity measure is not all that meaningful outside of the context of a particular team. Managers should never attempt to compare velocities of different teams or aggregate estimates across teams.
Unfortunately, we have seen team velocity used as a measure to compare productivity between teams, a task for which it is neither designed nor suited. Such an approach may lead teams to “game” the metric, and even to stop collaborating effectively with each other.
In any case, it doesn’t matter how many stories we complete if we don’t achieve the business outcomes we set out to achieve in the form of program-level target conditions”
We’ve all heard about working smarter, not harder. By focusing on story points as a measurement, we may succeed in the short term at getting people to complete more story points simply by working harder, but this approach will not necessarily achieve the outcomes we want.
People will always find a way to “work smarter”; however, focusing on story points encourages the wrong behaviour over time.
“Estimate inflation is when the estimate assigned to a product backlog item (usually a user story) increases over time. For example, today the team estimates something will take five points, but previously they would have called it three points.” - Mike Cohn, Agile Alliance and Scrum Alliance founding member
We need to ensure that we encourage the correct behaviours, working smarter to achieve the outcomes that we need, rather than working smarter to game the system to achieve more story points.
“Putting too much emphasis and concern on velocity will often result in gaming the system through the use of velocity inflation” - Tom Smallwood, Agile Coach, Consultant at Smallwood Software Solutions, Inc
I’ve witnessed it first-hand. If you aim to increase velocity, the team will provide bigger estimates to make themselves look better or, worse, start chasing points. You end up spending more time arguing over estimates, re-pointing, and doing the administration needed to chase every single point. And worse, once these behaviours take hold, it becomes difficult to get people to break out of them.
“Estimation is a waste. Estimation of tasks in hours is utter waste; people spend hours debating minutiae when they would be better off just starting” - Certified Scrum Trainer Mark Levison
This doesn’t mean that we shouldn’t estimate at all. Because estimating with story points is relative, the estimates are only really useful to the people who make them. If the team finds it useful to estimate, then you should do it. Estimation should be used as a tool to help understand what is involved in a body of work and how to make it doable.
Chasing points also discourages the right behaviour. Once people understand that their individual performance will be judged on how many story points they complete, it becomes a case of just doing their bit, then slinging it over the fence for testing and/or ops to do theirs. You end up with silos and no collaboration.
It shouldn’t be about story points. Story points are simply a tool for a team to use to help them understand the effort involved in a piece of work. If we want to encourage the right behaviours, we have to ensure that whatever we measure will encourage the right behaviours.
I remember Dan North telling a story about a company that was measuring story points and had concluded that the solution was to fire the person who wasn’t completing any. Dan pointed out that this would be a terrible idea, as that particular developer was acting as a multiplier, helping the other developers in the team go faster.
We want to encourage people to work together as a team. We want to encourage people to work smarter, not just game the system. We want people to be effective at delivering value, not just good at scoring points.
Measuring “work done” is misleading. How much work we’ve done pales in comparison to actually delivering the right things.
“Efficiency is doing things right; effectiveness is doing the right things” - Peter Drucker, founder of modern management
So how do you measure the team's effectiveness and nurture the right culture? Ask the team.
“Imagine that your new boss allows you to choose which metrics will be used to assess your individual performance in your role as a senior developer. Which metrics would you suggest?”
I asked this question to a team of engineers (developers and testers) and discovered that they already knew which metrics were the most important.
They voted the following as the most important metrics:
- The ability to deliver on schedule/within budget
- Customer satisfaction ratings for the product/service
- Revenue performance of the product/service

A few people valued these metrics:
- Number of bugs found
- Peers’/teammates’ rating of my performance
- Performance of the product/service against industry benchmarks

These metrics got only one vote each:
- Frequency of major product revisions/releases
- My own self-rating of my performance
- Number of hours worked/billed

Nobody thought these metrics were important:
- Frequency of check-ins/commits
- Manager’s rating of my performance
- Number of lines of code written
From this, I learned that it’s important not only to understand what metrics we can use to measure effectiveness, but also to ask the team what they think is important. That way we know where the knowledge gaps are, we stay aligned, and we can focus on what we know matters.
Predictability
In an agile team, a key health metric is predictability, and by improving our ways of working we can become more predictable.
For example, if you’re being asked to “burn down” or “burn up” stories, this signals that stories are taking too much time. They spend too long in progress because they are incorrectly estimated or simply too big. We should spend more time breaking down and unpacking the stories.
A manual, artificial burn-down is a bit of an anti-pattern. Stories should be small enough to begin with, made smaller, or broken into sub-tasks. If we aim for consistency, we should end up with stories of roughly the same size.
We should be aiming to improve the value curve by completing stories, rather than artificially burning down based on a guess.
Don't get bogged down in implementation detail. Don't point to the best-case scenario; be realistic, ask what the worst case would be, and work from there.
Pointing should give you a sense of relative size: we know the difference between a house and a skyscraper, and we’re less concerned about 12 floors versus 14 floors.
Continuous Improvement
The agile manifesto’s principles state that “at regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly”.
In Scrum, a retrospective touches on all three pillars (Transparency, Inspection and Adaptation) and is a key event that invites reflection and progressive improvement of processes and working agreements.
In Lean, it’s described as Continuous improvement, or Kaizen, a method for identifying opportunities for streamlining work and reducing waste.
At Spotify they introduced a “health check model” to help the teams self organise and identify areas to focus on, such as support, teamwork, control, mission, health of codebase, suitable process, delivering value, learning, speed of development, ease to release and fun.
All of these techniques and methodologies talk about the same thing, continuous improvement.
The Health and Safety Executive (HSE) and the Chartered Institute of Personnel and Development (CIPD) talk about a healthy team being empowered and able to review processes to see if work can be improved.
Retrospectives provide useful feedback and the health check model does give you a sense of whether things are getting better or worse. This puts people at the centre of what we do and puts them in control of their own improvements.
“Remember, outcomes are what matters - not the process, not controls, or, for that matter, what work you complete”, The Phoenix Project, 2013
Although some might say it’s about how much work you get done and others think it’s how well the process goes, we can all agree that it’s outcomes that matter. Not just building the thing right, but building the right thing. We need to ask the right questions. What do your users think? Is what you’re working on bringing value to the company?
“The key to creating a lean enterprise is to enable those doing the work to solve their customers’ problems in a way that is aligned with the strategy of the wider organisation” - Jez Humble, Vice President of Chef
If you really want to improve the performance of an agile team, then measuring story points isn’t going to encourage the correct behaviours. Instead, focus on outcomes and ask the right questions:
- How do we know that what we're working on right now is the most important thing?
- What are the goals, objectives and measurements for this year? Who is accountable?
- Which measure does the business care most about?
- Which products/projects relate to which business goals and measurements?
- Who is accountable for each product?
Fortunately, the agile methodology, the Scrum framework, Lean Enterprise and the DevOps movement all encourage the same behaviours, because they are founded on the same principles. Your application is only as good as your team. If your team follows these principles, then you will get quality.
Quality
Now we know that it’s not about story points or doing more work, but about doing the right work by focusing on the right things. By understanding that we reach our goals through continuous improvement and by focusing on the outcomes we want, we can really begin to think about how we measure our success.
“The paradox is that when managers focus on productivity, long-term improvements are rarely made. On the other hand, when managers focus on quality, productivity improves continuously” - John Seddon
In my experience, when we talk about quality, most developers go straight to the quality of the code or the quality of the process. However, my view is that you could have the best code in the world and it could still do absolutely nothing useful. So what is quality?
Most of the time when we talk about quality we think about testing. Quality assurance shouldn’t just be about testing. It’s not just about making sure we are building the product correctly, it’s about making sure we are building the correct product.
We learn that testing is not supposed to be something that is just tacked on at the end; you have to build quality in from the beginning. We begin to think about the process, and the standards, knowledge and tools we use to improve the process from the beginning.
We think about how to measure the success of our processes: how fast our builds run, how long our tests take, how many bugs we have, how many fatal errors there are in production, what our availability is, our time to recovery, how performant our application platform is, how much test coverage our code has. However, all of this measures the quality of a process.
It’s not about how many bugs we’ve found, how fast or often we deploy, how long work takes, or the quality of our code. Sure, all of these things help improve processes; however, they only measure the quality of the process, they don’t measure the quality of the application.
How do you measure the quality of a product? Quality is hard to objectify, it’s not a specific thing, you can’t point at it and say that’s quality, or can you?
I’ve always had a fascination with process, ever since I learned about the success of Henry Ford and the production line. It’s interesting then, that when we talk about quality, we look to Japan, we talk about Kaizen, the Japanese word for “continual improvement”, and in software development, we talk about “The Toyota Way”. Are Ford and Toyota quality brands?
When I think of quality, I think of contemporary brands like Apple or popular car brands like BMW. They build quality products.
“Apple’s market share is bigger than BMW’s or Mercedes’ or Porsche’s in the automotive market. What’s wrong with being BMW or Mercedes?” - Steve Jobs, 2004.
We don’t talk about quality in terms of processes; it’s the end result that’s important. Quality, then, is something you measure against another thing of a similar kind. It is knowing what the standard is and making it better.
Remember, outcomes are what matters - not the process, not controls, or, for that matter, what work you complete.
Commitment
Unlike Kanban, Scrum-style “sprints” promote “commitment-driven planning” rather than “velocity-driven planning”.
The agile methodology says nothing about velocity or commitment; the Scrum framework, however, is quite prescriptive about it.
“Scrum demands a full-team commitment: If you’re behind, I’ll help, and I know you’ll do the same for me. It’s not ‘these are my tasks’ and ‘those are yours.’” - Mike Cohn
Using a commitment-driven method means that the team is able to function as a team, be autonomous and continuously improve.
Velocity is variable; it goes up and down, so it is only really useful as a rough guideline over the long term. It may eventually become useful for a long-standing team on a long-running product, but this is rarely the case.
Just like trying to be accurate with estimates, you can fall into the trap of becoming too scientific about velocity - you don’t need to be.
“When a measure becomes a target, it ceases to be a good measure.” - Goodhart's Law
The problem I find with velocity-driven planning is that it makes it about story points again, rather than a commitment to the outcomes we want.
Although the agile methodology doesn’t say anything about velocity or commitment, it does say to “build projects around motivated individuals” and to “trust them to get the job done”.
We really need to trust our teams of motivated individuals to commit to building the product, rather than prescribing a velocity.
I can understand that trust might be an issue for some teams, so I would suggest that you build trust by using measurements that matter to help to demonstrate the team’s performance.
Progress
In the agile methodology it says “working software is the primary measure of progress”.
It’s interesting because it puts a big emphasis on working software being the priority and progress being the goal. I therefore think it’s really worth thinking about “progress” as a measurement and how you can best demonstrate it.
“Release early, release often and listen to your customer” - The Cathedral and the Bazaar, Eric Raymond
In a traditional “waterfall” approach, it would be quite difficult to demonstrate or even measure progress; with an agile approach, however, you can get feedback from the customer as soon as the software is working.
In order to truly honour the principle set out in the agile methodology, to show progress you need to release early and release often so that you can demonstrate working software to the customer to get their feedback.
To demonstrate working software, it must be of a quality where it does not fail. We must have measures in place to ensure it is working: the code is of good quality, there are no bugs that prevent it from working, and the pipeline does not fail to build. Fixing issues then becomes a top priority.
We also need to measure progress. How long does a unit of work take to go from conception to the point where the user can actually use it? This is known as “lead time”. How long does it take from a developer picking the work up to it being done? That’s cycle time. Build time is another good one. Anything we can measure to demonstrate that we’re able to progress through work more quickly is a good metric.
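As a minimal sketch of how these could be derived, assuming your tracker can export a few timestamps per work item (the field names here are my own, not any particular tool’s API):

```python
# Minimal sketch: deriving lead time and cycle time from timestamps on a work
# item. The field names are illustrative, not from any particular tool.
from datetime import datetime

work_item = {
    "created":  datetime(2022, 8, 1, 9, 0),    # idea/request raised
    "started":  datetime(2022, 8, 8, 10, 0),   # a developer picks it up
    "deployed": datetime(2022, 8, 12, 16, 0),  # live and usable by the user
}

lead_time = work_item["deployed"] - work_item["created"]    # conception -> in use
cycle_time = work_item["deployed"] - work_item["started"]   # picked up -> in use

print(f"Lead time:  {lead_time.days} days")
print(f"Cycle time: {cycle_time.days} days")
```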
Measuring Success
We need a way to know if things are getting better or worse, and we know that working software is the primary measure of progress.
We need to ensure that we understand the business metrics used to measure the success or failure rate of our platform.
Lead Time
Lead time tells us whether we have an efficient process.
In software development, one metric that matters is lead time. If our lead time is good, then we know we have an efficient process. If anything affects our lead time, then we need to address those things at the root, but we can't improve what we can't measure. If development is taking too long, this will highlight that.
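One rough way to turn lead time into a “getting better or worse” signal, assuming you can export lead times for completed items from your tracker (the numbers below are made up), is to track a percentile per month rather than a single average:

```python
# Rough sketch: track a lead-time percentile month by month to see whether the
# process is improving. The data shape and numbers are made up; plug in an
# export from your own tracker.
from statistics import quantiles

lead_times_by_month = {            # lead time in days for items finished that month
    "2022-06": [12, 9, 15, 22, 8, 11],
    "2022-07": [10, 7, 14, 9, 13, 6],
    "2022-08": [8, 6, 9, 11, 7, 5],
}

for month, days in lead_times_by_month.items():
    p85 = quantiles(days, n=20)[16]   # the ~85th percentile cut point
    print(f"{month}: 85% of items delivered within ~{p85:.0f} days")
```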
Working Software
“Working software is the primary measure of progress” - Agile Manifesto, 2001
Have we built the right thing, the right way? We need to know what our customers think.
You could have the best continuous integration pipeline, the best code and 100% test coverage, but if the software doesn’t do what customers want then it’s not of quality. Product quality is relative to the customer's expectations.
By having strong customer feedback loops from the beginning, we're able to understand the business impact sooner.
Team Health
Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.
Team health checks help us to identify areas to improve and give us a good idea of whether things are getting better or worse. They also mean we have to work together to be successful.
I found that lots of people have experimented with ways of measuring and visualising how their teams were doing, often by showing progression through various levels called a “maturity model”.
“A maturity model is a tool that helps people assess the current effectiveness of a person or group and supports figuring out what capabilities they need to acquire next in order to improve their performance” - Martin Fowler
Maturity models are great, but they can easily be misused and sound a bit patronising. Instead I wanted something that would help the team understand where they felt they were and what areas they needed to improve on.
As mentioned earlier, Spotify’s health check model helps teams self-organise and identify areas to focus on.
I found that retrospectives provide useful feedback, and the health check model gives you a sense of whether things are getting better or worse, but it doesn’t really help with line management.
One-to-one meetings are a great way to maintain good, open communication and to keep building a relationship. It’s important to meet on a regular basis to maintain alignment, and setting objectives helps to bring focus and alignment.
The stronger the alignment, the more autonomy can be granted.
The biggest challenge with bringing alignment is that it requires a clear mission, objectives, strategy and tactics that everyone can agree to and buy in to. Without this, you find that the left hand is digging a tunnel while the right is building a bridge.
However, one-to-ones and objective setting are not indicative of the culture of a team.
I was once asked what role I would play and how I would measure my success. So how would I measure success?
As we know, success should be defined and tracked. We need to have a sense of if things are working (or not).
So let’s look at that: “engagement through staff feedback”. Are we engaged in what we do? Are we measuring the right things? Well, what do our staff say? We need to ask them…
What I need is a way to know whether our culture is getting better or worse. Culture is intangible and hard to change, but it can be measured.
A model created by sociologist Ron Westrum suggests that the biggest predictor of job satisfaction is how effectively organisations process information. You can start to measure this at a team level by asking team members to rate the following statements on a quarterly basis:
Rate how strongly you agree (7) or disagree (1) to the following statements:
- On my team, information is actively sought.
- On my team, failures are learning opportunities, and messengers of them are not punished.
- On my team, responsibilities are shared.
- On my team, cross-functional collaboration is encouraged and rewarded.
- On my team, failure causes enquiry.
- On my team, new ideas are welcomed.
Alternatively, the Gallup Q12 is said to be a good way to measure engagement.
However, it doesn’t need to be complicated: a dot-voted “pulse” on a scale of 1 to 10, taken at each sprint retrospective, is quick and easy, and gives you a simple gauge of how the team is doing.
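As a sketch of how either approach could be turned into a number you can track quarter on quarter (the scores below are made up; the statements are the Westrum-style ones listed above):

```python
# Sketch: aggregate 1-7 responses to the Westrum-style statements above into
# per-statement and overall team averages. The response data is made up.
from statistics import mean

statements = [
    "Information is actively sought",
    "Failures are learning opportunities; messengers are not punished",
    "Responsibilities are shared",
    "Cross-functional collaboration is encouraged and rewarded",
    "Failure causes enquiry",
    "New ideas are welcomed",
]

# One row per respondent, one 1-7 score per statement.
responses = [
    [6, 5, 6, 7, 5, 6],
    [5, 4, 6, 6, 4, 5],
    [7, 6, 5, 6, 5, 7],
]

per_statement = [mean(scores) for scores in zip(*responses)]
for statement, score in zip(statements, per_statement):
    print(f"{score:.1f}  {statement}")
print(f"Overall culture pulse this quarter: {mean(per_statement):.1f} / 7")
```

The trend from quarter to quarter matters far more than the absolute number.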
Conclusion
One of the biggest traps I see teams fall into during an "agile transformation" is obsessing over story points, burn-down charts and velocity, and I think this is a mistake.
Transforming a team from a traditional waterfall approach to an agile approach takes more than just understanding velocity; it requires a complete change in culture, one of continuous improvement and psychological safety.
Remember what it is you're trying to achieve and measure what really matters: focus on the outcomes rather than story points.