On estimating data science projects
Nicolai Thomsen
Posted on November 15, 2022
Estimates for data science projects can be tough. Here are a few tips to hopefully make your life easier.
Ah, the estimate! Easily given, mostly regretted.
A product owner corners you, in a meeting room or, worse, at the coffee machine. After a few pleasantries, the PO asks you for an estimate for a new venture: Project AI.
What do you answer? You haven't done anything like Project AI before and, frankly, you don't really know what it would require. Yet it's about AI and you are a data scientist, after all, so you should be able to pull a figure out of your sleeve... Right?
1. "I will get back to you"
Giving an instant estimate is a surefire way to regret it later, even if you are an experienced developer. Like a good model, you should fit to the available data first. There's no shame in putting time and thought into the estimate before you give it. Say so openly: arguably, it's a disservice to both you and the proverbial PO not to do so.
What do we want to know?
In particular, you will need to understand the context of Project AI first, so ask all the questions you need before you leave the meeting room. Be wary of the phrase "a ballpark estimate" as well. Too often, ballpark estimates end up being treated as qualified estimates once they have changed hands a couple of times.
You may want to be mindful of your own expertise too. It would feel wonderful to say "I could have it done by Tuesday!" and surely your PO would be impressed. Until Tuesday, that is. Consider what learning curve you would face in Project AI, and let this influence your estimate. Don't put future you in a difficult position by ignoring this factor.
2. How data ruins everything
Any software development venture has risks. Data science projects have those too, but with an added dimension, one that is often overlooked by PMs, POs and clients: data signals.
Even for the most brilliant architecture, success hinges on generalizable signals in your data. Say Project AI is a dog breed image classifier (because, why not). We may hypothesize that we can differentiate breeds, but what happens to the project if there are no significant signals for your models to fit to? How long will it take you to procure and/or annotate new data? These are the additional risks that govern data science projects. They must be accounted for in any reasonable estimate and communicated clearly to your stakeholders. What is your contingency plan? Try not to scare them too much.
3. Explore the jungle
"Well...", you start, and mumble something about uncertainty. Instead of treading water, diminish that uncertainty a bit by exploring the domain jungle. How? By asking questions. A lot of questions. We know by now that we don't make instant estimates, so the next step is about (A) learning what you can, and (B) delimiting your eventual estimate. "Can we realistically assume that quality data will be handed to us?" and "What specifically would the end deliverable look like?" are good places to start. Mix in your expertise ("If that's so, we could do Y instead of X, to achieve Z") when applicable.
As you traverse the jungle, one of two things is likely to happen: either you will learn most of what you need to, or your PO will become keenly aware that you can't be expected to make a realistic guess based on such sparse information, much less a qualified estimate. Once you have squeezed the jungle fruit for information, return to civilization and crunch the numbers in peace.
4. Break it down
There are multiple ways to build an estimate, and people much cleverer than I have written extensively about them elsewhere. One example is the simple approach explained in The Pragmatic Programmer by David Thomas and Andrew Hunt (2020, 2nd edition, pp. 65-71). My interpretation is as follows:
1. Sketch a solution architecture: This is the fun part. Visualize the elements involved (Backend, frontend, APIs, DB) and the data flows between them. I use MS PowerPoint for this, although some devs consider it a sin. Here's a simple outline.
2. Break each part down into components: Create a tabular overview of the components of each element. I prefer the following format.
Element | Component | Functionality | Conditions | Est. days |
---|---|---|---|---|
Serv. A | Comp. Aa | < Functions to cover > | e.g., 'Specified data schema respected' or 'Client documentation available' | 4 |
Serv. A | Comp. Ab | | | 10 |
Serv. A | Comp. Ac | | | 7 |
3. Identify the risky components: Be mindful of which components are likely to deviate from your estimate. You will want to pay extra attention to these. After project kick-off you may want to start off with these elements, particularly if they are in the critical path of your timeline. (Critical path is another great tool that I'd like to cover in a later post).
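The breakdown above also lends itself to a few lines of code, which can be handy once the table grows. The following is only a minimal sketch: the `Component` class and the `risky` flag are my own hypothetical names, the day figures are the placeholders from the table, and which component is marked risky is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Component:
    element: str
    name: str
    est_days: float
    risky: bool = False  # hypothetical flag, set by your own judgement


# Day figures are the placeholders from the table; the risk flag is invented.
components = [
    Component("Serv. A", "Comp. Aa", 4),
    Component("Serv. A", "Comp. Ab", 10, risky=True),
    Component("Serv. A", "Comp. Ac", 7),
]

# Sum the whole estimate and the part that sits in risky components.
total_days = sum(c.est_days for c in components)
risky_days = sum(c.est_days for c in components if c.risky)
risky_share = risky_days / total_days

print(f"Total estimate: {total_days} days")
print(f"In risky components: {risky_days} days ({risky_share:.0%})")
```

If a large share of the total sits in risky components, that is a hint to tackle those first and to communicate a wider margin.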
Alternatively, PERT is a great option, particularly if you have sparse information to go on.
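For reference, the classic PERT three-point estimate combines an optimistic, a most likely and a pessimistic guess per task as (O + 4M + P) / 6, with (P - O) / 6 as a rough standard deviation. Here is a small sketch of that arithmetic. The task names and day figures are hypothetical, and summing variances assumes the tasks are independent, which is rarely entirely true in practice.

```python
import math


def pert(optimistic, most_likely, pessimistic):
    """Return (expected, standard deviation) for one three-point estimate."""
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6
    return expected, std_dev


# Hypothetical (optimistic, most likely, pessimistic) guesses in days.
tasks = {
    "data collection": (3, 5, 13),
    "model training": (4, 7, 16),
    "API and deployment": (2, 4, 6),
}

results = {name: pert(*points) for name, points in tasks.items()}

total_expected = sum(expected for expected, _ in results.values())
# Assuming independent tasks, variances add up (standard deviations do not).
total_std = math.sqrt(sum(std ** 2 for _, std in results.values()))

for name, (expected, std) in results.items():
    print(f"{name}: {expected:.1f} days (+/- {std:.1f})")
print(f"Total: {total_expected:.1f} days (+/- {total_std:.1f})")
```

The nice side effect is that you hand over a range rather than a single number, which makes the uncertainty visible to your PO.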
Lastly, consider using a use case model for crystal clarity of what the deliverables should be able to do. In the case of our brilliant dog breed classification service, it could look like the following. Consider the use case with the prefix "A user wants to".
Use case: Upload a photo and receive back a predicted dog breed and a confidence score.
Requirements: [Quick response]
Solution: A RESTful API exposing a multi-class classifier with high inference speed, trained on (image, label) pairs of dog photos and their respective breeds.
On error: Return instructions for use of the API.
5. Ask around
More important than sitting down alone with the estimate is asking the people around you. Perhaps someone in your team or the community has already dealt with something akin or tangential to Project AI. Likewise, consider reaching out to others in your space with more experience. If nothing else, they can likely point you in the right direction. Importantly, do not ask the client. "What's a realistic estimate for the client?" is not a relevant question in your position (although it's highly relevant for your PO).
6. Are you often late?
I once worked with a PM who took any estimate we gave her and multiplied it by pi (3.14). Seems high? Her intuition was that she consistently saw underestimation from data scientists. You could call it human curve-fitting. The underlying point here is important: You may be consistently under- or overestimating your projects.
Look outside work. Are you a time optimist? Perhaps this means that you are inclined to underestimate your travel time. It may be worthwhile to consider your own biases and whether they translate into your estimates on data science projects.
7. Keep score
Project estimation is a skill. Hard-won at times. In order to improve, you may want to log your performance. Take note of your estimation accuracy across projects or even individual tasks.
When the estimates are off, ask yourself what caused it. Maybe your model was wrong. Maybe you estimated data gathering to be more time-consuming than it turned out to be. Although all projects vary, you are likely to uncover some patterns to learn from.
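Keeping score doesn't require any tooling beyond a list. As a rough sketch, you could log estimated versus actual days and look at the typical ratio between them, which is your own, data-driven version of the pi multiplier. All task names and numbers below are hypothetical, and using the median is just one reasonable choice to keep a single runaway task from dominating.

```python
from statistics import median

# Hypothetical log of (task, estimated days, actual days).
log = [
    ("data cleaning", 3, 6),
    ("baseline model", 5, 5),
    ("API wrapper", 2, 3),
    ("annotation", 10, 8),
]

# A ratio above 1 means the task took longer than estimated.
ratios = [actual / estimated for _, estimated, actual in log]
typical_ratio = median(ratios)

# Apply your personal correction factor to a fresh gut-feeling estimate.
new_estimate = 8
adjusted = new_estimate * typical_ratio

print(f"Typical actual/estimated ratio: {typical_ratio:.2f}")
print(f"Gut feeling: {new_estimate} days, adjusted: {adjusted:.1f} days")
```

With only a handful of entries the ratio is noisy, so treat it as a nudge rather than a rule, and look at which kinds of tasks drive it.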
8. Communicate continuously
Your initial estimates will be skewed once the project actually starts. That is almost inevitable. What matters is that you keep telling your stakeholders why and, more importantly, how to adjust, throughout the project. Estimation doesn't stop when the project starts. It's worthwhile to include in stand-ups and checkpoint meetings. To quote Gergely Orosz in his awesome post on estimation in software projects: "Good communication is more important than good estimation". Make sure to pipe up when things change!
9. Putting it all together
We've covered a few tips in this post. Here's a neat list of them, tagged with the corresponding sections.
- Avoid instant estimation. (1)
- Be mindful of the learning curve. (1)
- Account for the risk of data signals. (2)
- Listen and learn first. (1, 3)
- Break it down. (4)
- Ask around for pointers. (5)
- Consider your own biases. (6)
- Keep score of your estimation prowess. (7)
- Good communication is more important than good estimation. (8)
I hope these tips will help you with your future estimates. Good luck with your projects!