How do we estimate?
There have been some web posts and Twitter comments lately that suggest some people have a very narrow view of what techniques constitute an estimate. I take a larger view, that any projection of human work into the future is necessarily an approximation, and therefore an estimate.
I often tell people that the abbreviation of “estimate” is “guess.” I do this to remind people that they’re just estimates, not data. When observations and estimates disagree, you’d be prudent to trust the observations. When you don’t yet have any confirming or disproving observations, you should think about how much trust you put into the estimate. And think about how much risk you have if the estimate does not predict reality.
This does not mean, however, that you have to estimate by guessing. There are lots of ways to make an estimate more trustworthy.
Using more people to independently estimate is one common technique and provides a reasonableness check on the result. Wideband Delphi techniques take this further by re-estimating until the predictions converge (or stalemate). People have widely adapted James Grenning’s “planning poker” to perform this procedure. In theory, having multiple independent estimates misses fewer important points and gives us a more trustworthy result.
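As a rough illustration of "re-estimate until the predictions converge (or stalemate)," here is a minimal sketch. The function names, the `tolerance` parameter, and the rule that a stalemate means "the spread stopped shrinking" are all my assumptions for the example; neither Wideband Delphi nor planning poker prescribes them.

```python
def spread(estimates):
    """Gap between the highest and lowest estimate in one round."""
    return max(estimates) - min(estimates)


def delphi_outcome(rounds, tolerance=1):
    """Classify a series of estimation rounds.

    rounds: list of rounds, each a list of numeric estimates.
    tolerance: assumed threshold; a spread at or below it counts as converged.
    Returns (status, round_index), where status is "converged",
    "stalemate" (the spread stopped shrinking), or "undecided".
    """
    if not rounds:
        raise ValueError("need at least one round of estimates")
    previous_spread = None
    for index, estimates in enumerate(rounds):
        current = spread(estimates)
        if current <= tolerance:
            return ("converged", index)
        if previous_spread is not None and current >= previous_spread:
            return ("stalemate", index)
        previous_spread = current
    return ("undecided", len(rounds) - 1)


# Hypothetical rounds from three estimators.
print(delphi_outcome([[2, 8, 5], [3, 5, 5], [5, 5, 5]]))
```

A real session also includes the discussion between rounds, which is where most of the value comes from; the sketch only captures the stopping rule.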
In practice, the various estimates are often less independent than we think. People who work closely together can often guess what the others are thinking about the kind of work they commonly do. In addition, some participants often telegraph their estimates before others have decided, spoiling the independence. A further problem is that variations in skills and abilities give some people an advantage in estimating work aligned to their strengths, but the estimates of those more ignorant of the work are often given equal weight, skewing the results. This is especially true when estimating things that have been broken down to small amounts of work.
Estimating relative to other work is easier for people, and therefore more reliable than estimating in absolute terms. I can look at two similar rocks and guess which one is heavier, or if they’re about the same, without knowing what either one weighs. This is the genesis of “story points.” Once we’ve assigned a value to one piece of work, then we can estimate others as multiples or fractions of that reference. Using affinity grouping, we can gather together all the work items that seem about the same size.
Unfortunately, we often have a harder time seeing the size of development project work than we do of rocks. Using the rock metaphor, we might be trying to compare a chunk of talc with a piece of uranium ore. Apparent size is sometimes deceiving. People also have a tendency to hold onto absolute references. They want their story points to be comparable from team to team, or from year to year. They want to adjust their estimates after the fact so that items that took about the same amount of time are given similar values. “We estimated that as a 2 but it turned out to be a 5.” They try to fix the story points to an absolute time or work reference, and in the process they make them less trustworthy by damaging the reliance on relative estimation.
Estimating based on recent history is an excellent way to improve the reliability of estimates, especially for the short term. The XP practice of Yesterday’s Weather is one example of this. “If we completed 24 story points last iteration, we’ll probably complete about 24 story points this iteration.” Bob Payne and I took a look at some data we had from teams with whom we’d worked and found that we could generally do as well, or better, by just counting the stories instead of estimating them in points. In other words, saying “If we completed 8 stories last iteration, we’ll probably complete about 8 stories this iteration” had about the same predictive power as using story points, and was a lot quicker to calculate. This was true even when the story estimates varied by about an order of magnitude. Others, such as Vasco Duarte, have noticed the same phenomenon. Taking the story points out of the equation seems to remove some of the noise in the data, and certainly removes some of the effort required. If you want to get better, use what I call the Abbreviated Fibonacci Series, which has the values of “1” and “too big.” Split the stories considered too big. You’ll accrue benefits beyond better estimates.
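To make the Yesterday's Weather idea concrete, here is a minimal sketch that forecasts the next iteration from the last one and measures how far off that forecast would have been in the past. The history numbers are invented for illustration, not the data Bob Payne and I looked at, and the function names are my own.

```python
def yesterdays_weather(history):
    """Forecast the next iteration as a repeat of the most recent one."""
    if not history:
        raise ValueError("need at least one completed iteration")
    return history[-1]


def mean_relative_error(history):
    """Average of |forecast - actual| / actual over past iterations,
    where each forecast is simply the previous iteration's value.
    Assumes at least two iterations and no iteration with zero completed.
    """
    errors = [abs(prev - actual) / actual
              for prev, actual in zip(history, history[1:])]
    return sum(errors) / len(errors)


# Hypothetical team history, one entry per iteration.
story_points = [21, 26, 24]
story_counts = [7, 9, 8]

print("points forecast:", yesterdays_weather(story_points))
print("count forecast:", yesterdays_weather(story_counts))
print("points error:", mean_relative_error(story_points))
print("count error:", mean_relative_error(story_counts))
```

Running both error measures over your own team's history is a cheap way to check whether the points are buying you anything over a plain count.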
If velocity gives us a frequency measurement in stories per iteration, then its inverse is cycle time. Cycle time is the time it takes to complete one story, equivalent to a wavelength measurement. Once a team has some track record, you can generally expect these numbers to settle down into something fairly predictable. Because these estimates are based on data, many people are tempted to treat them as data themselves. Remember, though, the disclaimer of investment managers, “Past performance is no guarantee of future results.” Even if the team has a consistent track record, there may be a black swan or three right around the corner.
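The frequency-and-wavelength relationship is simple arithmetic. A small sketch, assuming a 10-working-day iteration (my number, chosen only to make the division visible) and treating cycle time in the sense used above, as the average time per completed story:

```python
def cycle_time_days(stories_per_iteration, iteration_days):
    """Average days per completed story: the inverse of velocity,
    scaled by the length of the iteration."""
    if stories_per_iteration <= 0:
        raise ValueError("velocity must be positive")
    return iteration_days / stories_per_iteration


# Hypothetical team: 8 stories per 10-working-day iteration.
print(cycle_time_days(8, 10))  # 1.25 days per story
```

Like the velocity it is derived from, this is an average over past work, so it carries the same caveat about future results.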
Of course, all things are not always equal. Organizations have a distressing tendency to change the makeup of teams, which changes the rate at which the team accomplishes work. The work itself may change, and so may the team’s skill at dealing with the work.
These are just three categories of techniques for improving the trustworthiness of estimates. There are many other techniques for estimating. Most have advantages, and all have disadvantages. Even with our best attempts at improving estimates, the true goal is accomplishing the work. Ultimately it’s better to apply energy to that goal rather than chasing after ever better estimation.