Small Stories Reduce Variability in Velocity, Improve Predictability
If you estimate the size of your stories in relative story points, realize that the more large stories you have in your backlog, the less reliable will be your velocity (trend) for planning future work. The velocity trend values will be less accurate when there are bigger stories in your backlog, than when the stories are smaller. The less accurate velocity trend is not a good basis for future planning, i.e., as a planning velocity for the next release. The reason for this lies in the nature of relative story point estimation, and there is some statistics to back it up too. Let us delve into this.
By design, relative story point estimates are increasingly less accurate for larger estimates. The Fibonacci sequence (well, a modified version of it: 1,2,3,5,8, 13, 20, 40, 100) is used as the point scale to model this. The idea is that a story estimated to be size 3 is likely to be in the small range of 2 to 5 actual points (if we were to measure actual points – which we don’t, but that’s another story). Whereas, a story estimated to be size 13 could be anywhere from size 8 to 20 in “actual” size, and so on up the scale.
This model of estimation accuracy for story size is analogous to the ”Cone of Uncertainty” (CoU) concept for software development projects. The CoU describes the exponentially decreasing uncertainty in project estimates as the project progresses. The CoU shows that the uncertainty decreases as more details about the actual requirements, design and implementation become known. “Estimates created at Initial Concept time can be inaccurate by a factor of 4x on the high side or 4x on the low side (also expressed as 0.25x, which is just 1 divided by 4).” [Construx public web page]. As the project progresses, estimates become increasingly certain until, for example, the day before the actual project completion date, that date is nearly 100% known. A diagram of the Cone of Uncertainty is shown below (from construx.com).
In an analogous way, early on in a project, work is not broken down into detail – we have epics and features. As the project progresses, we refine the backlog by breaking the epics and features down into user stories (and estimating their size). This is depicted in the diagram below.
Getting back to relative story point estimates, think about the following. The larger the estimated story size, the more error there can be in that estimate compared to the “actual” size of the story, due to the Cone of Uncertainty effect, which is modeled by the Fibonacci (or other non-linear) point scale. For example, a story estimated to be a 13 could be just a bit bigger than another story which is truly an 8, or just a bit smaller than a size 20 story. An 8-er could be anywhere from a 5 to a 13, and so on. As you can see, this estimation error is bigger for larger stories. When you add the estimated story sizes to calculate the velocity, this error adds up. An achieved velocity of say, 30, with mostly small stories, is something different from a velocity of 30 achieved with larger stories. Which of these 30s would you have more confidence in using to forecast future velocity?
Adding the caveat, “on average”, would probably be the more accurate way to make some of the above statements. A story estimated to be size 13 could on average (meaning over many stories estimated to be 13) be anywhere from an actual size of 8 to 20. And what of this notion of “actual size”? We don’t measure that, and these are all relative sizes, right? But there must be some actual size once the story has been delivered. Whatever it is, whatever its units – Lines of Code, Function Points, some function of effort, complexity, and risk – it does exist, and every story once completed has an actual size. We don’t have to know that actual size or units to make this argument about the error in the estimates and its effect on velocity.
This can be explained in terms of statistics as well. For those not interested in the math, you can skip ahead two paragraphs. I believe the case has been made for smaller stories in your backlog reducing variability in your velocity.
If we think of the error in any particular story size estimate (versus the “actual” size) as a random variable (i.e., over all stories estimated to be a particular size, say a 5), then the variance on that error is bigger for larger sized stories, due to the CoU effect. When the story sizes are summed, the variance of the error in the resulting sum (the velocity) is the sum of the variances (assuming for simplicity that the story size estimate errors are independent and normally distributed, which they probably aren’t, but my hunch is it – the variance- only gets worse/ bigger for other distributions). This statistical property may be difficult to perceive on any one Scrum-team-sized project, but think about it across an entire large organization of many, many projects over time. The statistics play out over the large number of samples. The more large stories that are in the sprint backlogs across the organization, the more variance will be in the velocity and thus the less reliable will be release commitments based on those velocities.
This strongly suggests that you (and your organization) will be much better off in terms of forecasting your rate of progress (velocity) for planning purposes if you have mostly small stories in your backlog. Not only does the math tell us this, but it passes the common sense test too. If you have a mix of small to medium and large stories in your backlog, all it takes is one or two of larger stories planned into a sprint not completing to significantly reduce your velocity from what you planned. The next sprint you may spike up in velocity due to taking credit for the large “hangover” story(s). If this pattern repeats sprint over sprint, your velocity will vary widely and predictions based on it will not be very reliable. On the other hand, if you have mostly small stories in your sprints’ backlogs, then if the team occasionally does not finish a story this will not impact the velocity statistics significantly… the average velocity should reliable enough for forecasting.
Ideally then, we would break all stories down to about the same small size and forget about sizing them – just count them and use count for velocity. However, in practice I have found this is not practical – there are usually constraints on how / much we can break down stories. Furthermore and perhaps more importantly, in practice we find that the discussions around the story size estimation tend to be very valuable in terms of developing a clear, shared understanding of the stories.
The moral of the story is… break down those user stories, or more generally, break down your work into as small chunks as is practical. There are many benefits of working with small stories besides the reduced variability. These advantages include: increased focus, which helps prevent failure; earlier discovery / faster feedback; shorter lead time/ better throughput; and reduced testing overhead. Those are good candidates for the subject of future articles.
A corollary arising from this observation is that if you cannot break down all your stories to be relatively small, realize that velocity may not be a reliable planning parameter. You might want to also look at story cycle time (per size), for example. In addition, the average story size (velocity/num_stories) could be tracked and monitored. The idea would be that lower is better – velocity will be more reliable for planning when there is lower average story size in the historical backlog.
The post Small Stories Reduce Variability in Velocity, Improve Predictability appeared first on LeadingAgile.