Anyone who has built software for a while knows that estimating how long something will take is *hard*. When the work itself is fundamentally about *solving* something, it’s hard to come up with an unbiased estimate of how long it will take. One pet theory I’ve had for a really long time is that some of this is really just a statistical artifact.

I suspect developers are actually decent at estimating the *median* time to complete a task. Planning is hard because they suck at the *average*.

— Erik Bernhardsson (@fulhack) May 11, 2017

Let’s say you estimate that a project will take one week. Let’s say there are three equally likely outcomes: it takes 1/2 week, 1 week, or 2 weeks. The *median* outcome is actually the same as the estimate: 1 week. But the *mean* (also known as the *average* or the *expected value*) is 7/6 = 1.17 weeks. The estimate is calibrated (unbiased) for the median (which is 1), but not for the mean.
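The arithmetic above is small enough to check directly. A minimal sketch using only the Python standard library (the variable name `outcomes` is just for illustration):

```python
from statistics import mean, median

# Three equally likely outcomes (in weeks) for a task estimated at 1 week.
outcomes = [0.5, 1.0, 2.0]

print(f"median = {median(outcomes):.2f} weeks")  # matches the estimate
print(f"mean   = {mean(outcomes):.2f} weeks")    # (0.5 + 1 + 2) / 3 = 7/6
```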

A reasonable model for the “blow-up factor” (actual time divided by estimated time) is something like a lognormal distribution. If the estimate is one week, model the actual outcome as a random variable distributed according to a lognormal distribution around one week. This has the property that the median of the distribution is exactly one week, but the mean is much larger.

If we take the logarithm of the blow-up factor, we end up with a plain old normal distribution centered on 0. This assumes that the median blow-up factor is 1x, and as you hopefully remember, log(1) = 0. However, different tasks may have different uncertainties around 0. We can model this by varying the σ parameter, which corresponds to the standard deviation of the normal distribution.

To put some numbers on this: if log(actual / estimated) = 1, the blow-up factor is exp(1) = e = 2.72. It is just as likely that a project blows up by a factor of exp(2) = 7.4 as it is that it completes in exp(-2) = 0.14, or 14%, of the estimated time. Intuitively, the mean is so large because tasks that complete faster than estimated have no way to make up for tasks that take much longer than estimated. We are bounded by 0 on one side, but unbounded in the other direction.
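For a lognormal blow-up factor with median 1, the mean and percentiles have closed forms: the mean is exp(σ²/2) and the p-th percentile is exp(σ·z_p), where z_p is the standard normal quantile. A minimal sketch, assuming log(actual / estimated) is Normal(0, σ); the helper name `lognormal_summary` is made up for this example:

```python
import math
from statistics import NormalDist

def lognormal_summary(sigma, estimate=1.0):
    """Median, mean and 99th percentile of a lognormal centered on `estimate`.

    Assumes log(actual / estimate) is Normal(0, sigma).
    """
    z99 = NormalDist().inv_cdf(0.99)  # standard normal 99% quantile
    return {
        "median": estimate,                            # exp(0) * estimate
        "mean": estimate * math.exp(sigma ** 2 / 2),   # grows quickly with sigma
        "p99": estimate * math.exp(sigma * z99),
    }

for sigma in (0.5, 1.0, 2.0):
    s = lognormal_summary(sigma)
    print(f"sigma={sigma}: median={s['median']:.2f} "
          f"mean={s['mean']:.2f} p99={s['p99']:.2f}")
```

The median never moves, while the mean and the tail percentiles grow rapidly as σ increases.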

Is this just a model? You bet! But I will get to real data shortly and show, using some empirical data, that this actually maps to reality fairly well.

## Software estimation

So far so good, but let’s get a feel for what this means from a software estimation perspective. Let’s say you are looking at a roadmap with 20 different software projects and you are trying to estimate how long it will take to complete *all of them*.

This is where the mean becomes crucial. Means add, but medians do not. So if you want to know how long it will take to complete the sum of n projects, you need to look at the mean. Let’s say you have three different projects in your pipeline with exactly the same σ = 1.

| | Median | Mean | 99% |
|---|---|---|---|
| Task A | 1.00 | 1.65 | 10.24 |
| Task B | 1.00 | 1.65 | 10.24 |
| Task C | 1.00 | 1.65 | 10.24 |
| Sum | 3.98 | 4.95 | 18.85 |

Note that the means add up (4.95 = 1.65 * 3), but the other columns do not.
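One way to see that means add while medians do not is a quick Monte Carlo run. The sketch below assumes three independent tasks, each lognormal with median 1 and σ = 1; the seed, sample count and helper name `sample_task` are arbitrary choices for this illustration:

```python
import math
import random
from statistics import mean, median

random.seed(0)  # fixed seed so the run is repeatable
N_SAMPLES = 200_000
SIGMA = 1.0

def sample_task(sigma):
    # Lognormal with median 1: exp of a Normal(0, sigma) draw.
    return random.lognormvariate(0.0, sigma)

# Total time for three independent tasks, repeated many times.
totals = [sum(sample_task(SIGMA) for _ in range(3)) for _ in range(N_SAMPLES)]

single_mean = math.exp(SIGMA ** 2 / 2)  # closed-form mean of one task
print(f"mean of sum:   {mean(totals):.2f} (3 x single mean = {3 * single_mean:.2f})")
print(f"median of sum: {median(totals):.2f} (3 x single median = 3.00)")
```

The simulated mean of the sum should land close to three times the single-task mean, while the median of the sum ends up noticeably above three times the single-task median.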

Now let’s add up three projects with different values of σ.

| | Median | Mean | 99% |
|---|---|---|---|
| Task A (σ = 0.5) | 1.00 | 1.13 | 3.20 |
| Task B (σ = 1) | 1.00 | 1.65 | 10.24 |
| Task C (σ = 2) | 1.00 | 7.39 | 104.87 |
| Sum | 4.00 | 10.18 | 107.99 |

The means still add up, but the total is nowhere near the naive three-week estimate you might come up with. Note that the high-uncertainty project with σ = 2 basically ends up *dominating* the mean time to completion. For the 99th percentile, it doesn’t just dominate, it basically absorbs everything else. We can run a larger example.

| | Median | Mean | 99% |
|---|---|---|---|
| Task A (σ = 0.5) | 1.00 | 1.13 | 3.20 |
| Task B (σ = 0.5) | 1.00 | 1.13 | 3.20 |
| Task C (σ = 0.5) | 1.00 | 1.13 | 3.20 |
| Task D (σ = 1) | 1.00 | 1.65 | 10.24 |
| Task E (σ = 1) | 1.00 | 1.65 | 10.24 |
| Task F (σ = 1) | 1.00 | 1.65 | 10.24 |
| Task G (σ = 2) | 1.00 | 7.39 | 104.87 |
| Sum | 9.74 | 15.71 | 112.65 |

Again, one rogue task basically dominates the calculation, at least for the 99th percentile. Even for the mean, though, the one freak project takes up about half the time spent on these tasks, even though all of them have the same median time to completion. For simplicity, I have assumed that all tasks have the same estimated size but different uncertainties. The same math applies if the sizes vary as well.
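The “about half” figure can be checked with the closed-form lognormal mean, exp(σ²/2), for each of the seven tasks. This is a sketch under the same assumptions as the table (median 1 for every task); the dictionary names are made up here, and the closed-form total may differ from the table’s sum in the last digit, which would be expected if the table came from a simulation:

```python
import math

# Sigma for each of the seven tasks in the table; every median is 1.
sigmas = {"A": 0.5, "B": 0.5, "C": 0.5, "D": 1.0, "E": 1.0, "F": 1.0, "G": 2.0}

# Closed-form lognormal mean for median 1: exp(sigma^2 / 2).
means = {task: math.exp(s ** 2 / 2) for task, s in sigmas.items()}
total = sum(means.values())

for task, m in means.items():
    print(f"Task {task}: mean={m:.2f} share of total={m / total:.0%}")
print(f"total mean = {total:.2f}")
```

Task G alone accounts for a little under half of the total mean, despite being one of seven tasks with identical medians.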

The funny thing is that I have had this gut feeling for a while. Adding up estimates rarely works once you have more than a few tasks. Instead, figure out which tasks have the highest uncertainty. Those tasks will basically dominate the mean time to completion.

There are two ways to estimate project size:

(a) Break things down into subprojects, estimate those, and add them up

(b) Gut-feeling estimate based on how nervous you feel about unexpected risks

So far (b) has been much more accurate for any project longer than a few weeks

— Erik Bernhardsson (@fulhack) March 8, 2019

The graph summarizes the mean and the 99th percentile as a function of uncertainty (σ).

There is math behind this now! I have started to appreciate this during project planning: I think adding up task estimates gives a really misleading picture of how long something will take.

## Where is the empirical data?

I filed this away in my brain under “curious toy models” for a long time, occasionally thinking it was a neat illustration of a real-world phenomenon I had observed. But one day, while surfing the net, I came across an interesting dataset of project estimates and actual times. Great!

Let’s create a simple scatter plot of the estimated time to completion and the actual time.

The median blow-up factor turns out to be *exactly* 1x for this dataset, while the mean blow-up factor is 1.81x. Again, this supports the hunch that developers estimate the *median* well, but the mean ends up being much higher.

Let’s look at the distribution of the blow-up factor, or rather its logarithm.

You can see that it is fairly well centered around 0, where the blow-up factor is exp(0) = 1.
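Computing these two statistics from raw records takes only a few lines. The sketch below uses a handful of made-up (estimated, actual) pairs purely to show the calculation; they are hypothetical and are not taken from the dataset discussed here:

```python
from statistics import mean, median

# Hypothetical (estimated_hours, actual_hours) pairs, invented for illustration.
# They are NOT values from the dataset discussed in the text.
tasks = [(8, 4), (8, 8), (16, 16), (24, 30), (40, 160)]

# Blow-up factor: actual time divided by estimated time.
blowups = [actual / estimated for estimated, actual in tasks]

print(f"median blow-up: {median(blowups):.2f}x")
print(f"mean blow-up:   {mean(blowups):.2f}x")
```

Even in this toy list, a single task that overruns badly pulls the mean well above the median.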

I am going to get a bit fancy with statistics now. If that is not your cup of tea, feel free to skip ahead. What can we infer from this empirical distribution? You might expect the logarithm of the blow-up factor to follow a normal distribution, but that is not quite true. Note that σ is itself random and varies from project to project.

One convenient way to model σ is to assume it is sampled from an inverse gamma distribution. If we assume (as before) that the log of the blow-up factor is normally distributed for a given σ, then the “global” distribution of the log blow-up factor ends up being a Student’s t distribution.

Let’s fit a Student’s t distribution to the distribution above.

A decent fit, in my opinion! The parameters of the t distribution also define the inverse gamma distribution of the σ values.
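The mixture argument can be illustrated by simulation. In the textbook form of this result the inverse gamma prior sits on the variance σ²: if σ² follows InverseGamma(ν/2, ν/2) and the value is Normal(0, σ) given σ, the marginal is a Student’s t with ν degrees of freedom. The sketch below assumes that form; ν = 5 is an arbitrary illustrative choice and not the fitted value, and `sample_log_blowup` is a made-up helper name:

```python
import math
import random

random.seed(0)        # fixed seed so the run is repeatable
NU = 5.0              # illustrative degrees of freedom, NOT the fitted value
N_SAMPLES = 200_000

def sample_log_blowup(nu):
    # sigma^2 ~ InverseGamma(nu/2, nu/2): reciprocal of a Gamma(nu/2, scale=2/nu) draw.
    variance = 1.0 / random.gammavariate(nu / 2, 2 / nu)
    # Given sigma, the log blow-up factor is Normal(0, sigma).
    return random.gauss(0.0, math.sqrt(variance))

samples = [sample_log_blowup(NU) for _ in range(N_SAMPLES)]

# Heavy tails: compare the mass beyond |x| > 3 with a standard normal (about 0.0027).
tail = sum(abs(x) > 3 for x in samples) / N_SAMPLES
print(f"P(|x| > 3) in the mixture: {tail:.4f}")
```

The mixture puts roughly ten times more probability beyond ±3 than a standard normal does, which is the heavy-tail behavior that a single fixed σ cannot reproduce.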

Note that values like σ > 4 are very unlikely, but when they happen, they cause a mean blow-up of several thousand times.

## Why software tasks always take longer than you think

Assuming this dataset is representative of software development (questionable!), we can infer some more numbers. With the parameters of the t distribution, we can compute the mean time it takes to complete a task without knowing that task’s σ.

The median blow-up factor implied by this fit is 1x (as before) and the 99th percentile blow-up factor is 32x, but at the 99.99th percentile it is a whopping 55 *million*! One (hand-wavy) interpretation is that some tasks are essentially infeasible. In fact, these extreme edge cases have such an outsized impact on the *mean* that the mean blow-up factor of *any task* ends up being *infinite*. This is pretty bad news for anyone trying to hit a deadline.
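A rough sketch of why the mean diverges, assuming the log blow-up factor T follows a Student’s t distribution with ν degrees of freedom (location and scale are omitted for brevity; the symbols T, ν, f_ν and C are notation introduced here, not taken from the fit):

```latex
\mathbb{E}\left[e^{T}\right] = \int_{-\infty}^{\infty} e^{t}\, f_\nu(t)\, dt,
\qquad
f_\nu(t) \propto \left(1 + \frac{t^{2}}{\nu}\right)^{-\frac{\nu+1}{2}}
\sim C\, t^{-(\nu+1)} \quad (t \to \infty)

e^{t}\, t^{-(\nu+1)} \to \infty \quad (t \to \infty)
\;\;\Longrightarrow\;\;
\mathbb{E}\left[e^{T}\right] = \infty \quad \text{for every finite } \nu
```

The tail of the t distribution decays only polynomially, so it cannot offset the exponential growth of the blow-up factor. With a normal distribution the tail decays like exp(-t²/2σ²), which is why the lognormal mean stays finite for any fixed σ.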

## Summary

If my model is correct (a big *if*), then here is what we can learn:

- People estimate the *median* completion time well, but not the mean.
- The mean turns out to be substantially worse than the median, because the distribution is skewed (lognormal).
- Things get even worse when you add up the estimates for n tasks.
- Often, the tasks with the highest uncertainty (rather than the largest size) dominate the mean time it takes to complete all tasks.
- The mean time to complete a task we know nothing about is actually *infinite*.

## Notes

- This is clearly based on one dataset I found online. Other datasets may give different results.
- Of course, my model, like any other statistical model, is very subjective.
- I would like to apply the model to a much larger dataset to see how well the model works.
- I have assumed that all tasks are independent. In reality, they may have correlations that make the analysis much more cumbersome, but in the end (I think) we come to a similar conclusion.
- The sum of lognormally distributed values is not itself lognormally distributed. This is a weakness of that distribution, since you could argue that most tasks are really just a sum of subtasks. It would be nice if the distribution were stable.
- I removed small tasks (estimated time less than 7 hours) from the histogram, since small tasks skewed the analysis and there was an odd spike at exactly 7.
- The code is on my Github, as usual.
- There is some discussion on Hacker News and on Reddit.
