Jason Gorman uses a golf analogy to talk about estimation.
I like his analogy, but he didn’t take it far enough for me. He left out a key element: we may not be playing golf.
A typical sin committed by people who do studies of schedule slippages is to discuss average amounts of time to do X or Y while only considering cases where X or Y were successfully completed. What about the cancellations? Those are ignored. Being ignored, the resulting averages have questionable meaning, except to say “the people who took more than X time to do this task gave up, in our experience so far.”
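To make that bias concrete, here is a minimal sketch; the durations and cancellation cutoff are invented for illustration, not data from any study:

```python
# Hypothetical illustration of survivorship bias in slippage studies.
# All durations and the cutoff below are invented for the example.
durations = [3, 5, 8, 13, 21, 40, 90]   # days each attempt actually ran
cancelled_after = 30                     # attempts abandoned past this point

completed = [d for d in durations if d <= cancelled_after]
average_of_survivors = sum(completed) / len(completed)
print(average_of_survivors)  # 10.0 -- comfortingly small

# The honest claim is only: "tasks that took more than 30 days were
# abandoned, in our experience so far." The 40- and 90-day attempts
# vanish from the average precisely because they went badly.
```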
Jason says that I probably won’t have to hit the golf ball 10,000 times to get it into a par 3 hole. Well, no, but if I carry it to the hole and place it in, how many hits is that? Zero? Or is it “ERROR UNDEFINED VALUE” because I cheated? This is relevant because in software development, I frequently discover that my plans won’t work as conceived, no matter how long I work at them. Or I discover a new way to do them that cuts the time down, or increases it. Or new requirements are put on me from a technological or human source, and that messes things up. Or it turns out that the task is mooted by some deletion of requirements.
I remember the Borland C++ 4.0 project. It went through a careful planning process. We used Gantt charts. Gantt charts, I tell you! Then Microsoft shipped C++ 7.0 with a new feature, the application wizard. Our plans fell to dust. The project was restarted. A tiger team split away to create AppExpert. And those of us who were suspicious of grandiose planning for a fast-moving world got our I Told You So moment.
The nice thing about agile development is that breaking things down into smaller chunks and planning as you go makes one-year slippages, such as we experienced on Borland C++ 4.0, far less likely. It makes the game easier to play; more stable.
So, I like Jason’s analogy. It’s a good teaching analogy because it illustrates that how you model a task dominates how you estimate it. But if I were teaching with it, I would ask the class “What assumptions am I making?” and I would get the class to make a list. Assumptions include that we know what game we are playing, we know the rules of the game, we will not be surprised by new rules, we have a clearly defined and obtainable goal, we don’t get sick or injured, etc., etc.
If I’m asked to make an estimate on which millions of dollars depend, then these are vitally important issues to raise publicly. If I’m just spit-balling an estimate in a low stress situation, then I will make a par-for-the-course guess and not worry when surprises happen.
Shrini Kulkarni says
James,
Do you remember our discussion at STAR East last year on estimation – the non-linear nature of software development and testing tasks, power laws, the one-way valve effect?
All software estimation models (or attempts at them) will fail if they do not take into account the nature of software: the non-linearity of software-related tasks (as against tasks like building or testing non-software things) and the human content in software.
Most of the existing models have “linearity” and some notion of counting (test cases, use cases, function points, lines of code, etc.). For success in developing a software estimation model, one has to leave these fundamental assumptions behind.

And then there is the “SMC – simple, medium, complex” model hoax. I have seen people applying this outrageously simplified model to classify things – test cases, bugs, automation scripts, and so on. Typically the question is “You have 1000 test cases to execute per cycle – how much time does it take?” Immediately, an estimation enthusiast says, “Apply the SMC model: categorize the 1000 cases (sometimes, even worse, a sample of 30-50 cases of the whole lot) as simple, medium, and complex test cases, estimate the time for each category, do some math – and you will get the estimate.”

The problem here is that “complex” typically just means “it takes more time to do.” My notion of complexity is degree of unknown-ness: the higher the degree of unknown-ness, the more complex the stuff.
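For illustration only, here is the kind of arithmetic the SMC approach prescribes; the counts and per-case times are invented, and the point above is that this math misleads, not that it is hard:

```python
# Hypothetical SMC ("simple, medium, complex") estimate of the kind
# criticized above. All counts and times are invented for illustration.
counts = {"simple": 600, "medium": 300, "complex": 100}    # 1000 cases total
minutes_per_case = {"simple": 5, "medium": 15, "complex": 45}

total_minutes = sum(counts[k] * minutes_per_case[k] for k in counts)
print(total_minutes / 60, "hours")  # 200.0 hours

# The model silently assumes "complex" just means "takes longer" and that
# effort adds up linearly. It has no term for unknown-ness: the test that
# reveals something nobody anticipated.
```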
If we stop counting things and factor in non-linearity, we might be closer to a reasonable software estimation model.
Again, we should stop adopting or borrowing from fields like manufacturing – if it takes 1 hour to produce a bolt, it takes 100 hours to produce 100 bolts. Unfortunately, very few managers understand that software is different from a manufacturing assembly line.
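One way to picture the contrast, under entirely invented parameters (the exponent is an assumption chosen only to show the shape of a power law, not a measured value):

```python
# Invented illustration of linear vs. power-law effort growth.
def bolt_hours(n):
    # Manufacturing: effort scales linearly with the count.
    return 1.0 * n

def software_hours(n, k=1.5):
    # Hypothetical power law: effort grows faster than the count.
    return 1.0 * n ** k

for n in (1, 10, 100):
    print(n, bolt_hours(n), round(software_hours(n), 1))
# 1     1.0    1.0
# 10    10.0   31.6
# 100   100.0  1000.0
```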
Shrini
Michael M. Butler says
Also, a lot of times the proper heuristic for figuring additional slips might be emotionally / politically unattractive.
Imagine that enough information existed (it might not, but imagine it does) to make the following predictions…
Suppose that data suggests that for the class of project under examination, if it slips one day the odds are 90% that it will slip a whole week, and if it slips a whole week, the odds are 90% that it will slip another three weeks (a month of total slip). Suppose that if it slips a month, the odds are 90% that it will slip another three months. 0.9 ^ 3 ==> just under 73% odds that if the project slips a day, it will slip four months.
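The chain multiplies out like this (a minimal sketch; the 90% figures are part of Michael’s hypothetical, not measured data):

```python
# Michael's hypothetical chain of conditional slip probabilities.
# Each figure is assumed for illustration, not taken from real projects.
p_week_given_day   = 0.9   # P(slips a week | slips a day)
p_month_given_week = 0.9   # P(slips a month | slips a week)
p_four_given_month = 0.9   # P(slips four months | slips a month)

p_four_given_day = p_week_given_day * p_month_given_week * p_four_given_month
print(p_four_given_day)  # ~0.729: just under 73%
```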
Nobody likes those numbers. So I’d expect them to not be predicted.
Even with different starting odds / assumptions, nobody likes to say there’s any chance of a project slipping too much.
This seems to fit in with two other topics:
1) The “technical debt” issues mentioned elsewhere in this blog (since one way to ship sooner is to redefine completion by cutting features and / or incurring technical debt), and
2) The observation made long ago (by you, James, or was it Jerry Weinberg?) that the only participants in some open-ended testing exercises who claimed 100% of testing had been completed in an hour were the managers — by definition, if one hour of testing was what was scheduled, and one hour of testing was done, that constituted 100% ( 🙂 ).
Graham Shevlin says
Here is an interesting article about estimation approaches from an Australian consulting company…
Graham Shevlin
http://grahamshevlin.com
Michael Bolton says
The observation made long ago (by you, James, or was it Jerry Weinberg?) that the only participants in some open-ended testing exercises who claimed 100% of testing had been completed in an hour were the managers — by definition, if one hour of testing was what was scheduled, and one hour of testing was done, that constituted 100%.
We teach testers, not just managers, to think that way as a matter of course. If the mission is “find as many problems as you can in an hour (or a test cycle, or a test project)”, then a tester who has found as many problems as she could by the end of the allotted time has done 100%. The trouble is that managers frequently change the game after the fact when they ask accusingly, “Why didn’t you find that problem?” The answer, of course, is that we weren’t asked to find that problem. We might have found the problem had it not been hidden so expertly; we might have found it had we not been investigating and reporting so many other problems; we might have found it had some of those problems not been in the code to start with. The point is not to blame in return, but to underscore the fact that we haven’t yet found a way to schedule the discovery of something hitherto unknown.
Michael M. Butler says
Mr. Bolton:
Of course, what you say is true; but I wasn’t just talking about it as the gotcha game (on that point, the Aussie page mentioned by Graham Shevlin is quite salient).
One of the things about golf is that it’s generally clear to fairly-disposed participants when any particular player’s ball has gone into any particular cup. Neither the cup, nor the ball, nor the behavior of gravity, wind or turf change fundamentally during the course of n holes of golf. I think that’s much of James’s point. SW dev as golf is a species of “ludic fallacy”.
What I was trying to point to is the well-known problem that people speak and think loosely and move goalposts. It’s typical for someone to phrase or interpret things to confirm what they wish to believe. Thus, “Testing phase is 100% complete” == “100% of the allotted testing time and effort has elapsed” turns into a tacit “the product has been tested 100%”.
The last _feels_ so much better, but of course it’s existentially virtually impossible for any complex product.
Redefining the product in order to meet deadlines can also happen in several ways. I expect that every testing practitioner who has been around for any length of time will have participated in some bug triage meeting near the ship date where some pretty important things get deferred.
Sometimes these sorts of things constitute a kind of confirmation bias so strong as to deserve another name entirely. It’s not the blame I’m trying to highlight, it’s the sense of shock some folks seem to experience when there’s a mismatch between what they thought they knew and what turns out to be so.
Jonathan says
James,
Really interesting post (and blog). I have to say I can definitely relate to the “cancellation” point. I would love to hear your thoughts on a new company called uTest – they’re attempting to create a “community testing” model which relates loosely to the time management issues you were discussing. Thanks, and I look forward to the feedback.
Best,
Jonathan
[James’ Reply: I don’t know much about it. I’ll ask you the same question I ask any test lab: How do you know your testers are good? How do you know they are good enough? Don’t answer too quickly. Only two test labs (other than STLabs, where I worked) have answered these questions in a way that made me say “cool!” Perhaps you will be the third.]
Richard Allen says
How do you know your testers are good? How do you know they are good enough? These are interesting questions for me, because I’m always trying to be ‘better’ and have never felt I’m good enough. What was the answer that made you say “that’s cool!”?

The process by which we estimate is also interesting. Similar to what was said above, when managers ask “When will testing be complete?” I often think they really mean: when will we find all the defects we don’t know about, document them, pass them to coders to fix, get the fixes back, re-test, find the new bugs introduced by the fixes we didn’t know about, find and report incorrect designs, redesign, recode, retest the new design, and so on?

Someone is quoted as saying “There are three kinds of lies: lies, damned lies, and statistics.” I would like to add another: IT project estimations!
[James’ Reply: The full answer is a bit long for this space. Here’s the quick version: The two test labs that impressed me were Quardev and LogiGear. In both cases the key ingredients were: a cognitively sophisticated model of testing that can be explained and drawn out on a whiteboard; off-the-job training in the form of classroom lecture and practical exercises; on-the-job mentoring; involvement in the testing community; shunning ISTQB and the other silly certification programs; a self-critical attitude about their own expertise. Instead of claiming “we hire the best”, as most labs do, they say “we don’t know if we’re the best, but here’s what we do to work toward that.”]
Medicus Hero says
Ask high school students what an estimate is, and have them think of an example. An example would be: how long is the cafe table from one end to the other?
In other words, I can see how the golf analogy relates to estimation 🙂
Example: how far did one hit the golf ball? Or how long is the golf club from top to bottom?