Richard drove up to the hangar just as I was checking the oil on the Husky, his prized baby float plane. Nuts. He was right on time. I was late. I’m supposed to have the plane ready to go when he arrives.
“Hey Dad, looks like a good day for flying. I’m just in the middle of the pre-flight.”
“Where are we going today?” he asked.
“I haven’t been up for a few months, so I figured just a sightseeing tour around the islands and then some pattern work at Friday Harbor.” I hate pattern work: landing and taking off while talking on the radio to the other pilots. That’s exactly why I need to do it, though. I must get over my nerves; must become a safe pilot. It’s a lesson from testing: focus on the risk.
“How much fuel do we need for that?”
“There’s about 18 or 20 gallons on board. That’s actually enough, but I figure it would be better to bring it to 35, just in case.”
“How much will you put in each tank, then?”
“7 gallons.”
“7 plus 7 plus 18 doesn’t add up to 35. Decide on the fuel you want and get that. If you’re going to fudge, fudge toward having more fuel. What are the four most useless things in the world?”
Oh I know this… “Um, altitude above you… runway behind you… gas in the gas truck, and… Um–”
“–and a tenth of a second ago,” he finished. “But you remembered the important one. Gas. We don’t want that terrible feeling when they close the airport for 30 minutes to clean something up on the runway, and we don’t have the fuel to divert.”
I could have quibbled with him. What we would actually do in that situation is land 10 minutes away at Friday Harbor airport, or heck, anywhere, because we’re a float plane in the midst of an archipelago. But that’s not the point. The point was the habit of precision: of being conscious and calculated about the risks I take, wherever reasonable. Again this relates to testing. When I’m testing, the habit of precision comes out in considering and controlling the states of my test platforms and test data, and in knowing what to look for in test results.
Dad called the flight center for a briefing. He already knew what they’d say, since he always checked the weather before he left home, but Richard Bach is an especially fastidious pilot. He’s not exactly “by the book.” It’s more that he prides himself on safety. Nothing surprises him without itself being surprised at how prepared he is for it. Yes, he knew you were coming; here’s your cake.
The Tester’s Attitude in the Air
My father’s philosophy dovetails perfectly with the tester’s mindset. We expect things to go wrong. We know we will be surprised at times, but we anticipate the kinds of surprises that might occur. That means we perform checks that might seem unnecessary at times. And we keep our eyes open.
I was almost done with the walkaround when he got off the phone.
“Three knots at zero six zero over at Friday,” he announced.
I paused to visualize, then tried to sound authoritative. “That’s a crosswind favoring runway three four.”
“Yes. We have the same conditions here.”
Cool. I got it right. I’m supposed to pretend to be the pilot. Officially, Dad is the pilot-in-command, but I do everything he would do, while he supervises and is ready to take over in case there’s a problem. While I’m doing the preflight, he’s doing it too, but not telling me anything – unless I miss something important. Each time we fly, I’m determined to find a problem with the aircraft before he notices it. I haven’t yet succeeded.
“Dad, what’s this rust-colored streak coming out of this bolt?” Yay, I found something! “There’s one on each side of the elevator.”
“Just a little bit of rust.” He smiled, materialized a can of WD-40, and blasted the bolts with it. This airplane is pristine, so even a little blemish of rust really stands out.
“Were you flying recently?”
“Yeah, I went out last week and splashed around at Lake Whatcom.”
“That explains the streaks. Water spray on the tail. Did you pump out the floats afterward?”
“No, but I doubt there’s more than a pint of water in there.”
“Let’s see about that.” I retrieved the hand pump while he popped out the drain plugs. He was right again: I couldn’t suck out more than a cup of water, total, from all the float compartments.
But there was something odd about the last one.
“This water is PINK, Dad!”
Now he was not smiling.
“Unless you landed at the rainbow lake on Unicorn Planet, there may be a hydraulic leak in there.”
He put his fingers in the residue and sniffed it like a bush guide on the trail of a white tiger. “Yeah, that’s what it is. Let’s pop the hatch and take a look.”
Testing With Open Expectations
This is an example of why good testing is inherently an open investigation. Yes, I had definite ideas of what I was testing for: leaky floats. Yes, I had a specific method in mind for performing that test. Had I not had a specific method, I would have developed one on the fly. That’s the testing discipline. My oracles were the amount of water I was able to pump out of the floats compared to other occasions, and the taste of the water, which I sampled a couple of times to detect whether it was salty. It shouldn’t have been salty, because it had been several flights since we had landed in salt water, but I checked just in case there was a previously undetected leak from before. If salt water gets in there, we could have a serious corrosion problem.
I had no conscious intent to check the color of the water. But in testing we take every anomaly seriously. We take data in and ask ourselves, does this make sense? In that way, we are like secret service agents, trained to deal with known threats and to have a good chance to discern fresh and unusual threats, too.
The question “Can I explain everything that I see?” is a good heuristic to keep in mind while testing.
But if I were to have automated my float pump tests, I would never have found this problem. Because unlike a human, a program can’t look at something and just try to explain it. It must be told exactly what to look for.
I got an email today…
Geoff confirmed the hydraulic leak at the connection in the left float, and will be sealing it, probably tomorrow. He’ll move the Husky to the big hangar to do the work. Nice, that you decided to pump the floats!
Dad
Anne-Marie says
What a lovely tale about you and your Dad.
I was wondering if the fact that he was your Dad affected the way you approached your ‘testing’? You mention that you were ‘determined’ to find something wrong with the plane. Do you think this is typical of your approach to life or is there a natural competition between you and your Dad?
Makes me think that perhaps who we work with affects how we test too!
Anne-Marie
[James’ Reply: It doesn’t occur to me to compete with him. I’m on the same team. I want to live up to the standard he embodies.]
Albert Gareev says
>> But if I were to have automated my float pump tests, I would never have found this problem. Because unlike a human, a program can’t look at something and just try to explain it. It must be told exactly what to look for.
Hi James,
You suddenly decided to pump the floats (the action wasn’t really required) and discovered an issue that could have become a serious problem. That is a typical situation in testing, but I don’t see why you [hypothetically] would have trouble imagining automation of float pump tests that could find an unexpected/undefined problem. Below is a simplified illustration.
Layer One: Detection
I’m sure you’ve heard of Boolean algebra, and I’m 100% confident you have practically used elements of it when you did some programming.
[James’ Reply: This is how you’re going to start your argument? Okay. Why yes, Albert, I have heard of Boolean algebra. It came in handy when I was writing machine code for those video games I wrote for the Apple II, Commodore 64, PC, and Amiga, back in the eighties. And indeed it’s essential in writing the test tools and automation that I create today. I sure hope the point you are about to make is better informed about computer science than it is about my technical background…]
The main rule that you would need to use here is “FALSE = NOT TRUE”
Implementation would require a heuristic too: instead of looking for problems, the automatic unit must monitor the object’s (float, pump, and peripherals) state. If the object’s state deviates from the baseline, that is an indicator of a [possible] problem. State should be defined by a sufficient set of criteria: amount of water in a float, chemical activity of water (pH, electro-conductivity, etc), vibration, pump performance, – as much as required.
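(For concreteness, here is a minimal sketch of the kind of state-deviation check being proposed here; the criteria, baseline ranges, and readings are entirely hypothetical.)

```python
# Hypothetical "Layer One" change detection: compare a measured float/pump
# state against a recorded baseline and flag any criterion outside its range.
# The criteria and numbers below are illustrative, not real aircraft data.

BASELINE = {
    "water_volume_ml": (0.0, 250.0),     # amount of water pumped out of a float
    "water_ph": (6.5, 8.5),              # chemical activity of the water
    "conductivity_uS_cm": (0.0, 500.0),  # fresh water; salt water reads far higher
    "pump_flow_ml_s": (20.0, 40.0),      # pump performance
}

def detect_deviations(measured: dict) -> list[str]:
    """Return the criteria whose measured values fall outside the baseline range."""
    deviations = []
    for criterion, (low, high) in BASELINE.items():
        value = measured.get(criterion)
        if value is None or not (low <= value <= high):
            deviations.append(criterion)
    return deviations

# A reading in which every listed criterion passes -- yet the water is pink.
# "Color" was never on the list, so no deviation is reported.
reading = {
    "water_volume_ml": 230.0,
    "water_ph": 7.2,
    "conductivity_uS_cm": 180.0,
    "pump_flow_ml_s": 31.0,
}
print(detect_deviations(reading))  # prints [] -- no alert, despite the leak
```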
[James’ Reply: This fails in two ways to address the issue I raised, both of which you need to understand if you want to get good at test automation:
1. When I test sapiently, I do not need to bring to conscious attention all of the factors that I am indeed considering. However, when I automate, I must do so. You have made a list of factors that you have thought about. But you have not made a list of things you have not yet thought about, so this wouldn’t have solved the problem that I did indeed solve in this situation – namely, testing something that I hadn’t identified in advance as a potential issue.
2. The false positive problem. Sure, I’ve worked with automation that squawks if something changes. But that’s not testing, that’s change detection. Once we notice a change we have to evaluate it. Then it becomes testing. The problem is, how can the automation judge which change reports will waste my time and which will not? This is exactly why, when I’m flying, I disable the landing gear alarm that comes on whenever I decelerate below 80 mph. That system assumes that I’m preparing for landing, but actually my cruising speed is 85 and my climbing speed is 70, so the alarm is constantly going off. The manufacturer who created that bit of automation didn’t know that.
There are certainly situations we both could cite where we would think that any change to a particular variable is highly likely to be bad. However, in my experience working with complex dynamic systems and complex testing thereof, those situations are not commonplace. Frequently it requires clever coding to mince our way around things that are fully expected to change. The more you write such code, the more expensive your automation becomes.]
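(To make the false-positive point concrete, here is a toy model of that landing gear alarm. The 80 mph rule and the cruise and climb speeds come from the reply above; the sample flight profile is invented.)

```python
# Toy model of the landing gear alarm described above: the automation assumes
# that airspeed below 80 mph means "preparing to land" and sounds the alarm.
# With a cruise speed of 85 and a climb speed of 70, ordinary flying trips it
# constantly -- change detection without the context needed to judge relevance.

ALARM_THRESHOLD_MPH = 80

def gear_alarm_fires(airspeed_mph: float) -> bool:
    """The alarm's entire model of the world: one number and one comparison."""
    return airspeed_mph < ALARM_THRESHOLD_MPH

# Ordinary, non-landing moments in a flight (speeds in mph, illustrative):
profile = [85, 83, 70, 72, 85, 78, 86, 71]
false_alarms = sum(gear_alarm_fires(v) for v in profile)
print(f"{false_alarms} of {len(profile)} samples trip the alarm; none involve landing")
```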
//Note. That alone could already be sufficient, depending on context. Just light the red lamp for a pilot if a parameter changed to a value outside of defined range. However, we may end up with too many lamps, or red lights indicating low threats that could have been ignored if checked by a human being. To avoid that problem, an automatic checking unit should become a little smarter on its own and do evaluation.//
[James’ Reply: In the cockpit, I have a number of instruments that can indicate potential trouble, but just that indication – which is all automation can do – is not enough for excellent flying (or testing). There is no “everything is okay” detector on the plane. Nor can there be. The designers have anticipated various specific kinds of problems and I have detectors for those (a stall warning tone, for instance). But on top of all that – and here is the point I’m trying to make – my sapience must interpolate and extrapolate; generate ideas and filter them. The automation can help, but it cannot replace sapience.]
Layer Two: Evaluation
Boolean algebra is not sufficient here. Fuzzy logic methods (or an artificial neural network for more complicated cases) need to be used.
[James’ Reply: I would argue that a natural neural network is needed.]
From the context of your story: a small amount of water in a float isn’t a problem in itself; a little salty water is also a valid case. That is, the “true” or “false” state of an event doesn’t give a basis for evaluation. It’s the combination of events and the weight of each event that matters.
Examples.
Little water + high electro-conductivity = alert
Little water + high/low pH = alert
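(A minimal sketch of the weighted, rule-based evaluation being suggested; the rules, weights, and threshold are invented for illustration only.)

```python
# Hypothetical "Layer Two" evaluation: combine individual observations with
# weights and raise an alert only when the combined score crosses a threshold.
# Every rule, weight, and threshold here is made up for illustration.

RULES = [
    # (description, predicate over the observation dict, weight)
    ("little water + high electro-conductivity",
     lambda o: o["water_volume_ml"] < 500 and o["conductivity_uS_cm"] > 1000, 0.8),
    ("little water + high/low pH",
     lambda o: o["water_volume_ml"] < 500 and not 6.5 <= o["water_ph"] <= 8.5, 0.6),
    ("weak pump flow",
     lambda o: o["pump_flow_ml_s"] < 15, 0.3),
]

ALERT_THRESHOLD = 0.5

def evaluate(observation: dict) -> tuple[float, list[str]]:
    """Score an observation against the rule set; return the score and fired rules."""
    fired = [(name, weight) for name, pred, weight in RULES if pred(observation)]
    return sum(w for _, w in fired), [name for name, _ in fired]

score, reasons = evaluate({
    "water_volume_ml": 230.0, "water_ph": 7.2,
    "conductivity_uS_cm": 180.0, "pump_flow_ml_s": 31.0,
})
print("ALERT:" if score >= ALERT_THRESHOLD else "no alert", reasons)
# prints "no alert []" -- once again, nothing here knows to look at color
```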
[James’ Reply: What is the false positive rate for that and wouldn’t it be easier just to taste the water? This also demonstrates my point about pink water: I don’t see anything here that detects pinkness. Here’s some homework for you. At minute 27 of this video, this doctor explains how a particular form of automation has made medical testing WORSE.]
//Note. It should be pretty clear by now that implementing such a solution isn’t a quick patch. Therefore it should be reasonable and affordable.//
From the safety and security standpoints it could be reasonable even for a light aircraft. Economically it could be affordable if it’s sold along with 100 planes. Or 100,000. Or one large plane.
[James’ Reply: But Albert, your very example suggests that it’s intractable, because you are either unwilling or unable to supply me with a complete list of things to check for and a complete design of the equipment to check for it. Oh, and how much will this equipment weigh? How expensive is it to test and maintain? Simplicity and lightness are important in designing light aircraft.
You haven’t addressed these concerns, nor would I expect you to, because my argument is that it’s ridiculously hard to do so, and not at all worth doing compared to training the pilot to handle these things himself.]
It also may lead to something I wouldn’t want to happen, but which could happen, and happens anyway.
For example, replacing a highly qualified mechanic with a combination of “automatic unit” + “2-week trainee”. It could be because it’s too hard to get, or takes too long to grow, a professional, or because it’s much cheaper to employ 10 surrogates. Or in an organization with extremely high turnover of staff. That used to be only the army in action, but now it seems to be any big corporation.
This approach has arrived and stayed, and it looks very attractive in accounting books. What is not visible in the balance sheets is that a professional armed with automation can bring performance to new levels of excellence, while amateurish workers, even with automation, have trouble performing as well.
So my point is that instead of trying to stop progress, maybe it’s better to walk ahead of it?
[James’ Reply: What do you mean progress? I’m not standing in the way of any progress. But I would say that it is not progress to seek to do with automation expensively and poorly what can be done far better and far less expensively by a skilled tester.]
(Remember, Mars Pathfinder over-succeeded in 1997, working automatically, without direct radio guidance)
[James’ Reply: Yes. When I taught my testing class at JPL, I saw the new rover under construction. That’s wonderful and extremely expensive technology. It also had a critical fault in it that almost ended the mission on the second day!
Dude, I’m not opposed to test automation. I’m opposed to wild irresponsible claims made about test automation. This is based on extensive experience with test automation that I have, as a programmer and as a tester and as a consultant who has written reports about test automation failures.]
Thank you,
Albert Gareev
http://automation-beyond.com/
Aaron Hodder says
Nice story. I’m insanely envious of your opportunity to go bush flying in such a beautiful part of the world. I’m stuck doing it in Flight Simulator 😉
“The question “Can I explain everything that I see?” is a good heuristic to keep in mind while testing.”
I find this heuristic to be valuable. In the past, I’ve seen an anomaly, say, “yesterday the program wasn’t sending me emails, but today it is.” It’s insanely easy to say “Well, it’s working now, so it passes, and that’s ok.” I always try to make a point now of asking “why was it not working yesterday, but today it is?” rather than just being content that it now works. The investigative path often leads to very interesting information, even if you never end up explaining the anomaly.
Daniel Åberg says
Hi James!
Great story today. Your one-liners such as:
“Unless you landed at the rainbow lake on Unicorn Planet, there may be a hydraulic leak in there.” or
“In that way, we are like secret service agents, trained to deal with known threats and to have a good chance to discern fresh and unusual threats, too.”
make me smile all day in the office.
Back to testing
Cheers
Albert Gareev says
Hi James,
This is the lesson to think about…
“Crash pilots given conflicting orders”
http://news.bbc.co.uk/2/hi/europe/2115425.stm
[quote]
German investigators have revealed that the pilots of the Russian airliner involved in last week’s mid-air collision with a cargo plane received contradictory instructions seconds before the crash.
They said voice recorders recovered from both aircraft showed Swiss air traffic controllers told the Russian pilots to descend, while the on-board warning system instructed them to climb.
All 69 people – including 45 schoolchildren – aboard the Russian Tu-154, and two crew members on the Boeing 757 were killed when the two aircraft collided at 35,000 feet (10,500 metres) over the German-Swiss border.
According to German authorities, cockpit warning systems told the Tu-154 to climb and the cargo jet to descend, just 45 seconds before the collision. But voice recorders reveal that one second later, Zurich air traffic controllers told the Russian pilots to descend. The Russian crew did not respond, so the Zurich control tower repeated the order 14 seconds later, investigators say.
The Russian plane responded and the two aircraft collided 30 seconds later.
Although the aircraft were flying over Germany at the time, they were under the control of the Swiss air traffic control body, Skyguide.
[/quote]
Thank you,
Albert Gareev
[James’ Reply: It is customary to call an example like this a lesson. But really, it’s not a lesson. This is a story. It becomes a lesson when we interpret it and make it a part of ourselves. My interpretation of this, I suspect, is different from your interpretation.
I see this as a lesson about the automation bias and the need for vigorous training in order to pilot aircraft safely.]
Michael Bolton says
Albert said, “State should be defined by a sufficient set of criteria: amount of water in a float, chemical activity of water (pH, electro-conductivity, etc), vibration, pump performance, – as much as required.”
Funny how even after James gave, in his story, explicit critical criteria that pointed to the problem—color, taste, smell—they still don’t appear on the list.
The argument is simple: unlike humans, automation doesn’t have expectations. It is capable of detecting matches or mismatches based on the expectations we feed it. Humans, by contrast, can identify expectations prospectively, in advance of an observation (“If this were to come to pass, it would indicate a problem”); on the fly, during an observation (“Hey… I didn’t expect that; that could be a problem”); and retrospectively (“Now that I’ve seen the problem, I can figure out how I might have detected it” or “Now that I have a better model, I realize that I could have tested for this, even though a problem hasn’t manifested itself yet.”).
Albert also said, “Just light the red lamp for a pilot if a parameter changed to a value outside of defined range.”
One indicator of a problem with any analysis is the appearance of the word “just”. Yes: lighting a red lamp is a simple enough thing. Knowing when to light it and when not to light it is the complicated bit. Interpreting it is also complicated. Here’s an example of a one-bit communications protocol: someone honks the horn at you as you’re driving. How would one accurately interpret that without a walloping dose of context to go along with it?
—Michael B.
Armin Albarracin says
‘Dude, I’m not opposed to test automation. I’m opposed to wild irresponsible claims made about test automation. This is based on extensive experience with test automation that I have, as a programmer and as a tester and as a consultant who has written reports about test automation failures’
James, I really liked your statement above. For years you have seemed like one of the few serious voices advocating automation WHEN it makes sense – and not just everywhere it might. The ROI of automation is indeed an issue and should be taken more seriously. Everybody can sell automation – it sounds so nice, especially to those not in the know – and the dream of the magic button.
Kerry says
What Albert also misses is that the automation would then itself need to be tested (and if he wanted to automate testing of that, then he’d be on the path to madness…)
Fionna O'Sullivan says
This is an interesting analogy for test automation, and a very good point. On the one hand, I’d say that it is important to be aware of what your automation system is actually testing and proving – water salinity and volume, say, but not that there aren’t any problems with the float pump.
On the other hand, though (and I know nothing about planes), would you then have done the manual checks too? Or only done manual checks if the automated tests had flagged something? I try for the former, but am probably guilty of the latter too often for comfort.
A very good and timely reminder to keep alert!
[James’ Reply: A sapient pilot is always needed. Automation is fine as long as you don’t mind when it fails. We have lots of automation on our cars, for instance, but failure there generally leads to being stuck at the side of the road. Automation failure on aircraft is considerably more risky. That’s why we pilots are constantly watching and checking and cross-checking.]