I’m the opposite of an AI fanboy. I’ve been arguing against the grandiose claims of the AI industry for decades. But the days are over when I can make an occasional case against it and then go back to doing real work.
AI is now good enough that lots of foolish people will try to use it in testing, and lots of clever people expect me to know how it might be used properly in testing. These days, every professional tester must learn enough about it, and practice enough with it, to be able to defend against the inevitable pressure to leverage this new technology far beyond what is reasonable.
Therefore, my brother and I have started a peer conference series meant to explore the experiences of serious testers with AI and subject them to scrutiny. We are calling this the Workshops on AI in Testing (WAIT). Our goal is to help foster a community of testers who can credibly claim expertise in both testing AI and using AI in testing.
We held the first one on April 7-8, 2024. Here’s what happened.
Attendees
• Jon Bach
• James Bach
• Nate Custer
• Alexander Carlsson
• Michael Bolton
• Wayne Roseberry
• Blake Norrish
• Michael Cooper
• Julie Calboutin
• Ben Simo
• Rob Sabourin
• Abdul Faras
• Mahesh Kumararaju
• Pooja Agrawal
• Lalit Bhamare
This LAWST-style conference was held over Zoom, in the Pacific time zone. This was a bit of a problem for attendees from India, but we appreciated their presence, all the same.
Jon Bach facilitated. I was content owner (that means I was responsible for keeping people on topic and helping the best content to come out of the experience reports).
Experience Report #1, Nate Custer
Nate presented his skeptical experimentation about the effectiveness of self-healing locators in web-based automated checking. He showed us a thoughtfully designed experiment which seemed to show that self-healing locators are more trouble than they’re worth. However, James raised the issue of the “Raincoat Problem.” This is the fact that you can’t give a general answer to the question: “Should I pack a raincoat for this trip?” because it depends on where you’re going and what you’re going to try to do. A raincoat is completely useless in some situations and indispensable in others. Similarly, whether self-healing AI is worth the trouble depends on specific contextual factors. Instead of simply answering “yes, self-healing locators are a good idea,” what we should do is identify and report on the factors that bear upon that decision. The raincoat problem is ever-present when evaluating any form of AI.
Nevertheless, Nate collected data in a reproducible way. He approached the experiment like a professional.
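For readers who haven’t met the term: a “self-healing” locator is one that falls back to alternative ways of finding an element when the primary locator breaks. Here is a minimal sketch of the idea in Python with Selenium. The fallback list and logging are purely illustrative, far simpler than the AI-driven healing in commercial tools, and none of this is Nate’s actual test harness.

```python
# Simplified illustration of a "self-healing" locator fallback.
# Real tools use heuristics or ML to pick the replacement; here we
# just walk an ordered list of alternative locators.
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By


def find_with_healing(driver, locators):
    """Try each (by, value) locator in order; report when 'healing' occurs."""
    primary = locators[0]
    for index, (by, value) in enumerate(locators):
        try:
            element = driver.find_element(by, value)
            if index > 0:
                # This is the healing event a tester would want logged,
                # because it may be hiding a real change in the product.
                print(f"Healed: {primary} -> {(by, value)}")
            return element
        except NoSuchElementException:
            continue
    raise NoSuchElementException(f"All locators failed: {locators}")


if __name__ == "__main__":
    driver = webdriver.Chrome()
    driver.get("https://example.com")
    # Hypothetical locators for a login button whose id has changed.
    button = find_with_healing(driver, [
        (By.ID, "login-btn"),                         # primary (now broken)
        (By.NAME, "login"),                           # fallback 1
        (By.CSS_SELECTOR, "button[type='submit']"),   # fallback 2
    ])
    button.click()
    driver.quit()
```

Note that the interesting testing question is exactly the one the sketch glosses over: whether a silent fallback is a convenience or a way of masking a bug.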
Experience Report #2, Michael Bolton
Michael shared his and James’ experiences doing a detailed analysis of an AI demo that Jason Arbon had performed and posted on Medium. Michael illustrated how the analysis was done and explained why debunking bad work is so much more time-consuming than creating it.
We discussed the prospects for LLMs to become substantially better at suggesting test ideas than they are today.
Experience Report #3, Alex Carlsson
Alex demonstrated how he uses ChatGPT to help him quickly format formal test procedures. His method doesn’t create the procedures, but sometimes gives him hints about how they might be improved. We discussed how this method can be responsibly used on projects and how it fits in with his other test activities.
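To give a sense of the general shape of this kind of use (not Alex’s actual prompts or tooling), here is a minimal sketch assuming the OpenAI Python client; the model name, prompt wording, and procedure text are placeholders.

```python
# Sketch: asking an LLM to reformat rough notes into a formal test procedure.
# The prompt, model, and procedure text are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

rough_procedure = """
log in as admin, go to billing, make an invoice for a customer with no address,
check the error, then try again with a valid address and confirm the total
"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": (
                "Reformat the following rough notes as a numbered formal test "
                "procedure with preconditions, steps, and expected results. "
                "Do not invent steps that are not implied by the notes."
            ),
        },
        {"role": "user", "content": rough_procedure},
    ],
)

print(response.choices[0].message.content)
```

The “do not invent steps” instruction is the important part; the value here is in tidying what the tester already knows, not in letting the model improvise.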
Experience Report #4, Ben Simo
Ben showed a method for systematically producing test ideas using LLMs. His method is a multi-stage approach of analyzing information sources, identifying testable elements, and composing them into actionable test ideas. This is part of ongoing research at his company. Discussion ranged over such things as scaling the approach, how to test whether the system is giving good ideas, and how the system can be abused. We discussed ways of extending Ben’s ideas.
The question was raised about whether the effect of the tool is to train our minds to become better testers or to empty our minds and let AI do the work. It remains an open question how easily these tools can be abused.
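To make the shape of such a staged pipeline concrete, here is a minimal sketch, again assuming the OpenAI Python client. The stage prompts and the ask() helper are hypothetical illustrations, not Ben’s actual system.

```python
# Sketch of a staged LLM pipeline: analyze sources, identify testable
# elements, then compose test ideas. Prompts and helper are hypothetical.
from openai import OpenAI

client = OpenAI()


def ask(prompt, text):
    """Single LLM call: apply a stage-specific prompt to the given text."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content


def test_ideas_from_source(source_text):
    # Stage 1: summarize what the information source claims or implies.
    analysis = ask(
        "Summarize the product behavior, claims, and risks described below.",
        source_text,
    )
    # Stage 2: pull out discrete, testable elements from that analysis.
    elements = ask(
        "List the distinct testable elements (features, rules, limits, "
        "interactions) implied by this analysis, one per line.",
        analysis,
    )
    # Stage 3: compose actionable test ideas from the elements.
    return ask(
        "For each element below, propose a concrete, actionable test idea "
        "that a human tester could perform and judge.",
        elements,
    )


if __name__ == "__main__":
    with open("spec_notes.txt") as f:  # placeholder information source
        print(test_ideas_from_source(f.read()))
```

Breaking the work into stages is what makes the output inspectable: a tester can review the intermediate analysis and element list, which bears directly on the question of whether the tool trains the mind or replaces it.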
Two mini-reports from Michael Cooper and Wayne Roseberry
With time running short, we heard abbreviated experience reports with a few quick comments from the group.
Michael Cooper showed a system for automatically categorizing test cases to make it easier to report testing against business needs.
Wayne Roseberry talked about an attempt to automatically report bugs.
We are planning WAIT #2 for May 18-19, 2024.
Chris says
Good to see someone is doing peer workshops again in the US and doing one on AI. How well did the online format work? Limiting at all?
[James’ Reply: It worked well for the content.
Yes, it’s limiting. In the old days we would dine together and get a chance to split off into smaller twos and threes. Zoom is a single-threaded system, so open socializing is much curtailed.]
Akshat Kumar says
Dear James,
Does developing Systematic Test Ideas with LLMs go in the direction of defining the SUT and breaking down requirements before prompting the LLM for test ideas?
If that’s the case, I would be happy to share my experiment, in which I continuously fine-tune my prompt(s) by providing the LLM with more information to increase its effectiveness, and review the output with SMEs to verify its quality.
Looking forward to hearing from you.
Kind regards
[James’ Reply: Please send an email to peerconference@satisfice.com to officially apply. It sounds like you have an interesting experience. Please describe a little more about it in your mail.]