
Another week has gone by, and the conversation about AI and its potential for value is still top of mind, though this week the conversation has shifted in some places to a more extreme take: “AI doesn’t work at all.”

These takes are usually accompanied by (often hilarious) screenshots of conversations in which AI says or does nonsensical things. There’s nearly infinite content out there for an “AI Says the Darndest Things” YouTube channel (which you could probably automate WITH AI, ironically enough).
 
The reason I’m writing all this (quick aside: writing something manually is going to be our generation’s “I walked uphill both ways in the snow” trope) is that I wanted to respond to the idea that AI doesn’t work. Or rather, I wanted to repost something I said six months ago in a video series I did here. The video series covered the steps necessary to move an automation program into the world of Agentic Automation, and it came from years of experience delivering automation. One major problem that has been exacerbated by AI is that people very rarely define adequate success metrics for whatever project they are doing. Here are the steps I usually saw:

1) Someone submits an idea for an automation project 

2) Some research is done and the value-potential is identified 

3) Some feasibility assessment is done and a decision is made 

4) The project gets funded and proceeds 

5) The project is delivered 

Do these steps sound familiar?  Are these the primary steps you take?  Well, there are some major pieces missing here!  First off, the obvious one: there should be a follow-up step that determines whether the declared value was actually achieved.  Delivering the project does not guarantee that the projected value will materialize, in part or in full.  Marking it down as a win and patting yourselves on the back without that step can be actively harmful to your organization.  Doing that follow-up, and being willing to turn off an automation that is not adding value, is crucial to program maturity and sustained success.

The other missing piece follows naturally from the first: “How do we measure the value?  How do we determine success?”
 
Let me give an example using RAG (Retrieval-Augmented Generation), since that is something well understood in the AI space right now, and it is also something I have a nice slide on from my aforementioned video series.

I was at an organization that struggled to get a RAG solution into production for two main reasons.  One important factor was the low data quality of our knowledge base, but another, potentially more important, factor was that leaders could not agree on how we’d ‘test’ the solution.  There were long discussions about testing methodologies; some bordered on deep epistemological debates that were fun but wholly unproductive, while others were deeply technical exercises meant to quantify the predictions in the most detailed mathematical way possible.  No progress was made, and the project never made it to production.
 
What all those crazy testing exercises forgot to ask was this: “What is the point of the technology?”  The point of our RAG system was to get the right answer into the hands of business users faster, because speed and quality of service were directly tied to most of the key performance indicators of the business segment that would be using the technology.  I coined a metric called “Throughput,” which focuses on business outcomes instead of just technical performance.

Here's that slide.

In this approach, the question “Is the AI working?” doesn’t focus only on exactly which words the model generated, which sources of data it found, or whether it answered the question exactly right on the first try.  It looks at the combination of human effort and AI tools and asks, “Is this solving my business problem better than before?”  Because isn’t that what matters?  If you build an experience and tool set that helps you achieve your business goals faster, and does so safely and without introducing extra risk, does it matter if the end user had to ask the question twice?
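To make that concrete, here is a minimal sketch of what a “Throughput”-style measurement could look like in code. Everything in it is illustrative: the ResolvedCase fields, the throughput_report helper, and the baseline numbers are hypothetical stand-ins I made up for this example, not the actual metric from the slide or the course.

from dataclasses import dataclass
from statistics import mean


@dataclass
class ResolvedCase:
    """One business request handled end to end (human effort plus AI assistance)."""
    minutes_to_resolution: float   # total elapsed time, including any re-asking of the AI
    correct_outcome: bool          # did the business user end up with the right answer?


def throughput_report(cases: list[ResolvedCase],
                      baseline_minutes: float,
                      baseline_accuracy: float) -> dict:
    """Compare the human+AI workflow against the pre-AI baseline on business outcomes,
    not on whether the model answered perfectly on the first try."""
    avg_minutes = mean(c.minutes_to_resolution for c in cases)
    accuracy = sum(c.correct_outcome for c in cases) / len(cases)
    return {
        "avg_minutes_to_resolution": round(avg_minutes, 1),
        "outcome_accuracy": round(accuracy, 3),
        "faster_than_baseline": avg_minutes < baseline_minutes,
        "at_least_as_accurate": accuracy >= baseline_accuracy,
    }


if __name__ == "__main__":
    # Illustrative numbers only; the baseline represents the old, fully manual process.
    pilot_cases = [
        ResolvedCase(minutes_to_resolution=6.0, correct_outcome=True),
        ResolvedCase(minutes_to_resolution=9.5, correct_outcome=True),   # user asked twice
        ResolvedCase(minutes_to_resolution=12.0, correct_outcome=False),
        ResolvedCase(minutes_to_resolution=5.5, correct_outcome=True),
    ]
    print(throughput_report(pilot_cases, baseline_minutes=20.0, baseline_accuracy=0.8))

The point of a sketch like this is that the pass/fail signal comes from time-to-resolution and outcome quality relative to the old process, so an answer that took two tries still counts as a win if the business problem got solved faster.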
 
To anyone looking at those studies or doomer articles saying that AI isn’t working or doesn’t add value, consider whether they are measuring the right things before accepting that claim.  If you’ve been part of AI pilots that were declared a failure or abandoned before production, consider whether you were focusing on the right problem.

If you are interested in the entire series, check it out here.

That slide comes from the course “Testing Generative AI.”
