The role of AI agents in test automation – Will the future be autonomous?

What are AI agents?

What happens when AI doesn’t just respond to your questions, but takes initiative and acts autonomously? AI agents are changing what AI can do without us. But what are AI agents?

When you interact with ChatGPT or other chatbots, you ask a question and it responds. AI agents, however, take the next step: they autonomously take actions, make decisions or interact with external systems.

An AI agent is a system that works autonomously to accomplish a goal.

At the AutomationSTAR conference in Berlin in 2023 I was inspired by Marcel Veselka on the role that AI agents can hold in the future of QA.

Since then I have been playing around with agentic AI, and there have been some big developments. In this blog post I’d like to explore what role autonomous AI agents can have in QA.

To keep this post focussed on the possibilities, I’m deliberately ignoring the ethical, security and other valid concerns that might argue against introducing AI in your project 😃

We’ve progressed from manual tools to automated processes, and are now entering an era of intelligent and autonomous agents.

My previous agentic workflow demo

To show what these autonomous agents might do in the future of quality assurance, I created a demo last year that showcased a fully autonomous pull request review. It basically mimicked a simplified version of what a manual tester would do:

First, it figured out which test cases to run by reading the context of the pull request or a Jira issue.
Then, it actually performed these test cases on the preview environment using an autonomous browser agent.
Finally, it reported the results directly on the pull request.
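
The three steps above can be sketched as a small pipeline. This is not the code from the demo repository: the interfaces `TestPlanner`, `BrowserAgent` and `Reporter` and the function `reviewPullRequest` are hypothetical names chosen for illustration. In a real setup they would be backed by an LLM call, a browser agent and the Git hosting API respectively.

```typescript
// Hypothetical shapes; none of these names come from the demo repository.
interface TestCase { title: string; steps: string }
interface TestResult { title: string; passed: boolean; notes: string }

interface TestPlanner {
  // Step 1: derive test cases from the pull request text or a linked issue.
  plan(prContext: string): Promise<TestCase[]>;
}
interface BrowserAgent {
  // Step 2: execute one test case against the preview environment.
  run(testCase: TestCase, previewUrl: string): Promise<TestResult>;
}
interface Reporter {
  // Step 3: publish the results on the pull request.
  report(results: TestResult[]): Promise<void>;
}

async function reviewPullRequest(
  prContext: string,
  previewUrl: string,
  planner: TestPlanner,
  agent: BrowserAgent,
  reporter: Reporter,
): Promise<TestResult[]> {
  const testCases = await planner.plan(prContext);
  const results: TestResult[] = [];
  for (const testCase of testCases) {
    // Sequential for simplicity; independent cases could also run in parallel.
    results.push(await agent.run(testCase, previewUrl));
  }
  await reporter.report(results);
  return results;
}
```

Passing the three collaborators in as parameters keeps the pipeline itself free of any specific LLM or browser tooling, which also makes it easy to try out with stubs.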

https://github.com/tlolkema/autonomous-pr-test

A demo video is linked in the original article. While the demo was heavily simplified, it does show the potential of using autonomous agents in your development pipelines.

The current state of agentic AI systems

In the months following my demo, the major AI companies all began rolling out their agentic AI systems. Some examples are:

Anthropic released Computer Use, the Model Context Protocol (MCP) and, more recently, Claude Code
OpenAI released Operator
Google released Project Mariner

Next to these there have been some exciting open source projects like:

Browser Use

OpenAI, Anthropic, Google, xAI and other players have all released reasoning models. The idea is that the better the reasoning, the better these models will be able to instruct AI agents.

Right now (March 2025), AI agents controlling computers or browsers aren’t quite at human level yet, but the list of tasks they can handle is growing every single day.

OpenAI Operator:
“In these benchmarks CUA achieved a 58.1% success rate on WebArena and an 87% success rate on WebVoyager for web-based tasks. While CUA achieves a high success rate on WebVoyager, where most tasks are relatively simple, CUA still needs more improvements to close the gap with human performance on more complex benchmarks like WebArena.”

There have been major advancements in AI agents.

Model Context Protocol (MCP)

One of the exciting developments in agentic AI is Anthropic’s Model Context Protocol (MCP). MCP aims to standardize how LLMs connect to various tools.

It’s not tied to just Anthropic’s Claude model or any specific LLM and it can communicate with a wide range of tools already.

An MCP server is basically a small program that describes a set of tools to the LLM (a name, a description and an input schema for each) and executes a tool when the LLM calls it. They’re very simple to create, which explains why there’s already a substantial list of supported MCP servers from both Anthropic and the community.

You’ll find MCP servers for automating tools like Puppeteer and Playwright, interacting with Git, connecting to search engines, or working with several popular APIs.

https://github.com/modelcontextprotocol/servers

“MCP is an open protocol that standardizes how applications provide context to LLMs. Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardised way to connect your devices to various peripherals and accessories, MCP provides a standardised way to connect AI models to different data sources and tools.”

Let’s take a look at what the official Puppeteer MCP server looks like 🚀

https://github.com/modelcontextprotocol/servers/blob/main/src/puppeteer/index.ts

Following MCP, we can define the tools the LLM can call.

In this example, a tool that wraps the click method of Puppeteer:

  // One entry in the list of tools the server exposes to the LLM
  {
    name: "puppeteer_click",
    description: "Click an element on the page",
    inputSchema: {
      type: "object",
      properties: {
        selector: { type: "string", description: "CSS selector for element to click" },
      },
      required: ["selector"],
    },
  },

And the implementation of this click method:

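    // Excerpt from the switch over the requested tool name;
    // `args` holds the arguments the LLM supplied for this call.
    case "puppeteer_click":
      try {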
        await page.click(args.selector);
        return {
          content: [{
            type: "text",
            text: `Clicked: ${args.selector}`,
          }],
          isError: false,
        };
      } catch (error) {
        return {
          content: [{
            type: "text",
            text: `Failed to click ${args.selector}: ${(error as Error).message}`,
          }],
          isError: true,
        };
      }

As you can see, creating an MCP server is actually pretty simple 😃

And there’s a good chance someone has already built one for your favourite tool!
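
To see how the definition and the handler fit together without installing anything, here is a self-contained sketch. It is an assumption-laden simplification, not the official server: `PageLike` is a hypothetical stand-in for Puppeteer’s page object, `callTool` is an illustrative name, and the wiring to the MCP SDK is left out entirely.

```typescript
// Stand-in for the one Puppeteer method this sketch needs (assumption:
// a real server would use Puppeteer's own page object instead).
interface PageLike {
  click(selector: string): Promise<void>;
}

interface ToolResult {
  content: { type: "text"; text: string }[];
  isError: boolean;
}

// Tool definition: what the LLM sees when it asks which tools exist.
const clickTool = {
  name: "puppeteer_click",
  description: "Click an element on the page",
  inputSchema: {
    type: "object",
    properties: {
      selector: { type: "string", description: "CSS selector for element to click" },
    },
    required: ["selector"],
  },
};

// Tool handler: maps a tool name plus arguments to a browser action and
// reports success or failure back as text the LLM can read.
async function callTool(
  page: PageLike,
  name: string,
  args: { selector: string },
): Promise<ToolResult> {
  const reply = (text: string, isError: boolean): ToolResult => ({
    content: [{ type: "text", text }],
    isError,
  });
  switch (name) {
    case clickTool.name:
      try {
        await page.click(args.selector);
        return reply(`Clicked: ${args.selector}`, false);
      } catch (error) {
        return reply(`Failed to click ${args.selector}: ${(error as Error).message}`, true);
      }
    default:
      return reply(`Unknown tool: ${name}`, true);
  }
}
```

Returning errors as regular results with `isError: true`, rather than throwing, gives the LLM a chance to read the failure message and try a different selector.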

Will MCP become the standard going forward? Or will we see more closed-source AI agents take over? That’s still up in the air. Only time will tell which approach wins out in the end.

Autonomous testing vs scripted test automation

Will autonomous testing fully replace scripted test automation? Not in the short term, I think.

Autonomous systems will never be as fast, reproducible and accurate as scripted test automation. This is because AI agents continuously have to evaluate their actions and determine what their next step should be to achieve their goal.

While scripted automation follows a predetermined path with predictable timing and outcomes, AI agents must constantly process, analyze, and decide – impacting speed and consistency.

Because of this, predetermined scripts will always reach their goal faster and can be fully repeatable. These are important factors for automated regression testing.
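
A toy sketch makes the difference concrete. Everything here is hypothetical: `AppState` is a fake application, and the `decide` callback stands in for what would be a slow, non-deterministic LLM call in a real agent.

```typescript
type Action = { kind: "click"; selector: string } | { kind: "done" };

// Minimal fake application state so the example runs without a browser.
interface AppState { clicked: string[] }

function perform(state: AppState, action: Action): void {
  if (action.kind === "click") state.clicked.push(action.selector);
}

// Scripted automation: the path is fixed up front, no decisions at run time.
function runScript(state: AppState, steps: Action[]): void {
  for (const step of steps) perform(state, step);
}

// Agentic automation: before every step the agent re-evaluates the state.
// Returns how many times `decide` was consulted.
function runAgent(
  state: AppState,
  decide: (state: AppState) => Action,
  maxSteps = 10,
): number {
  let decisions = 0;
  while (decisions < maxSteps) {
    const action = decide(state);
    decisions++;
    if (action.kind === "done") break;
    perform(state, action);
  }
  return decisions;
}

const checkoutPath = ["#add-to-cart", "#checkout", "#pay"];

const scripted: AppState = { clicked: [] };
runScript(scripted, checkoutPath.map((selector): Action => ({ kind: "click", selector })));

const agentic: AppState = { clicked: [] };
const decisions = runAgent(agentic, (state): Action => {
  // Toy policy: click the next selector on the path, then stop.
  const next = checkoutPath[state.clicked.length];
  return next === undefined ? { kind: "done" } : { kind: "click", selector: next };
});

console.log(scripted.clicked.length, agentic.clicked.length, decisions); // prints: 3 3 4
```

Both runs end in the same state, but the agent needed four decision calls to get there where the script needed none. With a real model behind `decide`, each of those calls adds latency and a chance of a different outcome.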

I expect scripted test automation to remain the primary method for regression testing.

AI agents can, however, play a valuable role in creating these scripts themselves after a successful test to expand the existing test suite. They could observe their own successful test paths and generate optimized scripts.
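
As a rough sketch of that idea: if an agent records the steps of a successful run, turning them into a script is mostly string generation. The `RecordedStep` type and `toPlaywrightTest` function are hypothetical names, and the output targets Playwright only as an example, using its `page.goto`, `page.click` and `page.fill` methods. A generated script would still need assertions added before it is useful as a regression test.

```typescript
// Hypothetical trace format for the steps an agent took during a run.
type RecordedStep =
  | { kind: "goto"; url: string }
  | { kind: "click"; selector: string }
  | { kind: "fill"; selector: string; value: string };

// Turn the steps of a successful agent run into Playwright test source code.
function toPlaywrightTest(title: string, steps: RecordedStep[]): string {
  const lines = steps.map((step) => {
    // JSON.stringify produces safely quoted string literals.
    switch (step.kind) {
      case "goto":
        return `  await page.goto(${JSON.stringify(step.url)});`;
      case "click":
        return `  await page.click(${JSON.stringify(step.selector)});`;
      case "fill":
        return `  await page.fill(${JSON.stringify(step.selector)}, ${JSON.stringify(step.value)});`;
    }
  });
  return [
    `import { test } from "@playwright/test";`,
    ``,
    `test(${JSON.stringify(title)}, async ({ page }) => {`,
    ...lines,
    `});`,
  ].join("\n");
}

console.log(
  toPlaywrightTest("login works", [
    { kind: "goto", url: "https://preview.example/login" },
    { kind: "fill", selector: "#user", value: "alice" },
    { kind: "click", selector: "#submit" },
  ]),
);
```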

Scripted test automation can go straight to the finish line, while autonomous AI agents evaluate every step.

Where do autonomous AI agents shine?

AI agents really shine when testing new features.

Imagine this: you’ve just built a new feature and immediately several AI agents jump into action. One agent focuses on checking accessibility issues, another tests the core functionality, and a third looks specifically for security problems.

This happens way faster than manual testing in these areas and gives you quick feedback on what needs fixing.
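
Running several specialised agents side by side could look roughly like the sketch below. `ReviewAgent`, `Finding` and `reviewFeature` are hypothetical names, and each agent’s `review` function stands in for a full agent run against the new feature.

```typescript
interface Finding { agent: string; issue: string }

// Hypothetical agent shape: a name plus a function that inspects the feature
// and resolves to a list of issues it found.
interface ReviewAgent {
  name: string;
  review: (featureUrl: string) => Promise<string[]>;
}

async function reviewFeature(featureUrl: string, agents: ReviewAgent[]): Promise<Finding[]> {
  // Start all agents at once; each one handles its own failure so that a
  // crashing agent does not hide the findings of the others.
  const perAgent = await Promise.all(
    agents.map(async (agent): Promise<Finding[]> => {
      try {
        const issues = await agent.review(featureUrl);
        return issues.map((issue) => ({ agent: agent.name, issue }));
      } catch (error) {
        return [{ agent: agent.name, issue: `agent failed: ${(error as Error).message}` }];
      }
    }),
  );
  const findings: Finding[] = [];
  for (const list of perAgent) findings.push(...list);
  return findings;
}
```

Because the agents run concurrently, the total wait is roughly that of the slowest agent rather than the sum of all of them.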

AI agents are especially helpful for tricky automation areas like testing screen readers for accessibility. When the page structure changes, screen reader automation that follows a predefined set of instructions might fail. An AI agent, however, can still evaluate whether the experience works properly by assessing the page directly against accessibility guidelines rather than following rigid test steps.

Future

I think the role of QA engineers will evolve as more testing gets handled by AI agents. QA engineers will have to determine the quality of agent-based testing and design the integration of these agents into CI/CD pipelines.

In the near future, I expect we’ll see a mix of testing approaches:

Scripted test automation will continue handling regression testing where speed and reproducibility matter most.
AI agents will tackle complex exploratory scenarios and edge cases that traditional automation struggles with.
We’ll likely see a reduction in manual exploratory testing as agents become more capable of discovering unexpected issues.

