
Using agent skills to write Playwright tests

How agent skills and instructions help me generate E2E tests without coding.

In my previous blog, almost a year ago, I predicted that scripted test automation would remain key for regression testing, while autonomous (browser) agents could take on a more exploratory role. We now see a glimpse of these autonomous agents in production.

This post focuses on scripted automation, because these scripts can now be written entirely by AI agents. Agentic engineering has become incredibly powerful over the last year. Claude Code was released to the public in May 2025, and since then we’ve seen the introduction of skills, subagents and multiple competitors like OpenCode and Codex. It’s here to stay, and it’s reshaping how we write code.

Working with AI agents is a skill in itself, and I hope this blog helps you develop it. Without direction, ‘vibe coding’ leads to suboptimal results.

Let’s look at the following example:

```ts
// What AI generated:
page.getByRole("button", { name: t("checkout.payment_button") }).click();

// What actually existed:
page.getByRole("link", { name: t("checkout.complete_purchase") }).click();
```

Wrong role. Wrong translation key.

Every piece looked correct: proper Playwright syntax, a translation function, an accessible role. The AI generated good-looking code based on what a checkout button should look like and what the translation key probably was, but it didn’t check the actual implementation first.

The good thing is, we can do something about it 😃

By combining good agent instructions, a detailed skill for writing E2E tests, and hooks with self-validation, the results get much better.

What are agent skills?

Giving agents skills is like giving them a step-by-step process for how to build.

 

A skill is a reusable set of instructions that teaches an agent how to perform a specific task. It’s packaged as a folder containing a SKILL.md file with metadata (name + description) and the instructions themselves.

```
.claude/
└── skills/
    └── write-e2e-test/
        └── SKILL.md
```

Skills can be invoked by the user, much like the “slash commands” you may already know, or invoked by the agent itself. On startup, agents load only the name and description of each available skill: enough to know when it might be relevant.

Only when the skill is actually used is it loaded into the context, which makes it token-efficient.

In the description, we define what the skill should be used for and when the agent can choose to use it:

```yaml
---
name: write-e2e-test
description: Systematically write new Playwright E2E tests from planning through implementation. Use when the user asks to "write an e2e test", "create e2e tests", "add e2e test for [feature]", or mentions testing a feature with Playwright. Covers test planning, implementation verification and test validation.
---
```

In my skill, I describe a step-by-step process from planning to implementation that results in a working E2E test. I mostly trigger the skill myself, but the agent can also decide to use it.

“Agent Skills are a lightweight, open format for extending AI agent capabilities with specialized knowledge and workflows.”

Combine skills with AGENTS.md files

Skills handle specific tasks, but what about the general project knowledge your agent needs everywhere? For instance, any agent working in the project should understand the project structure, good and bad practices, and how to run the application and tests. For this purpose we use AGENTS.md files.

Or, if you’re using Claude Code, CLAUDE.md files.

We can nest these files as well, so your agent knows to pull in instructions from a specific AGENTS.md file when it’s working in a subdirectory. Let’s say we have a monorepo that includes the frontend, the backend services and the E2E tests. Something like this:

```
├── frontend
│   ├── AGENTS.md          # Specific instructions for this package
│   └── src
├── backend
│   ├── graphql-api
│   │   ├── AGENTS.md      # Specific instructions for this package
│   │   └── src
│   └── auth-service
│       ├── AGENTS.md      # Specific instructions for this package
│       └── src
├── packages
│   └── e2e-tests
│       ├── AGENTS.md      # Specific instructions for this package
│       └── tests
└── AGENTS.md              # General instructions for the entire monorepo
```

We can put the general instructions in the root AGENTS.md file while we keep package specific instructions in the nested AGENTS.md files.

Tips for effective AGENTS.md files

These files shouldn’t be static. Update them continuously based on the behaviour of your agent. Don’t just put everything in them: when the context gets very large, agents can lose track of what’s really important. Give enough context, but don’t overdo it.

What I put in my AGENTS.md:

  • Test scope and overview
  • Code organization and structure
  • Testing conventions and best practices
  • Common patterns (waiting, state setup, locators)

The goal is to give your agent enough context to work autonomously without drowning it in instructions.

Think of AGENTS.md as a README for agents: a dedicated, predictable place to provide the context and instructions to help AI coding agents work on your project. https://agents.md/

How does the skill work?

Time for the fun part… let’s use our skill!

 

Back to agent skills! In the following overview I’ll walk through my E2E test writing skill and explain exactly how it works. I start by letting the agent view my AGENTS.md file for the e2e-tests package.

## Prerequisites

Before starting, read the e2e testing guidelines:

```bash
view packages/e2e-tests/AGENTS.md
```

Phase 1 – Test requirements

In the first phase I’d like to do some discovery together with the agent to set the requirements. I want it to ask me some questions, which also helps me think about how the test is best implemented.

I think it’s important to get a good understanding of what needs to be built right from the start.

## Workflow

### 1. Define Test Requirements

**Understand the feature**:

- Identify what needs to be tested
- List the specific test cases to write
- **IMPORTANT**: Always ask clarifying questions about the test scope and where to validate

This is an important phase as aligning on the requirements will lead to a better plan.

**Example questions**:

- “Should this test be part of [spec file name] or a new spec file?”
- “This is a list of suggested tests; please select the ones you want me to write”
- “Should I test both success and error cases?”
- “Should the test validate the product is visible on the cart page or check the backend state?”

Phase 2 – Research implementation details

In this phase I let the agent research the actual implementation details.

Without this, it would probably still do quite a good job in plan mode, but helping the agent a bit by pointing it directly to the actual files saves extra tool calls and makes sure the right files are viewed.

One common mistake agents made in my project was inventing translation keys. I fixed this by pointing to the actual translation file they must view; after the excerpt below I sketch what a helper like `translateLocaleFn` could look like.

### 2. Verify Implementation Details

**Check actual implementation** in the monorepo to understand:

**Translations** (always check first):

```bash
view frontend/src/messages/nl.json
```

- Look for button labels, form fields, error messages, success messages
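
This post refers to a `translateLocaleFn` helper a few times without showing it. As a purely hypothetical sketch (the import path and the dotted-key lookup are my assumptions, not the project’s actual implementation), such a helper could look like this:

```ts
// Hypothetical sketch of a translation helper. It resolves a dotted key such as
// "checkout.complete_purchase" from the same nl.json the frontend uses, so tests
// never hard-code guessed labels.
import nl from '../../frontend/src/messages/nl.json';

export function translateLocaleFn(key: string): string {
  // Walk the nested messages object using the dotted key.
  const value = key.split('.').reduce<any>((node, part) => node?.[part], nl);
  return typeof value === 'string' ? value : key;
}
```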

Next up, I want it to check the actual frontend components to identify the locators the test should use. If a locator isn’t great for accessibility or Playwright standards, I’d rather fix the actual component than use a suboptimal locator (an example of such a fix follows the excerpt below).

**Frontend components**:

```bash
view frontend/src/components/[feature-path]
```

- Identify UI elements, form fields, interactive elements
- Check for proper accessibility attributes (labels, roles, ARIA)
- Note any missing accessibility features

**Accessibility check**: If implementation lacks proper labels, roles, or ARIA attributes, inform the user and suggest fixes before writing tests.
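
As a hedged illustration of what “fix the component” can mean in practice (assuming a React-style frontend, which is an assumption on my part; the component and class names are made up): an icon-only button without an accessible name forces a brittle CSS locator, while adding one enables a clean role-based locator.

```tsx
type Props = { onClick: () => void };

// Before: icon-only button with no accessible name. A test can only reach it
// with something brittle like page.locator('.cart-icon-btn').
export function CartButtonBefore({ onClick }: Props) {
  return (
    <button className="cart-icon-btn" onClick={onClick}>
      <svg aria-hidden="true" viewBox="0 0 24 24" />
    </button>
  );
}

// After: an accessible name, so the test can use
// page.getByRole('button', { name: 'Open cart' }).
export function CartButtonAfter({ onClick }: Props) {
  return (
    <button className="cart-icon-btn" aria-label="Open cart" onClick={onClick}>
      <svg aria-hidden="true" viewBox="0 0 24 24" />
    </button>
  );
}
```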

Our application uses GraphQL. For my tests, I’m interested in which GraphQL calls the feature makes, so they can be properly awaited (a small sketch of this waiting pattern follows the excerpt below).

**GraphQL API calls**:

```bash
view backend/[service-name]
```

- Understand data flow and API endpoints
- Identify what data is sent/received
- Make sure to note down relevant GraphQL requests as these need to be waited on in the test by using `page.waitForResponse`.
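
Inside a test, that waiting pattern could look roughly like this; the `/graphql` endpoint path, the `AddToCart` operation name and the button label are assumptions standing in for whatever the research step above uncovers.

```ts
// Start waiting for the GraphQL response before the action that triggers it,
// then await both. Filtering on the operation name keeps the wait specific.
const addToCartResponse = page.waitForResponse((response) => {
  if (!response.url().includes('/graphql') || response.request().method() !== 'POST') {
    return false;
  }
  return response.request().postDataJSON()?.operationName === 'AddToCart' && response.ok();
});
await page.getByRole('button', { name: 'Add to cart' }).click();
await addToCartResponse;
```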

For my tests, I set up the initial state via GraphQL calls, and I’d like the agent to do the same for any new tests by using the API fixture (a sketch of what such a fixture could look like follows the excerpt below).

**Optional: Setup initial state via API calls**:

```bash
view packages/e2e-tests/fixtures/api.ts
```

- Understand how to setup initial state via API calls
- Extend the `api` fixture with custom methods for setting up state for the test if needed. Base this on the existing GraphQL implementation.
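
My real `api` fixture isn’t shown in this post, but as a rough sketch under stated assumptions (the fixture method, the GraphQL mutation and a configured `baseURL` are all placeholders), extending Playwright’s `test` could look something like this:

```ts
import { test as base, expect } from '@playwright/test';

type Api = {
  // Seed initial state through the backend instead of clicking through the UI.
  createCartWithProduct: (sku: string) => Promise<void>;
};

export const test = base.extend<{ api: Api }>({
  api: async ({ request }, use) => {
    // Small helper that posts a GraphQL operation and asserts it succeeded.
    const graphql = async (query: string, variables: Record<string, unknown>) => {
      const response = await request.post('/graphql', { data: { query, variables } });
      expect(response.ok()).toBeTruthy();
      return response.json();
    };

    await use({
      createCartWithProduct: async (sku) => {
        await graphql(
          'mutation AddToCart($sku: String!) { addToCart(sku: $sku) { id } }',
          { sku },
        );
      },
    });
  },
});

export { expect };
```

Tests then import `test` from this fixture file instead of directly from `@playwright/test`, so the seeded state is available everywhere.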

By having these research steps in my skill, I know the agent will check all the essential parts to write a solid E2E test.

Phase 3 – Plan phase

In this phase, the agent presents its plan to me. Hopefully it’s already quite good because of the first two phases, but I can still tweak it before the actual implementation.

### 3. Get User Approval

**Present test plan** to user:

- List the specific tests you’ll write
- Explain what each test will cover
- Confirm approach aligns with their needs

**Wait for approval** before proceeding to implementation.

Phase 4 – Implementation phase

Time to build! Nothing special here; I just remind the agent of a few key instructions from AGENTS.md. A sketch of the test shape this aims for follows the excerpt.

### 4. Implement the Test

**Location**: `packages/e2e-tests/tests/[feature-name]/`

- Use Gherkin steps (Given, When, Then)
- Set up initial state via API calls
- Use proper locator strategy (role > label > placeholder > text)
- Use `translateLocaleFn` for all text from translation files
- Group tests in descriptive `describe` blocks
- Keep tests parallelizable
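
Pulling these rules together, a test produced by this phase might look roughly like the sketch below. The feature, the translation key, the import paths and the `api` method are placeholders for the project’s own, and `test.step` is just one way to express the Gherkin structure.

```ts
import { test, expect } from '../../fixtures/api'; // hypothetical fixture file
import { translateLocaleFn } from '../../utils/translate'; // hypothetical helper path

test.describe('Cart', () => {
  test('Should show an added product on the cart page', async ({ page, api }) => {
    await test.step('Given a cart seeded via the API', async () => {
      await api.createCartWithProduct('sku-123');
    });

    await test.step('When the user opens the cart page', async () => {
      await page.goto('/cart');
    });

    await test.step('Then the product is visible', async () => {
      await expect(
        page.getByRole('heading', { name: translateLocaleFn('cart.title') }),
      ).toBeVisible();
    });
  });
});
```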

Phase 5 – Validation phase

This phase is really important, as we ask the agent to do some self-validation and run the test in isolation a few times. This makes sure the test actually works.

```bash
playwright test -g "Should [test title]" --repeat-each=5 --reporter=line
```

With the `-g` option we can run only a specific test, and by using the line reporter we make sure the test output is fed back to the agent.

I also use a final checklist, which lets the agent check the whole test again against the rules from earlier; a small example of the locator and waiting rules it enforces follows the checklist.

### 5. Run Test in Isolation

**Run the test in isolation multiple times** to verify stability:

```bash
pnpm --filter e2e-tests exec playwright test -g "Should [test title]" --repeat-each=5 --reporter=line
```

**If test fails**:

- Analyze the error
- Fix the issue
- Rerun the test
- Repeat until test passes

**Common issues**:

- Incorrect locators
- Missing waits for GraphQL requests
- Translation key mismatches
- Timing issues (avoid `waitForTimeout`)

### 6. Final Checklist

Before completing, verify:

- [ ] Test follows Gherkin structure (Given, When, Then)
- [ ] Initial state set up via API calls
- [ ] Relevant GraphQL requests are waited on by using `page.waitForResponse`
- [ ] Locators use proper strategy (role > label > placeholder > text)
- [ ] Avoid `page.locator` for filtering specific elements; use the options of the `getBy` methods instead
- [ ] All text uses `translateLocaleFn` for translation keys
- [ ] Test runs in isolation successfully
- [ ] No `waitForTimeout` or `networkidle` usage
- [ ] Test is inside descriptive `describe` block
- [ ] Test can run in parallel with other tests
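
Two of these checklist items trip agents up most often in my experience, so here is a small, hedged example of the preferred style (the routes, button name and status message are made up):

```ts
import { test, expect } from '@playwright/test';

test('Should confirm the product was added to the cart', async ({ page }) => {
  await page.goto('/products/example'); // assumes a configured baseURL

  // Prefer getBy* methods with their built-in options (name, exact, ...)
  // over narrowing elements with page.locator and CSS selectors.
  await page.getByRole('button', { name: 'Add to cart', exact: true }).click();

  // Prefer a web-first assertion that retries until the condition holds
  // over a fixed pause like page.waitForTimeout(2000).
  await expect(page.getByRole('status')).toContainText('Added to cart');
});
```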

Use hooks with strict ESLint rules

ESLint rules can act as a policeman, steering the agent in the right direction.

If it wasn’t clear yet, the whole point of agent instructions and a clear workflow is to bring more determinism into agentic coding.

This works especially well with strict project linting rules. The linter can already catch bad patterns, enforce project preferences and handle formatting rules. Hooks make this especially powerful, forcing the agent to follow these rules automatically.

Skills support hooks that can run at different phases. We can use these hooks to always perform linting before the result is presented to the user. This way, even if the AI agent generates code that doesn’t comply with your rules, it will fail the validation hook and the agent will automatically fix it before showing you the final result.

I recommend using the Playwright ESLint plugin with the recommended settings. You can even make these stricter. For instance, in my project I require test titles to always start with “Should”.

```js
"playwright/valid-title": [
  "error",
  {
    mustMatch: {
      test: "^Should",
    },
  },
],
```
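
For context, here is roughly how that rule could sit next to the plugin’s recommended preset in a flat ESLint config. Treat it as a sketch: the `flat/recommended` preset key and the file glob depend on your eslint-plugin-playwright version and project layout, so check the plugin docs.

```js
// eslint.config.mjs (sketch)
import playwright from 'eslint-plugin-playwright';

export default [
  {
    ...playwright.configs['flat/recommended'],
    files: ['packages/e2e-tests/tests/**'],
    rules: {
      ...playwright.configs['flat/recommended'].rules,
      'playwright/valid-title': ['error', { mustMatch: { test: '^Should' } }],
    },
  },
];
```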

The more we can already catch with linting, the easier it becomes to one-shot new tests or functionality with AI agents. If you see the agent write code you don’t like, see if you can make an ESLint rule for it 😃

At the top of the SKILL.md file, we can include the hook:


```yaml
hooks:
  PostToolUse:
    - matcher: "Edit|Write"
      statusMessage: "Running lint checks..."
      hooks:
        - type: command
          command: "$CLAUDE_PROJECT_DIR/.claude/hooks/lint-e2e.sh"
```

ESLint will exit with code 1 if there are linting errors. However, we have to return exit code 2 if we want to treat it as a blocking error for our agent/skill to fix.

We can do that with a simple bash script:


```bash
#!/bin/bash
# Wrapper script to run ESLint and convert exit code 1 to exit code 2
# so agent / skill hooks treat lint errors as blocking errors

# Capture both stdout and stderr
OUTPUT=$(pnpm --filter e2e-tests lint 2>&1)
EXIT_CODE=$?

# Print output to stderr
echo "$OUTPUT" >&2

# Convert ESLint's exit code 1 (lint errors) to exit code 2 (blocking error)
if [ $EXIT_CODE -eq 1 ]; then
  exit 2
fi

exit $EXIT_CODE
```

Next steps for you!

Your codebase is different. Don’t copy my skill file and agent instructions exactly. But steal these patterns:

  1. It starts with having clear requirements. A solid first phase leads to a better plan, which leads to a better implementation.
  2. If you see the agent making the same mistakes over and over, e.g. made-up translation keys or wrong accessible roles, just point it in the right direction by giving it the actual file paths.
  3. Add self-validation. It’s really frustrating to get a test that doesn’t work when you run it. By adding a step for self-validation, or by using hooks, we get more determinism in an agentic skill.

For inspiration, view my full SKILL.md and AGENTS.md files here: https://github.com/tlolkema/skills-explained
