Components-as-Docs: A Practical Pattern for AI-Generated E2E Tests
Structured ARIA labels and component docs give AI coding agents everything they need to generate deterministic Playwright tests -- no vision models required.

Krishnanand B
February 8, 2026
AI coding agents can see your browser, click buttons, take screenshots. They have all the tools. But the best results I've seen don't come from agents that look at your UI. They come from agents that read about it.
Here's what I mean. When you give an AI agent a task -- say, "write a Playwright test for the checkout flow" -- it can approach that two ways. It can launch a browser, screenshot every page, and figure out what to click. Or it can read structured docs that describe your components, their ARIA contracts, and their automation APIs, then write a deterministic spec without opening a browser at all.
Both work. One costs pennies and runs in seconds. The other burns dollars and takes minutes. The difference comes down to how you give the agent context.
Vercel tested this directly. They gave agents a skill they could call to look up framework docs on demand. Then they tried a different approach: put those same docs into a file called AGENTS.md that the agent reads on every turn -- no decision required, always there. The skill-based approach hit 53% accuracy. The passive docs? 100%.
Agents sometimes fail at deciding when to look things up. They're very good at using what's already in front of them.
That finding landed differently for me, because I'd been building an automation system where the components themselves tell you how to test them -- through ARIA labels. It worked. But the missing piece was always: how does the AI agent know which automation components exist and how to call them?
And here's where it connects: a typical web app has a limited set of UI components. Maybe 20, maybe 50. Not thousands. That's a small enough library that you can put the full component reference into the agent's context. You don't need skills or retrieval. You just need structured docs in the right file.
I call this pattern Components-as-Docs. It's three layers working together: ARIA labels in your UI, structured docs for each automation component, and an index file that puts those docs into the agent's context automatically. The result is an AI agent that can generate correct Playwright specs from plain-English test cases -- without screenshots, without vision models, without per-run API costs.
ARIA Labels Are Semantic Contracts
This idea starts with a simple observation: screen readers and test automation ask the same question about every element on the page. "What is this, and what does it do?"
ARIA attributes answer that question. A button with aria-label="Submit order" tells any machine -- screen reader, Playwright script, AI agent -- exactly what it does. Not where it sits in the DOM. Not what CSS classes it has. What it means.
Playwright's getByRole() builds on this directly:
// Brittle: breaks when classes change
await page.click('.btn-primary.submit-form');
// Better, but arbitrary: breaks when someone drops the test ID
await page.click('[data-testid="submit-button"]');
// Semantic: survives refactors, redesigns, framework migrations
await page.getByRole('button', { name: 'Submit order' }).click();
The third approach is the locator that doesn't rot. It selects by meaning, not position. When the label changes, the behavior has changed -- and that's exactly when your test should break.
Here's the same component with and without ARIA:
Without ARIA intent:
<div class="todo-item">
<input type="checkbox" class="todo-checkbox" />
<span class="todo-text">Buy groceries</span>
<button class="btn-icon btn-edit">
<svg><!-- pencil icon --></svg>
</button>
<button class="btn-icon btn-delete">
<svg><!-- trash icon --></svg>
</button>
</div>
Two icon-only buttons. Which is edit? Which is delete? A vision model has to guess. A CSS selector breaks when any class name changes.
With ARIA:
<li role="listitem" aria-labelledby="task-1-label">
<input type="checkbox"
aria-label='Mark "Buy groceries" as complete' />
<span id="task-1-label">Buy groceries</span>
<button aria-label='Edit "Buy groceries"'>
<svg><!-- pencil icon --></svg>
</button>
<button aria-label='Delete "Buy groceries"'>
<svg><!-- trash icon --></svg>
</button>
</li>
Every element now has a unique, semantic name. The markup gained about 200 bytes. The automation went from fragile to deterministic. And screen reader users can actually use the app -- that part isn't a side effect; it's the primary reason these attributes exist.
I'm not arguing against vision-based testing. For Canvas rendering, complex iFrames, and visual regression, it's the right tool. But for standard DOM-based UIs -- which is 90%+ of what most teams ship -- ARIA gives you a faster, cheaper, more reliable path that you've probably already started building.
The Architecture: Components-as-Docs
The pattern has three layers. Each one is simple on its own. Together, they give an AI agent everything it needs to write correct Playwright specs.
Want to explore this interactively? Enable Playground mode for this article by adding ?playground=true to the current URL.
Layer 1: UI Components with ARIA Labels (The Data Layer)
Your React components ship with proper ARIA attributes. This is the foundation. A task form declares its purpose:
<form role="form" aria-label="Add new task">
<input aria-label="New task description"
placeholder="What needs to be done?" />
<button aria-label="Add task">Add</button>
</form>
Layer 2: Structured Docs in .docs/ (The Knowledge Layer)
Each automation component gets a documentation folder. This isn't prose -- it's structured reference that an AI agent can parse and act on:
.docs/atomic/button/
button.md # Quick start, factory pattern
references/
parameters.md # RoleData: { name: string | RegExp }
actions.md # click, doubleClick, hover
assertions.md # expectVisible, expectEnabled
wait-conditions.md
examples/
form-submission.md
state-transitions.md
The doc for each component maps directly to the ARIA contract. Here's what the automation code looks like:
const btn = button(page);
await btn.click({ name: "Add task" });
await btn.expectEnabled({ name: "Submit" });
Every method takes a roleData parameter -- { name: "Add task" } -- which maps straight to the ARIA label. The factory pattern (button(page)) keeps things consistent across all components. Input, checkbox, task list -- same shape, same pattern.
const inp = input(page);
await inp.fill({ name: "Email" }, "test@example.com");
const chk = checkbox(page);
await chk.check({ name: "Agree to terms" });
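The factory itself can be tiny. Below is a minimal sketch of what such a factory might look like -- the RolePage interface and the exact method set are my assumptions for illustration, not the article's actual library. In a real project the page parameter would be Playwright's Page from @playwright/test.

```typescript
interface RoleData { name: string | RegExp; }

// Hypothetical stand-in: the minimal subset of the Playwright Page API the
// factories need. A real implementation would use Page directly.
interface RolePage {
  getByRole(role: string, options: { name: string | RegExp }): {
    click(): Promise<void>;
    fill(value: string): Promise<void>;
    check(): Promise<void>;
  };
}

// Each factory closes over the page and scopes every action to one ARIA
// role, so call sites only ever pass the accessible name.
export function button(page: RolePage) {
  return {
    click: (role: RoleData) => page.getByRole('button', role).click(),
  };
}

export function input(page: RolePage) {
  return {
    fill: (role: RoleData, value: string) =>
      page.getByRole('textbox', role).fill(value),
  };
}

export function checkbox(page: RolePage) {
  return {
    check: (role: RoleData) => page.getByRole('checkbox', role).check(),
  };
}
```

Because every factory has the same shape, documenting one component in .docs/ effectively documents the calling convention for all of them.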
The key rule: if it's not documented, the agent doesn't use it. Docs are the source of truth. Not the source code, not existing tests, not the agent's training data. Docs.
Layer 3: Index in CLAUDE.md (The Discovery Layer)
This is where Vercel's insight applies directly.
The .docs/ folder could sit there unused. An AI agent might find it, might not. That 53% accuracy from Vercel's skills-based approach? That's an agent that has to decide to search for something.
The fix is an index block embedded in CLAUDE.md (or AGENTS.md) that loads automatically when the agent starts:
<!-- COMPONENTS-DOCS-START -->
[Component Library Docs Index]
|root: ./.docs
|STOP. Read the docs before creating or modifying
any component.
|atomic/button:{button.md}
|atomic/button/references:{actions.md,parameters.md}
|atomic/button/examples:{form-submission.md}
|atomic/input:{input.md}
|atomic/input/references:{actions.md,parameters.md}
|atomic/checkbox:{checkbox.md}
|atomic/checkbox/references:{actions.md,parameters.md}
|...
<!-- COMPONENTS-DOCS-END -->
The agent never has to "decide" to look up component docs. They're listed in its context from the moment it starts working. It sees the index, reads the relevant component docs, and generates specs using only documented methods and patterns.
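One way to keep that index from drifting as components are added is to generate it from the .docs/ tree. The buildDocsIndex helper below is a hypothetical sketch, not tooling from the article: it walks the folder and emits the pipe-delimited entries in the format shown above.

```typescript
import { readdirSync, statSync } from 'node:fs';
import { join } from 'node:path';

// Walks a .docs tree and emits one "|path:{files}" line per folder that
// contains markdown docs, matching the index block format above.
export function buildDocsIndex(root: string): string {
  const lines = ['[Component Library Docs Index]', `|root: ${root}`];
  const walk = (dir: string, rel: string): void => {
    const entries = readdirSync(dir).sort();
    const docs = entries.filter((e) => e.endsWith('.md'));
    if (rel && docs.length > 0) lines.push(`|${rel}:{${docs.join(',')}}`);
    for (const e of entries) {
      const full = join(dir, e);
      if (statSync(full).isDirectory()) walk(full, rel ? `${rel}/${e}` : e);
    }
  };
  walk(root, '');
  return lines.join('\n');
}
```

Run it from a pre-commit hook or CI step and splice the output between the COMPONENTS-DOCS-START/END markers, and the index stays current without anyone maintaining it by hand.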
The workflow looks like this:
- AI agent reads CLAUDE.md -- sees the component docs index
- Agent reads the test case in plain English
- Agent reads the relevant component docs from .docs/
- Agent generates a Playwright spec using documented patterns
- Spec runs deterministically -- no AI in the loop at runtime
The LLM does the expensive thinking once at generation time. Tests run free forever after.
The Team Playbook
The architecture is only worth something if a team can actually adopt it.
Why ARIA Consistency Is Easier Now
Before AI coding agents, ARIA inconsistency was understandable. Naming conventions lived in a wiki nobody read. One developer would write aria-label="close", another would write aria-label="Close dialog", a third would skip the label entirely. Coordination happened through human communication, so things got missed.
AI agents change this equation. Not because they're magic, but because they're good at the boring, repetitive parts:
- Linting in CI. eslint-plugin-jsx-a11y catches missing ARIA labels before code gets merged. This is the single highest-value step: it costs nothing to add and prevents the most common gaps.
- AI-assisted PR review. Agents like Claude Code can flag naming inconsistencies during review: "This button uses 'close' but the existing pattern is 'Close [dialog name]'."
- Auditing existing components. Point an AI agent at your component library and ask it to list every element missing an ARIA label. It'll find gaps a human would spend hours tracking down.
- Design system defaults. If your shared Button component ships with a required aria-label prop, every team that uses it gets correct ARIA for free.
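That last idea can be enforced at the type level. Here's a hedged sketch (the prop shape and helper name are assumptions, not the article's design system) of a Button contract where the accessible name is mandatory:

```typescript
// If 'aria-label' is a required prop, any consumer that omits it fails to
// compile -- the design system makes correct ARIA the path of least
// resistance.
type ButtonProps = {
  'aria-label': string; // required: no unlabeled buttons ship
  disabled?: boolean;
};

// Stand-in for a real React component: returns the attributes a renderer
// would spread onto the underlying <button> element.
export function buttonAttrs(props: ButtonProps) {
  return {
    'aria-label': props['aria-label'],
    disabled: props.disabled ?? false,
  };
}
```

A call like buttonAttrs({ disabled: true }) is a compile error, which is exactly the point: the gap is caught before review, not after.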
The Naming Convention
Good ARIA labels follow a pattern. Here's the one that works for automation:
Action + quoted target. Present tense verb, target in quotes for disambiguation.
'Mark "Buy groceries" as complete'
'Edit "Buy groceries"'
'Delete "Buy groceries"'
'Add task'
'Submit order'
The quoted target matters. When your app shows ten tasks, 'Delete "Buy groceries"' and 'Delete "Walk the dog"' are unambiguous. Just 'Delete' would match all ten delete buttons.
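Conventions survive better in code than in a wiki. A small sketch (helper names are hypothetical, not from the article's codebase) that centralizes the action-plus-quoted-target pattern so components and docs can't drift apart:

```typescript
// One function per action keeps the convention in a single place. UI
// components and .docs/ examples both call these instead of hand-writing
// label strings.
export const completeLabel = (task: string) => `Mark "${task}" as complete`;
export const editLabel = (task: string) => `Edit "${task}"`;
export const deleteLabel = (task: string) => `Delete "${task}"`;
```

In a component this looks like `aria-label={deleteLabel(task.text)}`, and the same helper output appears in the .docs/ examples, so the label an agent reads about is the label the DOM actually carries.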
Who Owns What
| Role | Responsibility |
|---|---|
| Design | Specs ARIA labels in Figma handoff |
| Dev | Implements ARIA per spec, writes component docs in .docs/ |
| QA | Writes plain-English test cases referencing component contracts |
| AI agent | Generates Playwright specs from docs + test cases |
This isn't a new process. It's adding one deliverable (ARIA label specs) to the design handoff and one folder (.docs/) to the codebase. Every other role does what it already does, just with a shared vocabulary.
Getting Started Checklist
You don't need to document every component on day one. Start small:
- Add eslint-plugin-jsx-a11y to your CI. Catches missing labels automatically.
- Audit your 10 most-tested components. Are the ARIA labels consistent? Descriptive? Unique?
- Create .docs/ for those 10 components. Document the factory pattern, parameters, actions, and assertions.
- Add the COMPONENTS-DOCS-START index to your CLAUDE.md. This is the step that makes docs ambient.
- Write one test case in plain English. Let the AI generate the spec. See if it works. Adjust the docs where it doesn't.
That's a week of work for one engineer. After that, every new component follows the same pattern, and your AI agent gets better at generating specs as the docs grow.
The Payoff
One investment, three returns: accessibility compliance, stable test automation, design system documentation. The ARIA labels your team writes for screen readers are the same labels your automation reads. The docs you write for the AI agent double as onboarding material for new engineers.
The cost picture is straightforward. Vision-based testing costs $0.003-0.01 per action -- every screenshot, every inference call, every run. Components-as-Docs costs $0 per test run. You pay the LLM once to generate the spec, then run it thousands of times for free. Use LLMs for generation (one-time cost), not execution (recurring cost).
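To make that concrete, here's a back-of-envelope model using the article's $0.003-0.01 per-action range; the $0.005 midpoint and the suite sizes below are my assumptions, not measured figures.

```typescript
// Vision-based testing pays per action on every run; docs-based generation
// pays once, at spec-generation time, and runs are free afterward.
export function visionCost(
  actionsPerRun: number,
  runs: number,
  perAction = 0.005, // assumed midpoint of the $0.003-0.01 range
): number {
  return actionsPerRun * runs * perAction;
}

// A 50-action suite run 2,000 times over its lifetime:
//   vision-based: roughly $500 in recurring inference costs
//   docs-based:   one generation call, then $0 per run
```

Even at the low end of the range, the recurring cost dominates once a suite runs in CI on every merge.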
This approach suits most teams building standard web applications. If you're working with Canvas rendering, complex iFrames, or visual regression for pixel-perfect layouts, vision models are still the right tool for that 10%. For the other 90%, the answer is already in your DOM.
What You Can Do This Week
Don't wait for a full architecture redesign. Pick the smallest useful step:
Monday: Add eslint-plugin-jsx-a11y to your project. Run it. See what's missing. That gap list is your roadmap.
Wednesday: Take your most-tested component -- the one that breaks most often in E2E -- and write proper ARIA labels for it. Create a .docs/ entry with its automation API.
Friday: Write a test case for that component in plain English. Feed it to your AI coding agent with the component doc as context. See what it generates. The first spec won't be perfect. The third one will surprise you.
Bonus: If you use Claude Code or similar AI coding agents, check out Addy Osmani's accessibility skills on skills.sh. They're agent skills you can install to run accessibility audits directly from your editor -- a good way to find ARIA gaps in your existing components before you start writing docs.
If you've read Part 1, you know I believe code generation beats vision for the majority of test automation. This post is the architecture that makes code generation reliable: give the AI structured context about your components, and get deterministic Playwright specs in return.