Matías Martínez Boylston

AI Direction

A Full WCAG 2.1 AA Audit, Run on a Multi-Agent Pipeline

What's at stake: an aging, high-value customer who can't finish a task because an accessibility defect blocks it.

Role
Head of UX — program owner and method designer
Context
Financial services; a design system and the product surfaces built on it
Timeframe
Phase 1 (design source) closed May 2026; Phase 2 (shipping code) completed June 2026 — accessibility is a 7-year throughline

Outcome recap

  • 270 findings, 19 components audited across design source and shipping code
  • Verification caught real gaps: of 18 re-checked audits, 15 were materially corrected
  • A live, self-accessible report — zero violations on its own automated scan

The challenge

A design system is the highest-leverage place in a product organization to fix, or silently multiply, accessibility defects. Every barrier baked into a shared button or modal propagates into every screen that uses it. This system had never been audited for accessibility, and the default culture treated accessibility as a final-stage checkbox rather than a discipline owned up front.

The case for fixing it isn't abstract, and in this market it isn't primarily a legal one. Accessibility law binds only the public sector here, so the case has to be commercial: the customer base skews older, and the older, higher-value customer is exactly the person whom low-vision, motor, and cognitive barriers hit hardest. An accessibility defect is frequently also a conversion defect — a control you can't operate is a sale you don't make — and at scale it's brand and reputational risk. Underneath all of it: behind every finding is a real person who can't finish a task.

What I did

I took charge of a problem nobody owned. I stood up a full WCAG 2.1 AA program with a deliberate phase gate: Phase 1 audits the design source (Figma) to establish ground truth free of implementation noise; Phase 2 audits the real shipping code; then the two are reconciled. No mixed-surface shortcuts. Scope: all 19 components plus the underlying token foundations, audited on both surfaces.

The differentiator was the method. I designed the audit as a multi-agent pipeline: each component ran through an automated pass (which catches roughly a third of WCAG issues) plus a manual technique pass for everything automation misses — keyboard operability, focus order, name/role/value, overlay focus management, live regions, reflow, target size. Then an independent, adversarial verifier re-checked every report, prompted specifically to find what the first pass missed. The verification earned its place: of 18 re-checked audits, 15 were materially corrected — not rubber-stamped.
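The three stages described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline actually used: `Finding`, `AuditReport`, `audit_component`, and the stub passes are hypothetical names, and "materially corrected" is assumed here to mean simply that the verifier changed the set of findings.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Finding:
    component: str
    criterion: str  # WCAG success criterion number, e.g. "2.4.7"
    source: str     # "automated", "manual", or "verifier"


@dataclass
class AuditReport:
    component: str
    findings: List[Finding]
    materially_corrected: bool  # assumption: verifier changed the finding set


AuditPass = Callable[[str], List[Finding]]
Verifier = Callable[[str, List[Finding]], List[Finding]]


def audit_component(component: str, automated: AuditPass,
                    manual: AuditPass, verifier: Verifier) -> AuditReport:
    """Run the automated and manual passes, then let the verifier re-check."""
    first_pass = automated(component) + manual(component)
    verified = verifier(component, first_pass)
    corrected = set(verified) != set(first_pass)
    return AuditReport(component, verified, corrected)


# Hypothetical stand-ins for the real passes, for illustration only.
def stub_automated(component: str) -> List[Finding]:
    return [Finding(component, "1.4.3", "automated")]  # Contrast (Minimum)


def stub_manual(component: str) -> List[Finding]:
    return [Finding(component, "2.1.1", "manual")]  # Keyboard


def stub_verifier(component: str, findings: List[Finding]) -> List[Finding]:
    # Pretend the verifier spots a missed Focus Visible issue on the modal.
    if component == "modal":
        return findings + [Finding(component, "2.4.7", "verifier")]
    return list(findings)


reports = [
    audit_component(name, stub_automated, stub_manual, stub_verifier)
    for name in ["button", "modal"]
]
corrected_count = sum(report.materially_corrected for report in reports)
print(f"{corrected_count} of {len(reports)} audits materially corrected")
```

Keeping the verifier as a separate callable that receives the first-pass findings mirrors the idea that it re-checks each report independently rather than sharing state with the passes it reviews.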

I synthesized rather than just listing. A reconciliation matrix compared design against code and surfaced a systemic root-cause analysis — for instance, a palette-only token system with no semantic role layer turned out to be the single architectural root behind most contrast failures. I shipped a deliverable a committee can actually use: executive summary, a human-impact layer, a design route and an engineering route, per-component detail grouped by who owns the fix, a full findings registry, and methodology.
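A reconciliation of this kind could be sketched as a set comparison keyed by component and success criterion, followed by a grouping on root cause. The function names, the status labels, and the sample data below are assumptions made for illustration; the real matrix may track far more than this.

```python
from collections import defaultdict


def reconcile(design_findings, code_findings):
    """Classify each (component, criterion) pair by the surface it appears on."""
    design, code = set(design_findings), set(code_findings)
    matrix = {}
    for key in sorted(design | code):
        if key in design and key in code:
            matrix[key] = "both"
        elif key in design:
            matrix[key] = "design-only"
        else:
            matrix[key] = "code-only"
    return matrix


def rank_root_causes(tagged_findings):
    """Group (component, criterion, root_cause) rows, largest group first."""
    groups = defaultdict(list)
    for component, criterion, root_cause in tagged_findings:
        groups[root_cause].append((component, criterion))
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)


# Hypothetical sample data, not taken from the audit itself.
design = [("button", "1.4.3"), ("modal", "2.4.3")]
code = [("button", "1.4.3"), ("modal", "2.1.2")]
matrix = reconcile(design, code)

tagged = [
    ("button", "1.4.3", "palette-only tokens"),
    ("link", "1.4.3", "palette-only tokens"),
    ("modal", "2.1.2", "focus management"),
]
ranked = rank_root_causes(tagged)
```

Ranking root causes by how many findings they explain is one simple way to surface the kind of architectural root described above, where a single cause sits behind most failures of one type.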

I made the report practice what it audits — it's self-accessible by design, passing its own automated scan with zero violations across every page, fully keyboard-navigable with visible focus and AA contrast. It's a live artifact I can demo on the spot, built by a deterministic generator pipeline with its own release gate, so re-running it produces a versioned report with computed deltas.
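As a rough sketch of how deltas between two report versions might be computed, assuming each finding carries a stable identifier: the ID format, `compute_delta`, and `release_gate` are hypothetical, and the gate condition (zero automated violations) is an assumption based on the scan result described above.

```python
def compute_delta(previous_ids, current_ids):
    """Compare two report versions by finding ID; sorted for stable output."""
    previous, current = set(previous_ids), set(current_ids)
    return {
        "resolved": sorted(previous - current),
        "new": sorted(current - previous),
        "open": sorted(previous & current),
    }


def release_gate(automated_violations: int) -> bool:
    """Assumed gate: publish only if the report's own scan is clean."""
    return automated_violations == 0


# Hypothetical finding IDs in a "component/criterion" format.
version_1 = ["button/1.4.3", "modal/2.1.2", "tabs/2.4.7"]
version_2 = ["modal/2.1.2", "tooltip/1.4.13"]
delta = compute_delta(version_1, version_2)
```

Sorting every list keeps the output identical across runs on the same input, which is what lets a re-run be diffed against the previous version.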

When a real contrast defect reached production, I used it to write a durable team standard — "in UX, we are accessibility's line of defense" — made objective and measurable rather than debated. And I made the findings human: composite personas (age and ability archetypes) joined to the actual findings that would block them, translating 270 technical line-items into who they affect and why it matters.
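The text of the team standard is not reproduced here, but one way a contrast rule can be made objective is to compute the ratio directly from the WCAG 2.1 definitions of relative luminance and contrast ratio. The sketch below assumes six-digit hex colors; the function names are illustrative.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per WCAG 2.1."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a six-digit hex color such as '#1A2B3C'."""
    value = hex_color.lstrip("#")
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """AA requires 4.5:1 for normal text and 3:1 for large text."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


print(round(contrast_ratio("#000000", "#FFFFFF"), 2))
print(passes_aa("#777777", "#FFFFFF"))
```

A check like this turns "does this look readable?" into a pass or fail that a designer can run on any foreground and background pair without specialist knowledge.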

Outcome & impact

A previously unaudited design system now has a complete, two-surface WCAG 2.1 AA baseline: 270 findings, each attributed to a WCAG success criterion and to an owner. Alongside that baseline:

  • A prioritized remediation path, plus the systemic insight that a small number of root fixes resolve a disproportionate share of the severe findings.
  • A repeatable, resumable method: the audit can be re-run to produce versioned deltas, turning accessibility into a tracked health metric rather than a one-time report.
  • A culture shift: accessibility reframed as the team's own line of defense, backed by a standard anyone can apply without being an expert.
  • A demoable, self-accessible artifact that earns executive attention because the medium reinforces the message.

And it's genuine conviction, not performance: a 7-year throughline that goes back to launching a previous company's first accessible products.

Skills demonstrated

  • Accessibility (WCAG 2.1 AA)
  • Design-system governance
  • Multi-agent AI orchestration
  • QA method design (adversarial verification)
  • Audit synthesis and root-cause analysis
  • Human-centered framing (personas and prevalence data)
  • Commercial framing of UX
  • Building team standards and culture
  • Technical reporting

Proof / artifacts

[Image: Live HTML report — the versioned, offline-capable report itself: executive summary, human-impact personas, design and engineering routes, per-component detail, a full 270-finding registry, and methodology — placeholder]

[Image: Design-vs-code reconciliation summary — the synthesis view comparing what the design source promises against what actually shipped — placeholder]

[Image: Team accessibility standard — the "line of defense" note that turned one production defect into a lasting team practice — placeholder]

[Image: Multi-agent audit method diagram — how the automated pass, manual pass, and adversarial verifier fit together — placeholder]