AI Direction
A Full WCAG 2.1 AA Audit, Run on a Multi-Agent Pipeline
The promise at stake: an aging, high-value customer who can't finish a task because an accessibility defect blocks it.
Outcome recap
- 270 findings, 19 components audited across design source and shipping code
- Verification caught real gaps: of 18 re-checked audits, 15 were materially corrected
- A live, self-accessible report — zero violations on its own automated scan
The challenge
A design system is the highest-leverage place in a product organization to fix, or silently multiply, accessibility defects. Every barrier baked into a shared button or modal propagates into every screen that uses it. This system had never been audited for accessibility, and the default culture treated accessibility as a final-stage checkbox rather than a discipline owned up front.
The case for fixing it isn't abstract, and in this market it isn't primarily a legal one. Accessibility law binds only the public sector here, so the case has to be commercial: the customer base skews older, and the older, higher-value customer is exactly the person whom low-vision, motor, and cognitive barriers hit hardest. An accessibility defect is frequently also a conversion defect — a control you can't operate is a sale you don't make — and at scale it's brand and reputational risk. Underneath all of it: behind every finding is a real person who can't finish a task.
What I did
I took charge of a problem nobody owned. I stood up a full WCAG 2.1 AA program with a deliberate phase gate: Phase 1 audits the design source (Figma) to establish ground truth free of implementation noise; Phase 2 audits the real shipping code; then the two are reconciled. No mixed-surface shortcuts. Scope: all 19 components plus the underlying token foundations, audited on both surfaces.
The differentiator was the method. I designed the audit as a multi-agent pipeline: each component ran through an automated pass (which catches roughly a third of WCAG issues) plus a manual technique pass for everything automation misses — keyboard operability, focus order, name/role/value, overlay focus management, live regions, reflow, target size. Then an independent, adversarial verifier re-checked every report, prompted specifically to find what the first pass missed. The verification earned its place: of 18 re-checked audits, 15 were materially corrected — not rubber-stamped.
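The three-stage flow described above can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual code: the `Finding` fields, the `audit_component` function, and the idea of passing each stage in as a plain callable are all assumptions made for the example. It also models only one kind of correction (the verifier adding findings the first pass missed), whereas a real verifier could also overturn or re-grade findings.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Finding:
    # Hypothetical shape; the real report schema is not shown in the case study.
    component: str
    criterion: str  # WCAG success criterion, e.g. "2.1.1"
    source: str     # "automated", "manual", or "verifier"


# Each stage is modeled as a callable so agents or tools can be swapped in.
AuditPass = Callable[[str], Iterable[Finding]]
Verifier = Callable[[str, list[Finding]], Iterable[Finding]]


def audit_component(
    component: str,
    automated: AuditPass,
    manual: AuditPass,
    verifier: Verifier,
) -> tuple[list[Finding], bool]:
    """Run both first-pass audits, then let the verifier hunt for misses.

    Returns the merged findings and a flag saying whether the verifier
    added anything, i.e. whether the first-pass report was corrected.
    """
    first_pass = list(automated(component)) + list(manual(component))
    already_reported = {(f.component, f.criterion) for f in first_pass}

    # Keep only verifier findings that the first pass did not already report.
    missed = [
        f
        for f in verifier(component, first_pass)
        if (f.component, f.criterion) not in already_reported
    ]
    return first_pass + missed, bool(missed)
```

Keeping the verifier as a separate stage that receives the first-pass report, rather than folding it into the manual pass, is what makes a number like "15 of 18 corrected" countable at all.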
I synthesized rather than just listing. A reconciliation matrix compared design against code and surfaced a systemic root-cause analysis — for instance, a palette-only token system with no semantic role layer turned out to be the single architectural root behind most contrast failures. I shipped a deliverable a committee can actually use: executive summary, a human-impact layer, a design route and an engineering route, per-component detail grouped by who owns the fix, a full findings registry, and methodology.
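To make the token root cause concrete, here is a small sketch of why a semantic role layer helps. The contrast math follows the WCAG 2.1 definitions of relative luminance and contrast ratio, and 4.5:1 is the AA minimum for normal-size text. Everything else is hypothetical: the palette names, the hex values, and the role names are invented for illustration and are not the audited system's tokens.

```python
AA_NORMAL_TEXT = 4.5  # WCAG 2.1 SC 1.4.3 minimum ratio for normal-size text


def _linearize(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light, per WCAG 2.1."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical palette layer: raw colors with no stated purpose.
PALETTE = {"white": "#ffffff", "gray-500": "#777777", "gray-900": "#1a1a1a"}

# Hypothetical semantic layer: each role names a (foreground, background) pair.
ROLES = {
    "text.body": ("gray-900", "white"),
    "text.muted": ("gray-500", "white"),
}


def failing_roles(roles, palette, minimum=AA_NORMAL_TEXT):
    """Return the role names whose color pair falls below the minimum ratio."""
    return sorted(
        role
        for role, (fg, bg) in roles.items()
        if contrast_ratio(palette[fg], palette[bg]) < minimum
    )
```

With palette-only tokens, every component picks its own foreground and background, so each pairing has to be checked (and can fail) separately. Once pairs are declared as roles, one check covers every component that uses the role, and one fix to the role repairs all of them.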
I made the report practice what it audits — it's self-accessible by design, passing its own automated scan with zero violations across every page, fully keyboard-navigable with visible focus and AA contrast. It's a live artifact I can demo on the spot, built by a deterministic generator pipeline with its own release gate, so re-running it produces a versioned report with computed deltas.
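The "computed deltas" idea reduces to set arithmetic over stable finding identifiers. The sketch below is illustrative only: the `Delta` type, the function names, and the ID format are assumptions, and the gate shown (zero violations on the report's own scan) is one plausible reading of the release gate rather than a description of what the real one checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    added: frozenset[str]       # findings present now but not in the last run
    resolved: frozenset[str]    # findings from the last run that are gone
    persisting: frozenset[str]  # findings present in both runs


def compute_delta(previous: set[str], current: set[str]) -> Delta:
    """Compare two audit runs by finding ID (IDs are assumed to be stable)."""
    return Delta(
        added=frozenset(current - previous),
        resolved=frozenset(previous - current),
        persisting=frozenset(previous & current),
    )


def release_gate(self_scan_violations: int) -> bool:
    """Assumed gate: publish only if the report's own scan is clean."""
    return self_scan_violations == 0
```

The delta is only meaningful if a finding keeps the same ID across runs, which is one reason a deterministic generator matters here.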
When a real contrast defect reached production, I used it to write a durable team standard — "in UX, we are accessibility's line of defense" — made objective and measurable rather than debated. And I made the findings human: composite personas (age and ability archetypes) joined to the actual findings that would block them, translating 270 technical line-items into who they affect and why it matters.
Outcome & impact
A previously unaudited design system now has a complete, two-surface WCAG 2.1 AA baseline: 270 findings, each attributed to a WCAG success criterion and to an owner.
- A prioritized remediation path, plus the systemic insight that a small number of root fixes resolve a disproportionate share of the severe findings.
- A repeatable, resumable method: the audit can be re-run to produce versioned deltas, turning accessibility into a tracked health metric rather than a one-time report.
- A culture shift: accessibility reframed as the team's own line of defense, backed by a standard anyone can apply without being an expert.
- A demoable, self-accessible artifact that earns executive attention because the medium reinforces the message.
And it's genuine conviction, not performance: a 7-year throughline that goes back to launching a previous company's first accessible products.
Skills demonstrated
- Accessibility (WCAG 2.1 AA)
- Design-system governance
- Multi-agent AI orchestration
- QA method design (adversarial verification)
- Audit synthesis and root-cause analysis
- Human-centered framing (personas and prevalence data)
- Commercial framing of UX
- Building team standards and culture
- Technical reporting
Proof / artifacts
[Image: Live HTML report — the versioned, offline-capable report itself: executive summary, human-impact personas, design and engineering routes, per-component detail, a full 270-finding registry, and methodology — placeholder]
[Image: Design-vs-code reconciliation summary — the synthesis view comparing what the design source promises against what actually shipped — placeholder]
[Image: Team accessibility standard — the "line of defense" note that turned one production defect into a lasting team practice — placeholder]
[Image: Multi-agent audit method diagram — how the automated pass, manual pass, and adversarial verifier fit together — placeholder]