Building an AI-Driven E2E Testing Framework from Scratch
Testing was my bottleneck. So I built a framework where AI writes, runs, and validates the tests.
Nino Chavez
Product Architect at commerce.com
Shipping used to mean stress.
Especially when you’re the only person standing between the deploy button and a swarm of live users. But what if you could guarantee full regression coverage—with zero human oversight?
I just shipped a production-grade, autonomous end-to-end testing framework that touches every admin feature in my app, from the browser’s perspective, using nothing but AI tools and system-level architecture.
What I Built
I didn’t just write tests—I built a self-sustaining test architecture powered by Kilo AI that:
- Writes its own Playwright tests
- Heals itself when the UI or schema changes
- Executes multiple test tiers automatically (smoke → full regression)
- Validates every button, input, route, and display from a real user’s point of view
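For a sense of what the framework produces, here's roughly the shape of one generated smoke test. This is a minimal sketch: the route and test IDs are invented stand-ins, not entries from the real test-ids.ts.

```typescript
import { test, expect } from '@playwright/test';

// Illustrative only: the real suite pulls its IDs from test-ids.ts.
// '/admin/products' and the getByTestId arguments are hypothetical.
test.describe('admin products @smoke', () => {
  test('page renders its core controls', async ({ page }) => {
    await page.goto('/admin/products');
    await expect(page.getByTestId('admin-products-table')).toBeVisible();
    await expect(page.getByTestId('admin-products-create-button')).toBeEnabled();
  });
});
```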
This isn’t about code coverage percentages. It’s about real confidence in every release.
How It Works
- AI templates for test generation: Kilo analyzes components and writes E2E tests using reusable prompt blueprints.
- Self-healing maintenance: tests detect schema or UI changes and auto-regenerate with updated selectors and assertions.
- Zero-touch execution: with one command, the system spins up the environment, runs all test suites, and outputs full reports.
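In pseudocode-ish TypeScript, the heal-and-retry loop looks something like this. The `regenerate` callback stands in for the AI call that rewrites selectors and assertions from the failure report; it is not Kilo's real API.

```typescript
import { execSync } from 'node:child_process';

// Sketch of the heal-and-retry loop. `regenerate` is a stand-in for the
// AI call, not Kilo's actual interface.
async function runWithHealing(regenerate: (report: string) => Promise<void>) {
  try {
    // Fast tier first; full regression only runs if this passes.
    execSync('npx playwright test --grep @smoke', { stdio: 'pipe' });
  } catch (err) {
    // On failure, let the AI regenerate the affected tests, then retry
    // once before escalating to a human.
    await regenerate(String(err));
    execSync('npx playwright test --grep @smoke', { stdio: 'inherit' });
  }
  execSync('npx playwright test --grep @regression', { stdio: 'inherit' });
}
```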
Under the hood: modular utilities (test-ids.ts with 318 IDs for stable selectors, TestModeContext for visual test states, admin-page-validators.ts with 378 lines of reusable logic). All admin routes are modeled and abstracted for DRY test writing. Coverage spans functional, visual, accessibility, performance, cross-browser, and mobile testing.
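For anyone curious what those utilities amount to, here's the gist. The entries and the validator signature are my own invented examples, not excerpts from the real files (which hold 318 IDs and 378 lines respectively).

```typescript
import { expect, type Page } from '@playwright/test';

// test-ids.ts (sketch): one typed map of stable selectors so no test ever
// hardcodes CSS. These three entries are invented examples.
export const TEST_IDS = {
  adminNav: 'admin-nav',
  productsTable: 'admin-products-table',
  saveButton: 'admin-save-button',
} as const;

// admin-page-validators.ts (sketch): reusable assertion logic that any
// route spec can call instead of re-declaring its own checks.
export async function validateAdminPage(page: Page, route: string, ids: string[]) {
  await page.goto(route);
  for (const id of ids) {
    await expect(page.getByTestId(id)).toBeVisible();
  }
}
```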
The Shift
I used to treat QA as a cost center—something I had to do before shipping. Now I think about it differently. QA can be a strategic asset—powered by AI, maintained by code, enforced by design.
The framework is fully autonomous, AI-maintained, fast, reliable, and designed for scale. And most importantly, it’s running right now—in production—guarding every admin feature.
What I’m Still Figuring Out
The self-healing works for incremental changes, but big refactors still break things in ways the AI doesn’t anticipate. I’m experimenting with having the AI analyze git diffs before regenerating tests, so it understands the scope of what changed.
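The experiment, roughly, looks like this. `promptAI` is a placeholder for the AI call, and the paths and the HEAD~1 baseline are illustrative choices, not the framework's actual wiring.

```typescript
import { execSync } from 'node:child_process';

// Diff-aware regeneration sketch. `promptAI` is a placeholder; the paths
// and the HEAD~1 baseline are illustrative, not real configuration.
async function regenerateFromDiff(promptAI: (prompt: string) => Promise<string>) {
  // Give the AI only what changed since the last known-good commit, so it
  // can judge the blast radius before rewriting any tests.
  const diff = execSync('git diff HEAD~1 -- src/ tests/').toString();
  return promptAI(
    `Here is the diff since the last green build:\n${diff}\n` +
      `Regenerate only the E2E tests affected by these changes.`
  );
}
```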
And there’s a trust question I haven’t fully answered: how do I know the AI-generated tests are actually testing what matters? I’ve added assertions to verify the tests themselves, which feels recursive in an uncomfortable way. But it’s working for now.
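Concretely, those "tests for the tests" are cheap static checks along these lines; the tests/generated directory is a hypothetical layout, not my real one.

```typescript
import { readdirSync, readFileSync } from 'node:fs';
import { test, expect } from '@playwright/test';

// Meta-check sketch: refuse any generated spec that never asserts anything.
// The tests/generated path is hypothetical.
test('every generated spec contains at least one assertion', () => {
  for (const file of readdirSync('tests/generated')) {
    const src = readFileSync(`tests/generated/${file}`, 'utf8');
    expect(src).toContain('expect(');
  }
});
```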
Originally Published on LinkedIn
This article was first published on my LinkedIn profile, where the original post and discussion live.