The Civil War Inside Every Agent
AI & Automation · 7 min read


Every agent system is fighting the same battle: learn to navigate human interfaces, or demand native ones. The industry is doing both. The question is who adapts to whom.


Nino Chavez

Product Architect at commerce.com

This post has been challenged by Gemini Deep Research. Read the counterpoint.

Somewhere in London, a grocery store has no lights.

No music. No signage. No aisles designed for browsing. Robots move through the dark on LiDAR, picking orders from shelves that were never meant for human eyes. The building looks like a warehouse from the outside because that’s what it is—a warehouse that stopped pretending to be a store.

Meanwhile, in a lab somewhere, a humanoid robot is learning to push a cart through Kroger. Navigate narrow aisles. Reach for items on shelves built for five-foot-eight humans. Read price tags printed in twelve-point type.

Both solving the same problem in the same industry. Completely different bets on how machines should operate in the world.


Two Ways to Give a Machine Eyes

Every agent system I’ve built, observed, or written about is making the same choice—usually without realizing it.

Path A: Mimicry. Train the machine to navigate human-designed interfaces. Teach it to read screens, click buttons, fill forms, interpret layouts designed for human cognition. Perplexity’s Computer feature does this. Google’s Gemini auto-browse does this. Every “computer use” API does this. In the physical world, humanoid robots walking through grocery stores do this.

Path B: Machine-native. Redesign the interface so the machine doesn’t have to pretend to be human. Expose structured APIs. Publish tool contracts. Build MCP servers that tell agents exactly what’s callable and what isn’t. In the physical world, dark stores and lights-out factories do this.

The instinct is to pick a side. Native is cleaner, more efficient, more deterministic. Mimicry is messy, slow, brittle.

But the industry isn’t picking. It’s doing both. Simultaneously. And I keep trying to figure out whether that’s indecision or something more structural.


The Warehouse With No Lights

The physical world makes this split visceral in a way that software doesn’t.

Ocado’s Customer Fulfilment Centres run in near-darkness. Thousands of bots glide across a grid, grabbing items and assembling grocery orders in minutes. No climate control optimized for human comfort. No ergonomic considerations. No background music. The entire facility is designed from the ground up for machines, because when you remove the human from the loop, you can remove everything that was there to serve the human.

That’s the machine-native path taken to its logical end. And it works—Ocado’s system picks a fifty-item order in under five minutes with error rates humans can’t match.

But here’s the thing. Ocado didn’t convert existing grocery stores. They built new ones. From scratch. Purpose-built facilities on cheap land outside city centers.

The existing stores—the Krogers, the Tescos, the thousands of locations with narrow aisles and fluorescent lights and checkout lanes designed for people—those are still there. All of them. And they’re not going anywhere, because you can’t just turn off the lights in a building where humans still shop.

This is what every software team building agent systems is facing, whether they’ve named it or not. You can build machine-native interfaces for new systems. You have to build mimicry for everything that already exists.


Why I Keep Betting on Native

I should be honest about my bias here.

My entire stack is machine-native. My CLAUDE.md files are contracts—structured instructions that tell an agent exactly what it can do, what it should avoid, and how to verify its own work. When I build MCP tool servers, I’m publishing typed interfaces with explicit parameters and return values.
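The "contract" framing can be made concrete with a small sketch. This is not the MCP wire format or SDK; the tool name, fields, and validation logic below are hypothetical, just an illustration of what "typed interface with explicit parameters and return values" means in practice.

```python
from dataclasses import dataclass

@dataclass
class ToolContract:
    """A published description of one callable tool: what the agent may
    invoke, with explicit parameters and a declared return shape."""
    name: str
    description: str
    parameters: dict  # parameter name -> type/constraint description
    returns: str      # description of the return value

# Hypothetical contract an agent might be handed instead of a UI.
RUN_TESTS = ToolContract(
    name="run_tests",
    description="Run the project's test suite and report results.",
    parameters={
        "path": "string: directory to test",
        "verbose": "bool: default false",
    },
    returns="object: {passed: int, failed: int, log: string}",
)

def validate_call(contract: ToolContract, args: dict) -> bool:
    """Reject any call whose arguments fall outside the published contract.
    The contract, not the caller's imagination, defines what is callable."""
    return set(args).issubset(contract.parameters)

print(validate_call(RUN_TESTS, {"path": "./src"}))  # in-contract: True
print(validate_call(RUN_TESTS, {"rm": "-rf /"}))    # out-of-contract: False
```

The point isn't the ten lines of Python; it's that the machine never has to guess. Everything callable is enumerated, everything else is rejected.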

I didn’t arrive at this through theory. I arrived at it through a weekend debugging session.

I was building a volleyball stats app and spent hours trying to scrape data from a website. Two different approaches. Both fragile. Both fighting the DOM. Then I opened DevTools and found the site’s internal JSON API—structured, typed, comprehensive. Claude Code consumed it in minutes. What had taken hours through the human interface took five minutes through the machine-native one.
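The contrast is stark when you see it side by side. The payload and field names below are invented for illustration; the real site's API was different, but the shape of the win was the same: one key lookup instead of a pile of brittle selectors.

```python
import json

# Hypothetical response from a site's internal JSON API: structured and
# typed, no DOM involved. Field names here are invented for illustration.
payload = json.loads("""
{
  "player": "J. Alvarez",
  "stats": {"kills": 14, "digs": 9, "aces": 3}
}
""")

# Machine-native consumption: a direct lookup. No CSS selectors, no
# waiting for page loads, nothing that breaks when the layout changes.
kills = payload["stats"]["kills"]
print(kills)  # 14
```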

That experience calcified something for me. Every time I see an agent fighting a UI—clicking through dropdown menus, parsing pixel layouts, waiting for page loads—I think: there’s a JSON endpoint behind this. Someone just hasn’t exposed it yet.

And that’s probably where my blind spot lives.


Why Mimicry Isn’t Going Away

Because “someone just hasn’t exposed it yet” is doing a lot of work in that sentence.

The honest case for computer use, screen reading, and browser automation isn’t that they’re elegant. It’s that they’re unilateral. They don’t require cooperation from the system being automated. No API key. No partnership agreement. No one has to build anything for you.

Perplexity’s Computer feature can navigate any website. It doesn’t need the website’s permission. It doesn’t need a WebMCP manifest or an MCP server or a published tool contract. It just looks at the screen and figures it out—the way a human would, but faster.

That matters more than most machine-native advocates want to admit.

The brownfield reality is this: retrofitting the world’s existing infrastructure with machine-native interfaces would cost trillions and take decades. In the meantime, agents need to operate in the world as it is, not the world as architects would design it. Screen-reading agents are the bridge. And some bridges become permanent.


The Pattern I Missed for Twelve Months

I wrote about “unmarked roads” last June—the argument that your data isn’t ready for AI because it was optimized for humans, not machines. That was the machine-native argument applied to data infrastructure.

I wrote “Two Doors for Agents” three weeks ago. The argument: the industry is split between exploratory agents and contract-based agents—and the protocols are fragmenting.

Same question. Different angles. I just didn’t see the through-line until I started looking at dark stores.

The through-line is this: who adapts to whom?

The machine adapts to us, or we adapt to it. Both are happening. The ratio is what’s shifting.

Legacy data optimized for human search is one end of that spectrum—we built it for ourselves and expected machines to figure it out. Dark stores are the other end—physical space redesigned from the ground up so machines never have to pretend.

MCP servers, computer use features, screen-reading agents, WebMCP manifests—they all sit somewhere in between. And the whole spectrum is drifting, slowly and unevenly, toward the native end.


Where This Actually Goes

The civil war doesn’t end with a winner.

Dark stores didn’t replace grocery stores. They exist alongside them. Ocado runs its robotic warehouses and partners with brick-and-mortar chains. The lights-out factory handles volume; the human-staffed facility handles exceptions, edge cases, the long tail of products that don’t fit on a grid.

I keep waiting for the clean resolution—the moment where one path obviously wins and the other becomes legacy. It hasn’t come. And the longer I watch, the less I think it will.

What I see instead is a messy middle. Systems that start with mimicry and gradually expose native interfaces as agent traffic justifies the investment. Operations where some calls go through typed contracts and others still require a model squinting at a screenshot. Not elegant. Not by design. Just the shape that real infrastructure takes when it’s serving two masters at once.

But there’s a harder question underneath all of this. One I keep circling back to.

If we’re splitting investment, R&D, and standards work across two competing paradigms—teaching machines to navigate our world and rebuilding our world for machines—does either path get the critical mass it needs to actually work reliably?

That’s the next post.

