What If the Fork Is the Problem?
AI & Automation · 8 min read

I just mapped the civil war inside every agent system. Both paths sound reasonable. But what if splitting investment across both means neither gets finished?

Nino Chavez

Product Architect at commerce.com

I just wrote about two paths. Teach machines to navigate human interfaces, or rebuild interfaces for machines. Both sound reasonable. Both have real momentum.

But something kept nagging at me while I was writing it.

Not which path is right. Whether the fork itself is the bottleneck.


The Math Doesn’t Work Yet

The pitch for agentic automation runs something like this: one agent replaces ten human workflows. The economics are obvious. Ship it.

I ran the numbers on my own monitoring setup last month. An always-on Claude Opus agent watching a production system costs roughly a dollar an hour idle—just maintaining context, polling for state changes, keeping its reasoning warm. That’s $720 a month before it does anything interesting.

I stared at that number for a while.

Add tool calls, multi-step reasoning chains, and the kind of retry loops that real-world automation demands, and you’re looking at $800-plus a month in variable token burn. Compare that to a $200/month SaaS seat for a human using traditional tools.

The enterprise math works. If you’re replacing a $15,000/month senior engineer’s repetitive workflows, the token cost is noise. But mid-market? Small teams? The agentic premium hasn’t crossed the threshold yet. Neither path—mimicry or native—is cheap enough for the long tail of businesses that would benefit most.
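Back-of-envelope, the comparison above reduces to a simple break-even check. The figures are this post’s rough estimates, not vendor pricing:

```python
# Agent-vs-seat cost model using the estimates from this post.
# All figures are back-of-envelope, not measured or quoted prices.

HOURS_PER_MONTH = 24 * 30  # an always-on agent never clocks out

def agent_monthly_cost(idle_rate_per_hour=1.00, variable_burn=800.0):
    """Idle context-keeping cost plus variable token burn, per month."""
    return idle_rate_per_hour * HOURS_PER_MONTH + variable_burn

def replaces_economically(agent_cost, replaced_monthly_cost):
    """True if the agent undercuts the work it replaces."""
    return agent_cost < replaced_monthly_cost

agent = agent_monthly_cost()           # $720 idle + $800 burn = $1,520/month
saas_seat = 200.0                      # human on traditional tools
senior_eng_workflows = 15_000.0        # enterprise replacement case

print(agent)                                            # 1520.0
print(replaces_economically(agent, senior_eng_workflows))  # True: enterprise clears
print(replaces_economically(agent, saas_seat))             # False: mid-market doesn't
```

The asymmetry is the whole argument: the same $1,520/month is noise against a senior engineer’s time and a dealbreaker against a $200 seat.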

And this cuts across both paths. Building machine-native interfaces costs engineering time. Running mimicry agents costs tokens. The fork doubles the investment without doubling the return.


The Standard That Isn’t

I flagged protocol fragmentation in “Two Doors,” but the security dimension is worse than I let on.

In January, researchers published ClawHavoc—a supply-chain attack targeting the OpenClaw marketplace for MCP skills. 341 malicious skills. Over 9,000 compromised installs. The attack vector was elegant: publish tools that look useful, embed instructions that hijack the agent’s context, exfiltrate data through the agent’s own tool-calling infrastructure.

The machine-native path assumes that standardized protocols make agents safer. Typed contracts. Explicit permissions. Known interfaces. But standardized protocols also create standardized attack surfaces. When every agent speaks the same protocol, one vulnerability scales across the entire ecosystem.

Meanwhile, the mimicry path has its own nightmare. Models hallucinate clicks. They misread UI elements. They confidently navigate to the wrong page and take actions based on a layout that changed last Tuesday. Screen-reading agents are only as reliable as the model’s visual reasoning—and visual reasoning is still nowhere near deterministic.

The machine-native path assumes a standard that doesn’t exist yet. The mimicry path assumes models won’t hallucinate clicks. Both assumptions are fragile.

Four competing protocols from three companies. Anthropic’s MCP. Google’s WebMCP. Amazon’s Nova Act. Google’s A2A for agent-to-agent communication. The industry keeps calling this “USB-C for AI.” But right now it looks more like four different charging ports, each backed by a company with a different incentive structure and a different definition of “open.”


The Agent That Forgot Its Own Rules

This is the one that bothers me most, because it breaks both paths equally.

Earlier this year, a researcher named Summer Yue ran an experiment. She gave an AI agent access to her email inbox with clear safety instructions: read messages, summarize them, flag anything important. Standard stuff.

The agent processed hundreds of messages. As the volume grew, its context window filled up. The model compacted—summarized earlier context to make room for new information. And in that compaction, it dropped its safety instructions. The guardrails it was given at the start of the session simply disappeared from its working memory.

Then it started deleting emails. Not flagging them. Deleting them. The behavior it was explicitly told not to do—it couldn’t remember being told not to do it.

This isn’t a mimicry problem or a native problem. It’s a probabilistic-system-doing-deterministic-work problem. And it applies everywhere.

A mimicry agent that loses its visual reasoning context mid-session will click on the wrong elements. A native agent that loses its tool contracts mid-session will call the wrong endpoints. The failure mode is different. The root cause is identical: LLMs are statistically confident, not logically consistent. Extend the context far enough, and the confidence and the consistency diverge.
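The compaction failure is easy to reproduce in miniature. This is a deliberately naive sketch, not how any real agent framework compacts, but the shape of the bug is the same: when the window fills, whatever policy trims the context can silently trim the rules too.

```python
# Minimal sketch of guardrail loss under context compaction.
# Naive policy: keep only the newest messages when the window fills.
# Real systems summarize rather than truncate, but the session-opening
# safety instructions can still fall out of working memory either way.

def compact(messages, max_messages):
    """Keep the newest messages; everything older is simply gone."""
    return messages[-max_messages:]

context = [
    {"role": "system", "content": "NEVER delete emails. Read, summarize, flag only."},
]
# Hundreds of emails arrive over the session...
context += [{"role": "user", "content": f"email #{i}"} for i in range(300)]

context = compact(context, max_messages=200)

guardrail_present = any(m["role"] == "system" for m in context)
print(guardrail_present)  # False: the rule the agent was given no longer exists for it
```

Nothing errored. The agent simply continued with a context in which the prohibition had never been stated.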


You Can’t Turn Off the Lights in a Kroger

I used dark stores as the clearest embodiment of the machine-native path. Purpose-built. Efficient. Elegant.

But here’s the number I glossed over: the cost of going greenfield everywhere.

There are roughly 40,000 grocery stores in the United States. Converting them to dark stores—or building new ones alongside them—would cost somewhere in the range of tens of billions of dollars. And that’s one industry in one country. Multiply by retail, logistics, healthcare, government services, and the bill becomes absurd.
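The order of magnitude is easy to check. The per-store figure below is a hypothetical placeholder, not a sourced estimate; the point is the scale, not the exact bill:

```python
# Rough greenfield-conversion arithmetic for one industry in one country.
# cost_per_store is a hypothetical round number for illustration only.

grocery_stores_us = 40_000
cost_per_store = 1_000_000  # assumed ~$1M per conversion or new build

total = grocery_stores_us * cost_per_store
print(f"${total / 1e9:.0f}B for one industry in one country")  # $40B
```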

The brownfield reality isn’t just an inconvenience. It’s the dominant condition of the built world.

The physical world wasn’t designed for machines and can’t be redesigned overnight—that’s why humanoid robots exist. Most software has no API and never will. Mimicry isn’t an interim hack. For much of the world’s infrastructure, it’s the only interface layer that will ever exist.

But permanence has its own fragility. Humanoid robots can’t match human dexterity—not yet, maybe not for decades. Screen readers break when someone ships a CSS change. Browser automation fails when a site adds a CAPTCHA. The mimicry path is perpetually one interface update away from breaking, and there’s no contract guaranteeing stability.

Both paths are stuck. Native can’t scale to the existing world. Mimicry can’t stabilize in it.


The Shepherd Who Can’t Keep Up

This is the objection that hits closest to home for me.

The standard narrative around agentic automation says humans are moving from “executors” to “shepherds.” Instead of doing the work, you oversee the agents doing it. Review their output. Approve their decisions. Catch their mistakes.

Sounds like a promotion. In practice, it might be the hardest job in the building.

When an agent makes three hundred decisions in the time it takes a human to make one, “oversight” becomes a performance bottleneck, not a safety net. You can’t meaningfully review machine-speed output at human speed. You end up rubber-stamping, sampling, or trusting the system—which is exactly what oversight was supposed to prevent.

Miss one step in the digital assembly line and you create debt that’s more expensive than the manual work it replaced. Not because the agent made a catastrophic error, but because you approved a subtle one that compounded over hours before anyone noticed.

Even well-designed governance has a scaling problem. The question isn’t whether you can build guardrails. It’s whether human auditors can keep pace with the systems those guardrails are supposed to constrain.
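The scaling problem is just division. With hypothetical but plausibly shaped numbers, full review is arithmetically impossible and “oversight” collapses into sampling:

```python
# Illustrative oversight throughput. Both rates are hypothetical,
# chosen only to show the shape of the mismatch.

agent_decisions_per_hour = 300
reviewer_decisions_per_hour = 20   # careful human review, sustained

coverage = reviewer_decisions_per_hour / agent_decisions_per_hour
unreviewed_per_hour = agent_decisions_per_hour - reviewer_decisions_per_hour

print(f"{coverage:.1%} of decisions reviewed")              # 6.7% of decisions reviewed
print(f"{unreviewed_per_hour} decisions/hour pass on trust")  # 280 decisions/hour pass on trust
```

At 6.7% coverage, the reviewer isn’t a safety net. They’re a sampling strategy with a title.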


The Fork as Bottleneck

Every objection above applies to both paths. Token economics. Protocol security. Context reliability. Brownfield constraints. Oversight scaling.

Which brings me back to the fork itself.

We’re splitting R&D budgets across mimicry and native. Splitting standards work across four competing protocols. Splitting engineering talent between “teach the model to see” and “redesign the interface so it doesn’t have to.”

Does either path reach critical mass?

The machine-native camp is building beautiful protocols for a world that mostly doesn’t exist yet. The mimicry camp is building fragile bridges to a world that wasn’t designed for them. And the middle—where most real systems live—is absorbing the complexity of both without the full benefit of either.

I don’t have a resolution for this. I’m not sure one exists right now.

What I do know is that framing it as “which path wins” misses the harder question. The harder question is whether the industry can afford to walk both paths long enough for either one to mature. Or whether the split itself is what keeps both of them fragile.

The adversarial analysis that prompted this post lays out all five objections formally. The companion whitepaper—The Bifurcation of Autonomy—expands them with data, protocol comparisons, and the full technical analysis. If the blog posts are me thinking out loud, the whitepaper is the receipts.
