The Outer Harness: Why the Real Work in AI Coding Agents Isn't the LLM
The inner harness (Claude Code, Codex) is commoditizing. Four developers shipped the same control plane primitives this week without a shared vocabulary — here's the framework that names what they built.
The agent you're running is not the interesting engineering problem. The control plane you build around it is.
That split has a name — inner harness vs outer harness — and understanding it changes how you architect everything from session management to approval gates to multi-surface dispatch. Last week, at least four independent developers shipped outer harness primitives without realizing they were converging on the same abstraction. This essay names the concept, gives it a taxonomy, and shows where the durable engineering work actually lives.
TL;DR: The inner harness (Claude Code, Codex, Open Code) is commoditizing fast. The outer harness — session persistence, feedforward controls, feedback controls, multi-surface dispatch — is where the durable engineering value accumulates. Feedforward controls shape agent behavior before it acts; feedback controls observe and respond after. Four indie tools independently converged on outer harness primitives this week without a shared vocabulary. That convergence is the proof the abstraction is real.
Why Developers Are Confused About What the "Remote Layer" Is For
Before getting to the framework, it's worth naming the symptom that motivated this essay.
A thread in r/ClaudeAI about Dispatch surfaced a reply that stuck with me: "I can't figure out what I'd actually use it for day to day. Most of what I do is already in Claude so why add the remote?"
That confusion is real and understandable. If you think of Claude Code as the product — the thing you use — then a remote layer looks like a redundant wrapper. The agent is already there. Why add indirection?
The confusion dissolves once you have the right mental model. The remote layer isn't a wrapper around Claude Code. It's the control plane for a long-running autonomous process that Claude Code happens to be executing inside. The agent isn't the product. The system you build to run agents reliably, safely, and across the gaps in your day — that's the product. Once you see that distinction, everything else follows.
Why Is the Inner Harness Commoditizing?
Paul Caplan put it directly in an r/ClaudeCode thread last week that drew a lot of replies: "The inner harness is commoditizing and thinning. The interesting question is what you layer on top — which is the outer harness."
The inner harness is the agent runtime itself — the LLM, the tool execution loop, the SDK. Claude Code, Codex, and Open Code are all inner harnesses. They share the same fundamental architecture: read context, plan, select and execute tools, stream output. As WaveSpeedAI's architecture breakdown of Claude Code's agent harness makes clear, the structural patterns across implementations are converging rapidly. The gaps between agents are narrowing faster than most developers expect.
That's not a criticism of the agents. It's a natural consequence of a maturing abstraction. The agent layer runs code. It doesn't manage sessions, surface permission requests to humans, persist state across sleep cycles, or dispatch work across surfaces. Those problems live above it.
The outer harness is the control plane built around the agent: everything the inner harness doesn't ship with by default. It has four parts:
- Session persistence: keeping the agent alive and resumable when your laptop sleeps or the network drops
- Feedforward controls: constraints injected before execution to shape what the agent is allowed to attempt
- Feedback controls: observation and surfacing of signals after (or during) execution
- Multi-surface dispatch: reaching the same agent session from different surfaces without rebuilding the workflow
The inner harness is commoditizing. The outer harness is where the value accumulates — because it's the part you have to build yourself.
What Are Feedforward and Feedback Controls in an Agent Harness?
The outer harness has two distinct control types. Conflating them produces bad architecture.
Feedforward controls (definition): controls that shape agent behavior before the agent acts. A CLAUDE.md file is a feedforward control. A system prompt restricting the agent to a specific directory is feedforward. Tool allowlists and blocklists are feedforward. Plan-vs-build mode selection is feedforward. Context injection — priming the agent with repo state, recent diffs, or task-specific constraints before execution starts — is feedforward.
Feedforward controls are injected into the agent's context before the run begins. They constrain the action space. The HumanLayer team, after a year of watching coding agents fail in every conceivable way, documented that the single biggest predictor of agent success is back-pressure verification — the agent's ability to verify its own work through tests and build checks. That's a feedback control. But the feedforward layer determines whether the agent is even attempting the right thing in the first place. Most teams underinvest in feedforward constraints and then spend engineering time on feedback controls trying to catch what the agent does wrong — when the cleaner fix is bounding what it's allowed to attempt.
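To make the allowlist and blocklist idea concrete, here is a minimal Python sketch of a feedforward gate that is consulted before any tool call runs. The `ToolPolicy` class, the `tool:argument` pattern format, and the three-way allow/ask/deny outcome are assumptions made for illustration; this is not the configuration format of Claude Code or any other agent.

```python
from dataclasses import dataclass, field
from enum import Enum
from fnmatch import fnmatchcase


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"    # hand the call to a human for approval
    DENY = "deny"


@dataclass
class ToolPolicy:
    """Hypothetical feedforward policy, evaluated before a tool call executes."""
    allow: list[str] = field(default_factory=list)
    deny: list[str] = field(default_factory=list)

    def decide(self, tool: str, argument: str) -> Decision:
        call = f"{tool}:{argument}"
        # Deny rules are checked first so a broad allow rule cannot unblock them.
        if any(fnmatchcase(call, pattern) for pattern in self.deny):
            return Decision.DENY
        if any(fnmatchcase(call, pattern) for pattern in self.allow):
            return Decision.ALLOW
        # Anything unlisted is neither trusted nor forbidden, so a human decides.
        return Decision.ASK


# Illustrative rules only; real rules depend on your repo and risk tolerance.
policy = ToolPolicy(
    allow=["Read:*", "Bash:git status*", "Bash:pytest*"],
    deny=["Bash:rm *", "Write:.env*"],
)

print(policy.decide("Read", "src/app.py"))
print(policy.decide("Bash", "rm -rf build"))
print(policy.decide("Bash", "curl https://example.com"))
```

The design choice worth noting is the default: unlisted calls fall through to "ask" rather than "allow", which keeps the action space bounded even when the rules are incomplete.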
Feedback controls (definition): controls that observe and respond during or after the agent's actions. Notifications, status monitoring, approval gates, and audit logs are all feedback controls. They don't define what the agent may attempt; they observe execution, surface signals, and enable human intervention during or after a run.
Within feedback controls, there's a critical subtype worth naming separately: deterministic feedback — output that comes directly from tool execution rather than LLM interpretation. A bash exit code. A file diff. A test runner pass/fail. These are the most reliable signals in any outer harness because they can't be hallucinated. Our post on the permission layer as 98% of agent engineering explores how much architectural weight this layer actually carries.
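A small Python sketch of what capturing deterministic feedback can look like: run a check as a subprocess and record the exit code as reported by the operating system, with no model in the loop. The `DeterministicSignal` and `run_check` names are hypothetical, and the two commands are stand-ins for a real test runner or build step.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class DeterministicSignal:
    """A fact produced by tool execution, not by LLM interpretation."""
    command: tuple[str, ...]
    exit_code: int
    stdout: str

    @property
    def passed(self) -> bool:
        return self.exit_code == 0


def run_check(command: list[str], timeout: float = 60.0) -> DeterministicSignal:
    # The exit code comes from the process itself, so it cannot be hallucinated.
    completed = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout
    )
    return DeterministicSignal(tuple(command), completed.returncode, completed.stdout)


# Stand-ins for a passing and a failing check (assumed for illustration).
ok = run_check([sys.executable, "-c", "print('tests passed')"])
failed = run_check([sys.executable, "-c", "import sys; sys.exit(3)"])

print(ok.passed, ok.exit_code)
print(failed.passed, failed.exit_code)
```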
The complete outer harness stack: feedforward constraints to bound the action space → execution → deterministic feedback to verify what actually happened → LLM-synthesized feedback to surface status to human and automated consumers.
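The four stages can be sketched end to end in a few lines of Python. Everything here is illustrative: `run_stack`, `RunResult`, and `fake_agent` are hypothetical names, the agent is a stub, and the deterministic step is a simple path-scope check that assumes already-normalized relative paths.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable


@dataclass(frozen=True)
class RunResult:
    changed_files: tuple[str, ...]  # deterministic: taken from a file diff
    llm_summary: str                # synthesized: useful, but not ground truth


def out_of_scope(changed_files, allowed_dirs) -> list[str]:
    """Return the changed files that fall outside every allowed directory."""
    roots = [PurePosixPath(d) for d in allowed_dirs]
    return [
        name
        for name in changed_files
        if not any(PurePosixPath(name).is_relative_to(root) for root in roots)
    ]


def run_stack(task: str, allowed_dirs: list[str],
              execute: Callable[[str], RunResult]) -> dict:
    # 1. Feedforward: the scope is stated before the agent runs.
    prompt = f"{task}\nOnly modify files under: {', '.join(allowed_dirs)}"
    # 2. Execution: `execute` stands in for whatever agent runtime you use.
    result = execute(prompt)
    # 3. Deterministic feedback: computed from the diff, not from the summary.
    drift = out_of_scope(result.changed_files, allowed_dirs)
    # 4. Synthesized feedback: reported alongside the hard facts, never instead.
    return {"summary": result.llm_summary, "drift": drift, "ok": not drift}


def fake_agent(prompt: str) -> RunResult:
    # Stub agent that touches one file it was not asked to touch.
    return RunResult(
        ("src/api/routes.py", "infra/deploy.yaml"),
        "Updated the route handler.",
    )


report = run_stack("Fix the 404 on /health", ["src/api"], fake_agent)
print(report)
```

In this stub run the summary sounds fine while the diff shows a file outside the declared scope, which is the case the deterministic step exists to catch.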
What Convergent Evidence Shows the Outer Harness Is Real?
The strongest evidence that this is a real architectural layer — not just a taxonomy exercise — is convergent independent discovery. Last week, without a shared vocabulary, four separate developers shipped outer harness primitives.
Leo (thread): an SSH-accessible process supervisor for the Claude CLI. The builder describes it as enabling "long-running supervised claude processes, scheduled tasks, and ephemeral agents" with agent templates you can spawn at will and connect to over SSH. That's feedforward (templates, scheduling constraints) wired to a remote access surface. An outer harness, assembled from scratch.
ADHDev (r/SideProject, r/hermesagent): a browser dashboard with mobile support. The founder's framing is precise: "ADHDev sits above it as a control layer so I can monitor and continue sessions from a browser dashboard, including mobile." The problem it addresses: "I start a task, let it run, come back later, and need to know whether it is still working, waiting for input, stuck, finished, or ready for a follow-up." That is a feedback control layer. Outer harness.
tmux-notify (r/ClaudeCode): a hook-based notification and approval plugin for Claude Code. The author's pain: "I never knew which session was waiting for an Allow permission request or a Plan review." The solution surfaces deterministic feedback signals — permission prompts — to the human operator. Outer harness.
AIPass (r/AskVibecoders): a persistent-identity multi-agent local framework. The builder's core insight: "One agent on one project with persistent memory is already a different experience." Persistent memory is a feedforward control: it shapes what the agent knows before it acts. Outer harness.
None of these teams used the term. All built the same layer. That's not coincidence. It's an abstraction being independently discovered because the inner harness ships without it, and every serious user eventually hits the same ceiling.
This pattern holds at the enterprise layer too. Multi-cloud governance reached the same architectural moment when cloud-specific tools stopped being enough and organizations needed a cross-cloud control plane to govern identity, policy, and posture consistently. That moment has now arrived for AI-assisted development. Agents decide; the control plane governs. They are separate concerns that require separate engineering investment.
What Goes in Your Outer Harness
If you're building your own outer harness, here's the practical breakdown by control type. All of this works without any specific tooling — these are architecture patterns, not vendor recommendations.
Feedforward layer
- CLAUDE.md and system prompts: Define scope, constraints, coding standards, off-limits paths, and expected output format. The most accessible feedforward control and, in most teams, the most underbuilt. Invest here first.
- Tool allowlists and blocklists: Control which tool invocations require human approval before execution. Bash commands, file writes, and web fetches can each be gated or blocked selectively.
- Plan-vs-build mode: Force a planning pass before the agent commits to execution. Supported natively by the Claude SDK via the mode field. The planning step is a feedforward check you get for free if you wire it in.
- Context injection: Prime the agent with repo state, recent diffs, or task-specific constraints before it starts. Shaped context produces shaped behavior, which is cheaper than debugging scope blowout after the fact.
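As one way to picture context injection, the sketch below assembles a task preamble from a task description, off-limits paths, and a recent diff before the agent starts. The `TaskContext` class, its section headings, and the truncation limit are assumptions for illustration; how the resulting text reaches the agent (CLAUDE.md, a system prompt, or the first message) depends on your setup.

```python
from dataclasses import dataclass, field


@dataclass
class TaskContext:
    """Hypothetical container for context injected before a run."""
    task: str
    off_limits: list[str] = field(default_factory=list)
    recent_diff: str = ""
    max_diff_chars: int = 2000  # keep the diff from crowding out the task

    def render(self) -> str:
        sections = [f"## Task\n{self.task}"]
        if self.off_limits:
            rules = "\n".join(f"- Do not modify {path}" for path in self.off_limits)
            sections.append(f"## Constraints\n{rules}")
        if self.recent_diff:
            # Truncate rather than drop, so the agent still sees recent state.
            sections.append(f"## Recent changes\n{self.recent_diff[: self.max_diff_chars]}")
        return "\n\n".join(sections)


context = TaskContext(
    task="Add retry with backoff to the HTTP client",
    off_limits=["migrations/", ".env"],
    recent_diff="diff --git a/client.py b/client.py\n+    timeout = 10\n",
)
prompt_preamble = context.render()
print(prompt_preamble)
```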
Feedback layer
- Permission forwarding: When the agent hits a tool gate, the request has to be routed to a human who can answer it. A tmux pane is the minimum viable version. A mobile notification with one-tap approve/deny is the production version. Our guide to human-in-the-loop approval gates covers the three-pattern stack that makes this reliable.
- Status monitoring: Real-time visibility into agent state — thinking, executing a tool, waiting for input, errored. The ADHDev problem ("is it working, waiting, stuck, or done?") is a feedback gap. Closing it is what makes agents feel like background workers rather than black boxes.
- Audit logs: Post-session verification of what the agent actually did vs. what you asked it to do. The post-run drift audit is where you catch silent scope creep before it compounds. One developer documented in r/ClaudeAI exactly what happens when this layer is absent: the agent rewrote an entire service when asked for a targeted change.
- Push notifications: Alerts when sessions complete, error, or need input. DIY path: webhooks to Slack or a custom endpoint. Production path: a native mobile app that renders permission requests with full context.
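Status monitoring and push notifications meet at one decision: which state changes deserve an alert. The Python sketch below is one hedged way to model that. The `AgentState` values mirror the states named above, while `alerts_for`, `looks_stuck`, and the five-minute quiet threshold are hypothetical choices, not behavior of any particular tool.

```python
from enum import Enum


class AgentState(Enum):
    THINKING = "thinking"
    RUNNING_TOOL = "running_tool"
    WAITING_FOR_INPUT = "waiting_for_input"
    ERRORED = "errored"
    FINISHED = "finished"


# States where a human has something to do; the rest are just progress.
NEEDS_ATTENTION = {
    AgentState.WAITING_FOR_INPUT,
    AgentState.ERRORED,
    AgentState.FINISHED,
}


def alerts_for(states):
    """Return one alert per transition into a state that needs attention."""
    alerts = []
    previous = None
    for state in states:
        # Alert on the transition only, so a long wait does not spam the phone.
        if state in NEEDS_ATTENTION and state is not previous:
            alerts.append(state)
        previous = state
    return alerts


def looks_stuck(last_event_at: float, now: float, quiet_seconds: float = 300.0) -> bool:
    """Heuristic: an active session that has emitted no events for too long."""
    return (now - last_event_at) > quiet_seconds


timeline = [
    AgentState.THINKING,
    AgentState.RUNNING_TOOL,
    AgentState.WAITING_FOR_INPUT,
    AgentState.WAITING_FOR_INPUT,
    AgentState.RUNNING_TOOL,
    AgentState.FINISHED,
]
alerts = alerts_for(timeline)
print([state.value for state in alerts])
```

The resulting alerts could then be posted to a Slack webhook or any custom endpoint; that delivery step is left out here because it depends on your notification channel.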
Session persistence
Both control types require a persistent execution environment underneath them. If the agent session dies when your laptop sleeps, none of the feedback layer matters, because the agent is already gone. Session persistence is the substrate everything else runs on. The DIY path is a cloud VM + tmux + Tailscale. Leo's SSH-based process supervisor is a more structured version. The tradeoff with either is maintenance burden: the VM, the networking, and the reconnect logic are yours to keep running.
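For the tmux part of the DIY path, a minimal Python sketch of "start the agent in a detached session unless it is already running" might look like this. The `ensure_session` helper and the session name are hypothetical, and `claude` as the launch command is an assumption about how you start your agent. The demo uses a recording stand-in for `subprocess.run`, so nothing is launched when the snippet runs.

```python
import shlex
import subprocess
from types import SimpleNamespace


def ensure_session(name: str, agent_command: list[str], run=subprocess.run) -> str:
    """Start a detached tmux session for the agent unless one already exists."""
    # `tmux has-session` exits non-zero when no session has that name.
    probe = run(["tmux", "has-session", "-t", name], capture_output=True)
    if probe.returncode == 0:
        return "already-running"
    # -d keeps the session detached, so it survives the SSH connection closing.
    run(
        ["tmux", "new-session", "-d", "-s", name, shlex.join(agent_command)],
        check=True,
    )
    return "started"


# Dry run: record the commands instead of executing them.
recorded = []


def recording_run(command, **kwargs):
    recorded.append(command)
    # Pretend the session does not exist yet and that starting it succeeds.
    return SimpleNamespace(returncode=1 if "has-session" in command else 0)


status = ensure_session("agent-main", ["claude"], run=recording_run)
print(status)
for command in recorded:
    print(shlex.join(command))
```

On the VM itself you would call `ensure_session` without the `run` argument, then reconnect later with `tmux attach-session -t agent-main` from any SSH session.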
How Grass Ships the Outer Harness Pre-Built
The pattern across Leo, ADHDev, tmux-notify, and AIPass is consistent: each builder is spending real engineering time assembling outer harness primitives that are not their core product. The substrate work — VM setup, notification routing, permission forwarding, reconnect logic — is overhead on the way to the actual thing they're building.
That's the problem Grass addresses directly.
Grass is a machine built for AI coding agents — an always-on cloud VM with Claude Code, Codex, and Open Code pre-loaded, combined with a mobile-native control surface that ships the outer harness assembled rather than as a kit.
Here's how the taxonomy maps to what Grass provides:
Session persistence → always-on cloud VM. Powered by Daytona. Agent sessions don't die when your laptop closes. The execution environment is decoupled from your local machine by design, so long-running tasks survive sleep cycles, network interruptions, and context switches. BYOK — your API keys never touch Grass infrastructure.
Feedforward controls → composable on top of the persistent substrate. CLAUDE.md support, tool gating, and plan-vs-build mode are exposed at session start. The feedforward layer you've built works as-is; the VM is just where it runs reliably.
Permission forwarding → native mobile approval gates. When the agent hits a tool gate, the request surfaces as a native modal on iOS: tool name, syntax-highlighted preview of what will execute, one-tap approve or deny. Deterministic feedback flowing to the right surface, without building the notification routing yourself.
Status monitoring → real-time agent state streaming. Agent is thinking, running a tool, waiting for input — visible from the Grass mobile app from anywhere. The feedback loop that ADHDev and tmux-notify are each assembling independently.
Agent-agnostic architecture → Claude Code, Codex, and Open Code as first-class citizens. The outer harness doesn't pick sides on the inner harness. "One surface. Every agent. Always on." When the next agent ships, the control plane works without rebuilding your stack.
Multi-surface dispatch → laptop, phone, automation. The same session is reachable from MCP dispatch on your laptop, the native mobile app, or a scheduled/triggered automation. The session persists regardless of which surface you used to start it.
The operational difference between assembling this yourself and using Grass is the starting point. Instead of wiring together a process supervisor, a mobile notification layer, a permission forwarding system, and a persistent VM, you get all of that as the default configuration. For developers who want to compose custom outer harness components on top, the substrate is there. For developers who want to skip the assembly and start shipping, Grass is the pre-built version.
If you're already running agents on a cloud VM or looking to move off a sleep-prone laptop: Getting Started with Grass takes under 5 minutes. Free tier, 10 hours, no credit card.
FAQ
What is the outer harness in AI coding agents?
The outer harness is the control plane built around an AI coding agent — everything the agent runtime (Claude Code, Codex, Open Code) doesn't ship with by default. It includes session persistence, feedforward controls that constrain agent behavior before execution, feedback controls that observe and surface signals during and after execution, and multi-surface dispatch. The inner harness (the LLM and tool execution loop) is commoditizing; the outer harness is where durable value accumulates.
What is the difference between feedforward and feedback controls for coding agents?
Feedforward controls shape agent behavior before the agent acts: CLAUDE.md files, system prompts, tool allowlists and blocklists, plan-vs-build mode selection, context injection. Feedback controls observe and respond after execution: permission gates, status monitoring, audit logs, push notifications. Deterministic feedback — exit codes, file diffs, test results — is a subtype of feedback controls that comes directly from tool execution rather than LLM interpretation, making it the most reliable signal in any outer harness.
Why are developers independently building their own agent control layers?
Because the inner harness ships without the operational layer needed for serious production use. Long-running tasks require session persistence. Autonomous execution requires approval gates. Remote operation requires a notification and response system. Status visibility requires a monitoring layer. Developers are converging on these primitives independently because the need is universal and the default agent tooling doesn't address it — the four tools above (Leo, ADHDev, tmux-notify, AIPass) are all evidence of the same gap.
What should an outer harness for Claude Code include at minimum?
Session persistence (so tasks survive disconnects and sleep cycles), a feedforward layer (CLAUDE.md, tool gates, plan mode), permission forwarding (a mechanism to approve/deny tool invocations from wherever you are), status monitoring (real-time visibility into agent state), and deterministic post-run feedback (audit-quality logs of what the agent actually did). The permission layer architecture post and the approval gate implementation guide cover the implementation patterns in detail.
Does every developer need to build their own outer harness from scratch?
No. The DIY path — VM + tmux + Tailscale + custom notification layer — gives maximum control at significant ongoing maintenance cost. Modular tools like Leo and tmux-notify let you compose the outer harness from primitives. Pre-assembled options like Grass trade configurability for operational readiness. The right answer depends on how much of the control plane you want to own, maintain, and evolve versus inherit as a starting point.
Run your agents on an always-on cloud VM with the outer harness pre-built: codeongrass.com