Inside-the-Loop vs. Outside-the-Loop: Evaluating Agent Architectures

Your agent ran. You have no idea what decisions it made along the way. That's not a trust problem — it's an architecture problem.

Inside-the-loop and outside-the-loop are the two architectural modes that determine whether your AI coding agent feels controllable or like a coin flip. An inside-the-loop agent exposes its plan before executing, pauses at explicit approval gates, and surfaces intermediate state so you can steer, redirect, or abort at any step. An outside-the-loop agent takes a task and runs to completion — returning either a result or a silent failure — with no intervention surface between dispatch and return. The distinction is not about model capability. It's about where human judgment enters the execution chain, and what happens when the model gets it wrong.


TL;DR: Inside-the-loop agents reliably ship real work on complex tasks because the human stays informed and in control at the decisions that matter. Outside-the-loop agents are safe only for narrow, fully-specified, reversible tasks — on anything else they fail silently, have no mechanism to refuse a bad task, and hand you something broken with no recourse. Design your oversight architecture based on blast radius and reversibility, not on how much you trust the model.


Why Does Agent Loop Architecture Matter More Than Model Choice?

The developer community has been converging on this framing organically. In a thread on r/AI_Agents that scored 12 and generated clear consensus, the conclusion was direct: "Inside works. See Claude Code, OpenCode — you see the plan, approve steps, stay in the loop. Ships real work. Outside — only narrow tasks. And it still can't tell you no."

That last clause is the structural insight. An outside-the-loop agent has no architectural mechanism to reject a task it shouldn't take, flag an ambiguity before it compounds, or surface the moment it's gone off course. It will attempt anything. When it fails, it fails silently — no checkpoint where the failure was catchable, just a diff you didn't ask for delivered at the end.

Developers running agents seriously — multi-hour tasks, parallel repos, production codebases — are independently arriving at the same answer: architecture is the oversight. As breyta.ai documents in their analysis of human-in-the-loop design for coding agents, the placement and granularity of approval checkpoints is the design decision that most determines real-world reliability, not model size or prompt quality. Swapping in a stronger model doesn't fix a missing approval gate.

What Is Inside-the-Loop vs. Outside-the-Loop?

Inside-the-loop — also called human-in-the-loop, plan-gated, or approval-gated — describes an agent architecture where the human has visibility and the ability to intervene at defined decision points during execution. The minimum viable inside-the-loop implementation has two properties: the agent's plan is visible before execution starts, and approval gates exist on high-risk tool calls.

Outside-the-loop — also called fully autonomous, fire-and-forget, or black-box — describes an agent architecture where the human dispatches a task and receives a result. The agent's internal sub-decisions, intermediate outputs, and state are opaque. The only surfaces are before dispatch and after completion.

An agent approval gate — a point where the agent halts and waits for explicit human confirmation before continuing — is the primitive building block of inside-the-loop architecture. Without at least one approval gate, you're outside the loop by definition.
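
Stripped to its essentials, the primitive is just a function that halts, asks, and then either runs or aborts the wrapped action. A minimal sketch, assuming an askHuman surface of your own (a CLI prompt, a Slack message, a push notification) rather than any particular SDK:

type Decision = "allow" | "deny";

async function approvalGate<T>(
  description: string,
  action: () => Promise<T>,
  askHuman: (description: string) => Promise<Decision>,
): Promise<T | null> {
  const decision = await askHuman(description); // execution halts here until the human answers
  if (decision === "deny") return null;         // abort: the gated action never runs
  return action();                              // confirmed: run the step
}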

"Inside the loop" is a spectrum, not a binary. An agent that shows its plan but auto-approves all tool calls is partially inside the loop. One that gates every single bash command is impractically inside the loop. The design question is where to place the gates — a question covered in depth in placement theory for AI approval gates — not whether to have them.

Some agent frameworks make plan approval a hard architectural constraint, not a feature toggle. Zerve's agent design requires explicit human plan approval before any code runs: the full workflow is shown and gated before execution begins. The inside-the-loop checkpoint isn't optional.

How Do Outside-the-Loop Agents Fail?

Outside-the-loop failures cluster into three categories that are qualitatively different from inside-the-loop failures — and harder to recover from.

Silent failure. The agent encounters an ambiguity — an unclear requirement, a missing dependency, a file in a different state than assumed — and makes a decision rather than surfacing a question. The decision might be wrong. You won't know until you review the output, which may be several hundred lines written against a wrong assumption. Inside-the-loop, this surfaces at plan review before anything is written.

Scope creep. The agent interprets the task more broadly than intended and modifies files you didn't ask it to touch. Outside-the-loop, you discover this in diff review after the fact. After spending a full day working alongside an AI coding agent, one experienced engineer documented the pattern directly: "This thing messes up all the time. It really is a dialogue. You can't just commit everything it creates. It'll need to be babysat." Those babysitting hours are almost entirely post-hoc review of decisions the agent made autonomously, outside the loop.

The inability to refuse. This is the most structurally important failure mode. An outside-the-loop agent has no mechanism to flag a task as underspecified, risky, or contradictory. It will attempt the task regardless. An inside-the-loop agent surfaces ambiguities in the plan phase — before code is written or commands are run. The architecture gives the agent a surface to communicate uncertainty rather than silently resolving it wrong.

Codacy's analysis of independent quality gates for coding agents makes the structural point clearly: agents produce hardcoded secrets, unbounded loops, and hallucinated tool references not because they're poor models, but because they have no architectural reason to stop and check. The loop is what introduces that reason.
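
One way to give the agent that surface, sketched below as an assumption rather than any SDK's actual API: have the plan phase emit open questions as structured data, and refuse to enter execution while the list is non-empty.

interface Plan {
  steps: string[];
  openQuestions: string[]; // ambiguities the agent could not resolve alone
}

function readyToExecute(plan: Plan): boolean {
  // An outside-the-loop agent has no equivalent of this check:
  // it resolves every ambiguity silently and keeps going.
  return plan.openQuestions.length === 0;
}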

Evaluation Criteria: What to Measure Before You Choose

| Dimension | Inside-the-Loop | Outside-the-Loop |
| --- | --- | --- |
| Task reversibility | Works for irreversible steps — gates protect them | Safe only for fully reversible tasks |
| Scope ambiguity | Surfaces at plan phase, before damage | Silently resolved — often wrong |
| Blast radius of an error | Bounded by gate placement | Bounded only by post-hoc review |
| Failure visibility | Visible, stoppable, addressable mid-run | Silent, discovered after the fact |
| Task complexity | Scales to multi-step, ambiguous work | Safe only for narrow, well-specified tasks |
| Human availability | Periodic check-ins at gates | Needed only at submission and return |
| Post-run audit burden | Lower — issues caught mid-run | Higher — entire output must be verified |
| Total cycle time (complex tasks) | Slightly slower per gate | Faster dispatch; slower total cycle with rework |

The key takeaway: outside-the-loop agents don't save time on complex tasks. They shift time from mid-run oversight to post-hoc review and rework — which is consistently more expensive. Gate overhead is front-loaded and predictable; rework overhead compounds.
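
A back-of-envelope model makes the asymmetry concrete. Every number below is an illustrative assumption, not a measurement:

const gates = 5;               // approval gates on a complex task (assumed)
const gateMinutes = 2;         // human attention per gate (assumed)
const reworkProbability = 0.3; // chance an ungated run needs redo (assumed)
const reworkMinutes = 120;     // review plus rework when it does (assumed)

const gatedOverhead = gates * gateMinutes;                 // 10 min, fixed and predictable
const ungatedOverhead = reworkProbability * reworkMinutes; // 36 min expected, high variance
console.log({ gatedOverhead, ungatedOverhead });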

Three Inside-the-Loop Patterns You Can Use Today

These three patterns are composable. Production workflows often combine all three, applied at different points in the execution chain.

Pattern 1: Approval Nodes

An approval node is an explicit checkpoint where the agent halts and waits for human confirmation before continuing. The CORE agentic workflow uses two: plan review before execution starts, and diff review before changes are committed. These two gates cover the majority of real-world failure modes without adding significant friction.

In Claude Code, approval nodes are implemented via canUseTool callbacks in the Agent SDK:

import { query } from "@anthropic-ai/claude-agent-sdk";

const HIGH_RISK_TOOLS = ["Bash", "Write", "Edit"]; // customize per project

// requestHumanApproval is your own gate surface (CLI prompt, push
// notification, web modal); it resolves to true (allow) or false (deny).
for await (const message of query({
  prompt: task,
  options: {
    permissionMode: "default",
    canUseTool: async (toolName, input) => {
      if (HIGH_RISK_TOOLS.includes(toolName)) {
        const approved = await requestHumanApproval(toolName, input);
        return approved
          ? { behavior: "allow", updatedInput: input }
          : { behavior: "deny", message: "Blocked at human approval gate" };
      }
      return { behavior: "allow", updatedInput: input }; // auto-approve low-risk reads
    },
  },
})) {
  handleMessage(message); // stand-in: stream plan, tool calls, and results to your UI
}

Gate placement is calibrated to blast radius — not a blanket "approve everything" or "approve nothing" policy. In this production support triage workflow, an explicit human approval node handles medium-risk AI-generated customer replies: low-risk responses auto-approve, medium-risk ones gate, high-risk ones block entirely. The human is in the loop at the decisions that matter — not at every step.
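
That tiered policy can be encoded as a small routing function. A sketch, with classifyRisk standing in for whatever risk heuristic your project uses; neither name comes from any SDK:

type Risk = "low" | "medium" | "high";

async function gatePolicy(
  toolName: string,
  input: unknown,
  classifyRisk: (tool: string, input: unknown) => Risk, // your heuristic: reads low, writes medium, pushes high
  askHuman: () => Promise<boolean>,
): Promise<boolean> {
  switch (classifyRisk(toolName, input)) {
    case "low":
      return true;       // auto-approve
    case "medium":
      return askHuman(); // gate: human in the loop
    case "high":
      return false;      // block unconditionally
  }
}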

Pattern 2: Judge Agents

A judge agent is a secondary AI agent that reviews the primary agent's output before it's accepted. The integrity-judge + sanity-judge pattern — shared by a team running agent orchestration at scale on r/Anthropic — spawns two judges per sub-task:

  • Integrity judge: checks factual correctness, validates that referenced files and tools exist, confirms tool inputs are well-formed
  • Sanity judge: checks scope adherence, flags unexpected changes, verifies the output matches the original task specification

A minimal orchestration sketch, assuming judge_agent and notify_human are your own helpers and that each judge returns an object with passed and report fields:

async def execute_with_judges(task: str, primary_output: str) -> bool:
    # Spawn both judges against the same task and primary output.
    integrity = await judge_agent(role="integrity", task=task, output=primary_output)
    sanity = await judge_agent(role="sanity", task=task, output=primary_output)

    if not (integrity.passed and sanity.passed):
        # Escalate to a human engagement gate with both judge reports attached.
        await notify_human(integrity.report, sanity.report)
        return False
    return True

Judge agents add latency but reduce human review burden by catching structural errors — missing files, broken references, scope violations — before they reach an approval gate. Independent quality gate analysis shows that AI-reviewing-AI with a distinct evaluation role is structurally different from self-review, and defect catch rates reflect that difference.

The key framing: judge agents are a pre-filter that reduces how often approval nodes need to fire, not a replacement for them.

Pattern 3: Engagement Gates

An engagement gate is a checkpoint that requires the human to actively read and acknowledge before proceeding — not just tap allow or deny on a permission modal. The distinction matters because approval fatigue is real: in long-running sessions, humans rubber-stamp modals after the first few without reading them. An engagement gate forces a genuine pause by embedding substantive content that must be read to respond correctly.
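
One simple way to implement that forced pause, sketched here rather than taken from any specific harness: bury a one-time token inside the content, so the gate cannot be passed without reading to the end.

async function engagementGate(
  content: string,
  readLine: (prompt: string) => Promise<string>, // your own input surface
): Promise<boolean> {
  const token = Math.random().toString(36).slice(2, 8);
  const prompt = `${content}\n\nTo proceed, type this token: ${token}`;
  const answer = await readLine(prompt);
  return answer.trim() === token; // only someone who read to the end can pass
}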

The Tenet harness — built for managing long-running agent work and shared on r/SideProject — implements staged engagement gates: interview phase → mockup inspection → spec review → DAG job split → per-job critic evaluation. Each phase requires explicit acknowledgment. There is no fast-path through the gates without reading what the agent produced.

For rule-based encoding without SDK changes, CLAUDE.md engagement gates look like this:

# Engagement Gates

Before editing more than 3 files: list every file and the reason for the change, then stop.
If a task requires more than 5 tool calls: write a plan document first, then stop.
Before any git push: show the complete diff and wait for explicit "ship it".

These rules push the agent into inside-the-loop behavior without touching agent code. They're the lowest-friction entry point into gated architecture.

The Decision Tree: When to Use Which Architecture?

Apply this decision tree to any agent task before choosing your architecture:

Is the task irreversible? (git push, database writes, external API calls)
├── Yes → Inside-the-loop required. Gate the irreversible steps explicitly.
└── No → Is the task ambiguously specified?
         ├── Yes → Inside-the-loop required. Plan review surfaces the ambiguity.
         └── No → Is the blast radius of an error acceptable without review?
                  ├── Yes → Outside-the-loop may be acceptable.
                  └── No → Inside-the-loop required.

A practical heuristic: if you would be unhappy discovering the result an hour later with no ability to rewind, you need an inside-the-loop architecture. If you can run the task ten times and discard the bad results with minimal cost, outside-the-loop is acceptable.
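
The same tree, expressed as a routing function you could drop in front of task dispatch; the TaskProfile field names are illustrative, not a standard:

interface TaskProfile {
  irreversible: boolean;          // git push, database writes, external API calls
  ambiguous: boolean;             // underspecified or contradictory requirements
  blastRadiusAcceptable: boolean; // tolerable to discover an error only afterward
}

function requiresInsideTheLoop(t: TaskProfile): boolean {
  if (t.irreversible) return true; // gate the irreversible steps explicitly
  if (t.ambiguous) return true;    // plan review surfaces the ambiguity
  return !t.blastRadiusAcceptable; // otherwise decide on blast radius alone
}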

For deeper guidance on building the approval gate layer correctly, the permission layer architecture post covers how the 98% of agent engineering that isn't the LLM — permission systems, hook composition, context management, subagent delegation — actually works in practice.

How Grass Makes This Workflow Better

The three patterns above work without Grass. canUseTool callbacks, judge agent spawning, and CLAUDE.md engagement gates are all tool-agnostic and run in any environment where you can reach a terminal.

But there's a structural problem with inside-the-loop architecture that Grass specifically solves: you have to be at your desk to handle the approval gates.

When a long-running agent hits an approval node at 11pm, during your commute, or between back-to-back meetings, you have three bad options: approve blindly from memory, let the session stall until you're back, or disable the gate and go outside the loop. All three undermine the architecture you designed.

Grass is a machine built for AI coding agents — an always-on cloud VM where Claude Code, Codex, and Open Code run continuously, reachable from anywhere. When an agent hits a permission_request — a bash command, a file write, a push — Grass forwards the approval gate to your phone as a native permission modal. You see the exact tool name and input with syntax highlighting, and tap Allow or Deny from wherever you are.

The Grass approval workflow closes the gap between the architecture you designed and the access you actually have:

  1. Agent running on Grass cloud VM hits a canUseTool gate
  2. SSE stream emits a permission_request event: { toolName, input, toolUseID }
  3. Native iOS modal appears on your phone with a formatted preview of the tool call
  4. You tap Allow or Deny; response sent via POST /sessions/:id/permission
  5. Agent continues or aborts — decision logged with the session transcript

The agent isn't waiting at a stalled terminal. It's waiting on a cloud VM, and the approval gate is in your pocket. The inside-the-loop architecture you designed operates correctly even when you're not at your laptop.
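
For reference, a client for this flow might look like the sketch below. Only the event name (permission_request), its fields (toolName, input, toolUseID), and the endpoint paths (/permissions/events, POST /sessions/:id/permission) come from this post; the host, auth, POST body shape, and session-id handling are assumptions:

const BASE = "https://grass.example"; // placeholder host
const SESSION_ID = "SESSION_ID";      // placeholder session id

async function askHuman(toolName: string, input: unknown): Promise<boolean> {
  // Stand-in for your own approval surface (modal, push notification, CLI).
  console.log("approval needed:", toolName, input);
  return false; // deny by default in this stub
}

// EventSource is built into browsers and recent Node versions.
const stream = new EventSource(`${BASE}/permissions/events`);
stream.addEventListener("permission_request", async (event) => {
  const { toolName, input, toolUseID } = JSON.parse((event as MessageEvent).data);
  const allow = await askHuman(toolName, input);
  await fetch(`${BASE}/sessions/${SESSION_ID}/permission`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ toolUseID, decision: allow ? "allow" : "deny" }), // assumed body shape
  });
});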

For teams running the judge agent + approval node pattern across multiple repositories, Grass's /permissions/events SSE endpoint provides a global stream of all pending permissions across all active sessions — useful for surfacing any stalled agents from a single dashboard view without polling each session individually.

Try Grass at codeongrass.com — the first 10 hours are free, no credit card required.

Verdict

Inside-the-loop agents ship real work. Outside-the-loop agents are appropriate when the task is narrow, reversible, and well-specified — a subset of real coding work that is smaller than it appears in practice.

The three patterns — approval nodes, judge agents, and engagement gates — are composable and incrementally adoptable. Start with a plan review gate and a bash approval gate. Add judge agents when you're running multi-step workflows where output correctness matters. Add engagement gates when you notice approval fatigue on long-running sessions.

A better model doesn't compensate for a missing approval gate. The architecture is the oversight. Choose your loop configuration deliberately — before you choose your model.


Frequently Asked Questions

What is the difference between inside-the-loop and outside-the-loop agent architectures?

Inside-the-loop agents expose their plan before executing, pause at approval gates during execution, and surface intermediate state so the human can steer or abort at any point. Outside-the-loop agents receive a task and run to completion with no intervention surface between dispatch and result. The difference determines what failure modes are visible and recoverable versus silent and discovered late.

When is it safe to use an outside-the-loop agent?

Outside-the-loop is appropriate for tasks that are fully reversible, narrowly specified with no ambiguity, and carry acceptable blast radius if they produce a wrong result. Generating a draft, summarizing content, and running read-only analysis are reasonable cases. Writing files, running shell commands, pushing code, or calling external APIs each require at least one approval gate.

What is a judge agent and how does it fit into inside-the-loop architecture?

A judge agent is a secondary AI agent that reviews the primary agent's output before it's accepted. Common configurations spawn two judges per sub-task: an integrity judge (checking factual correctness, valid references, well-formed tool inputs) and a sanity judge (checking scope adherence and spec match). Judge agents reduce how often human approval gates need to fire — they're a pre-filter, not a replacement for human oversight.

How do engagement gates differ from approval nodes?

An approval node halts execution and asks for approve or deny on a specific action. An engagement gate requires the human to actively read and acknowledge substantive content before proceeding. Engagement gates address approval fatigue — the tendency for humans to rubber-stamp approval modals without reading them after the first few in a long-running session. The Tenet harness implements staged engagement gates across interview, mockup inspection, spec review, and per-job critic phases.

Can you implement inside-the-loop architecture without modifying agent code?

Partially. CLAUDE.md rules that enforce "show me the plan before editing more than 3 files" or "stop and write a plan document for tasks over 5 steps" implement plan-phase engagement gates without any code changes. For execution-phase approval gates on tool calls, you need a canUseTool callback (Claude Agent SDK) or equivalent hook mechanism. The CLAUDE.md approach handles plan-phase gates; the SDK callback handles per-tool gates during execution. Most production architectures use both.