Where to Gate Your AI Coding Agent: A 3-Checkpoint Framework
Most developers run zero approval gates on their AI coding agents. The other extreme — gating every tool call — just rebuilds a slow human workflow. Here's the minimal 3-checkpoint architecture that covers real risk without the noise.
An approval gate (also called a human checkpoint) is a deliberate pause point in an AI coding agent's execution: the agent stops, surfaces its current state, and waits for human confirmation before continuing. Run zero gates and you absorb the cost when something goes sideways; gate every tool call and you've rebuilt a slow human workflow with extra steps. This tutorial shows you where the three minimum effective gates go, what belongs at each one, and how to implement them with patterns you can copy directly into your project.
TL;DR
Three gates cover the majority of meaningful risk without meaningful overhead:
- Plan review gate — approve the agent's approach before it touches any files
- Findings review gate — confirm what the agent discovered before it acts on it
- Diff-before-push gate — inspect the full diff before any code leaves your machine
All three are implementable today using CLAUDE.md prompts and a shell function. No specialized tooling required.
Goal: A Minimal Effective Approval Architecture
By the end of this tutorial you'll have a working 3-checkpoint pipeline you can copy into your own agent workflow:
- A plan review gate that catches architectural decisions the agent can't make alone
- A findings review gate that surfaces unexpected complexity before execution starts
- A diff-before-push gate that gives you a final veto before changes propagate
All three patterns work tool-agnostically with Claude Code, Codex, OpenCode, or any agent that accepts a CLAUDE.md or system prompt.
Prerequisites
- Claude Code, Codex, or another coding agent that accepts a CLAUDE.md or system prompt
- A git repository for your project (required for Gate 3)
- Optional (recommended): Grass for mobile approval forwarding — so gates don't idle your workflow when you're away from your desk
Why Most Developers Are Running Zero Effective Gates
The failure mode isn't choosing between gates and no gates — it's calibrating where they fire. A Claude-powered Cursor agent deleted an entire company's database and backups in 9 seconds — no approval prompt, no pause, no warning. That's the ungated extreme.
The overcorrected extreme is equally counterproductive: per-tool-call approval that fires 40 times per task. You're not automating anything — you've added a slow layer to a human workflow.
Human validation pipeline research at LlamaIndex frames the right model as "strategic review checkpoints that catch errors, validate accuracy, and ensure" that human judgment lands at the right moment. Gates work when they surface moments that actually require human judgment, not when they interrupt every tool invocation.
A thread analyzing orchestration patterns on r/VibeCodeDevs put it precisely: "The human is doing the hard part: gathering context, writing the brief, noticing what is missing, deciding where judgment is actually needed." Your gate architecture should protect exactly those moments.
If you're new to the concept: an agent approval gate is a point where an AI coding agent pauses and waits for you to confirm or deny before continuing. Well-designed gates are infrequent and high-signal.
Gate 1: The Plan Review Gate
What is a plan review gate?
The plan review gate fires before the agent writes a single line of code. The agent reads the relevant files, builds an understanding of the task, generates an implementation approach — then stops to surface that approach for review before executing it.
This is the highest-leverage gate because it catches architectural decisions and task ambiguities before they compound into code. As one developer reported from a minimal workflow discussion on r/AgentsOfAI: "The human gate mattered — the agent flagged two real engineering decisions it couldn't decide alone."
When should you trigger a plan review gate?
Trigger it on every non-trivial task — anything touching more than one file or involving an architectural decision. Skip it for single-file bug fixes with clearly scoped changes where there's no ambiguity.
Implementation: CLAUDE.md prompt pattern
Add the following to your project's CLAUDE.md:
## Planning Protocol
Before implementing any task that touches more than one file or requires an architectural decision:
1. Read the relevant files and understand the current structure
2. Write a plan that includes:
- The exact files you plan to create, modify, or delete
- Your implementation approach in 3-5 bullet points
- Any decisions you cannot make alone (ambiguous requirements,
performance tradeoffs, API design choices)
3. Output the plan, then append: `PLAN READY — waiting for approval`
4. Do not write any code or modify any files until you receive approval
When you receive approval, proceed with the plan as described.
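For concreteness, here's what a plan satisfying this protocol might look like. The task, file paths, and numbers are purely illustrative:

```
PLAN: Add rate limiting to the public API

Files to change:
- src/middleware/rateLimit.ts (create)
- src/app.ts (modify: mount middleware on the public router)
- tests/middleware/rateLimit.test.ts (create)

Approach:
- Token-bucket limiter keyed by API key, in-memory store
- Return 429 with a Retry-After header when the bucket is empty
- Unit tests for bucket refill and the 429 path

Decisions I cannot make alone:
- Limit value (assuming 100 req/min; please confirm)
- Should internal authenticated traffic be exempt?

PLAN READY — waiting for approval
```

A plan at this level of specificity makes the approval decision fast: scope, approach, and open questions fit on one screen.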
Implementation: SDK-level gate with canUseTool
If you're driving Claude Code through the @anthropic-ai/claude-agent-sdk, enforce the gate programmatically:
import { query } from "@anthropic-ai/claude-agent-sdk";

let planApproved = false;
let planBuffer = ""; // accumulates agent text output before a write gate fires
const writingTools = new Set(["Write", "Edit", "MultiEdit", "Bash"]);

// PLANNING_PREAMBLE, userTask, and promptHumanApproval are yours to define;
// a minimal promptHumanApproval sketch follows below.
for await (const message of query({
  prompt: `${PLANNING_PREAMBLE}\n\n${userTask}`,
  options: {
    canUseTool: async (toolName, input) => {
      if (writingTools.has(toolName) && !planApproved) {
        // planBuffer holds all text the agent output before attempting a write
        planApproved = await promptHumanApproval(planBuffer);
        if (!planApproved) {
          return { behavior: "deny", message: "Plan not approved; revise and re-present it." };
        }
      }
      return { behavior: "allow", updatedInput: input }; // reads are always fine
    },
  },
})) {
  // Accumulate plan text from the agent's assistant messages
  if (message.type === "assistant") {
    for (const block of message.message.content) {
      if (block.type === "text") planBuffer += block.text;
    }
  }
}
The SDK-level gate cannot be bypassed by the model — unlike a prompt instruction that drifts after many tool calls.
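The snippet above leans on a promptHumanApproval helper that isn't part of the SDK; you supply it. A minimal terminal version, which also implements the type-approve friction recommended in the troubleshooting section below, might look like this:

```typescript
import * as readline from "node:readline/promises";
import { stdin as input, stdout as output } from "node:process";

// Shows the accumulated plan text and blocks until the human responds.
// Requiring the literal word "approve" adds deliberate friction: you can't
// rubber-stamp the gate by mashing Enter.
async function promptHumanApproval(planText: string): Promise<boolean> {
  console.log("\n=== PLAN READY ===\n" + planText + "\n==================");
  const rl = readline.createInterface({ input, output });
  const answer = await rl.question('Type "approve" to let the agent write files: ');
  rl.close();
  return answer.trim().toLowerCase() === "approve";
}
```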
Real-world example: The CORE system
The CORE project formalizes this into a dedicated orchestration layer: spec → auto-generated plan → human approves → agent runs in a separate session → returns PR. The key design insight: approval happens at the plan boundary, not mid-execution. After approval, the agent runs in a clean session without further interruption — keeping human judgment focused on the one question worth asking: "Is this the right approach?"
Gate 2: The Findings Review Gate
What is a findings review gate?
The findings review gate fires after the agent has explored the codebase but before it starts making changes. It's the most commonly skipped gate — and the most underrated.
Agents frequently discover things during exploration that materially change the nature of the task: a missing migration, an undocumented dependency, a function called from three places instead of one. The findings gate surfaces this before execution, rather than burying it in the commit history three hours later.
As human-in-the-loop research from Orkes frames it: the right moment to add a human checkpoint is where automated decision-making requires context that only the human has. The findings gate is exactly that inflection point — after the agent knows what's there, before it decides what to do with it.
When should you trigger a findings review gate?
Trigger it on tasks that involve understanding existing code before changing it: refactors, bug investigations, feature extensions into unfamiliar codebases. Skip it for greenfield tasks where the agent is building from scratch with no existing code to navigate.
Implementation: CLAUDE.md prompt pattern
## Findings Protocol
After reading the codebase and before making any changes:
1. Summarize what you found:
- Current state of the relevant code
- Anything surprising, undocumented, or potentially risky
- Unexpected dependencies or callers you didn't anticipate
2. State what you now plan to do, given what you found
3. Explicitly flag anything that changes your original plan
4. Output: `FINDINGS READY — waiting for approval`
5. Do not modify any files until you receive approval
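If you're enforcing gates at the SDK level, the canUseTool pattern from Gate 1 extends here: keep denying writes until the findings sentinel has appeared in the output stream and a human has signed off on it. A sketch, reusing writingTools, planBuffer, and promptHumanApproval from the Gate 1 example:

```typescript
// Sketch: SDK-level findings gate, reusing the Gate 1 setup. Writes stay
// blocked until the agent has printed the FINDINGS READY sentinel AND a
// human has approved the findings text.
let findingsApproved = false;
const SENTINEL = "FINDINGS READY";

const canUseTool = async (toolName: string, input: Record<string, unknown>) => {
  if (writingTools.has(toolName) && !findingsApproved) {
    if (!planBuffer.includes(SENTINEL)) {
      // The agent tried to write before presenting findings: push it back.
      return {
        behavior: "deny" as const,
        message: `Present your findings first and output "${SENTINEL}".`,
      };
    }
    findingsApproved = await promptHumanApproval(planBuffer);
    if (!findingsApproved) {
      return { behavior: "deny" as const, message: "Findings rejected; revise them." };
    }
  }
  return { behavior: "allow" as const, updatedInput: input };
};
```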
How to verify the findings gate is doing its job
After the agent outputs FINDINGS READY, check:
- Does the summary mention anything that changes your original plan?
- Did the agent surface dependencies or callers you weren't aware of?
- Are there risks or scope changes worth acting on before execution starts?
If you're consistently approving without reading the findings output, the gate has decayed into noise. Either the tasks are too small to warrant it, or your codebase is well-documented enough that the agent never surfaces surprises. Both are good problems to have.
Gate 3: The Diff-Before-Push Gate
What is a diff-before-push gate?
The diff-before-push gate fires after the agent completes its implementation, before any changes are committed or pushed. It's a final veto on the actual code produced — not the plan, not the findings summary, but the implementation itself.
This gate pairs naturally with a structured diff review workflow — checking scope bounds, unexpected file modifications, and test coverage before changes propagate.
When should you trigger a diff-before-push gate?
Every time. Unconditionally. Even on trivial tasks, a 30-second diff scan catches "the agent modified something it wasn't supposed to."
Implementation: Shell function
# bash; add to your shell config (e.g. ~/.bashrc)
function agent-diff-gate() {
  echo "=== Agent Diff Review ==="
  git diff HEAD --stat
  echo ""
  echo "Modified files:"
  git diff HEAD --name-only
  echo ""
  read -p "View full diff? (y/N) " show_diff
  if [[ "$show_diff" == [yY] ]]; then
    git diff HEAD
  fi
  echo ""
  read -p "Approve and commit? (y/N) " approve
  if [[ "$approve" == [yY] ]]; then
    git add -A
    # tail -1 grabs the stat summary line, e.g. "3 files changed, 42 insertions(+)"
    git commit -m "[agent] $(git diff HEAD --stat | tail -1)"
    echo "Committed."
  else
    echo "Changes not committed. Run 'git checkout .' to discard tracked changes."
  fi
}
What to look for in the diff
- Scope: Did the agent touch files outside the scope of the task? (Scriptable; see the sketch after this list)
- Unexpected deletions: Any files removed that you didn't ask to remove?
- Hardcoded values: Credentials, environment-specific URLs, or secrets that shouldn't be in source
- Missing tests: Did the agent add implementation without corresponding test coverage?
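The scope check at the top of this list is mechanical enough to script. A sketch, assuming a per-task allowlist of path prefixes (the values here are illustrative):

```typescript
import { execSync } from "node:child_process";

// Illustrative per-task allowlist: the only places this task should touch.
const allowedPrefixes = ["src/billing/", "tests/billing/"];

// Tip: run `git add -N .` first so brand-new untracked files show up in the diff.
const changed = execSync("git diff HEAD --name-only", { encoding: "utf8" })
  .split("\n")
  .filter(Boolean);

const outOfScope = changed.filter(
  (file) => !allowedPrefixes.some((prefix) => file.startsWith(prefix))
);

if (outOfScope.length > 0) {
  console.error("Out-of-scope changes:\n" + outOfScope.join("\n"));
  process.exit(1); // fail loudly so the diff gate blocks
}
console.log(`All ${changed.length} changed files are in scope.`);
```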
The post-run audit approach covers a more thorough checklist if you want systematic post-session verification on top of the diff gate.
How to Assemble All Three Gates Into a Pipeline
Here's the full workflow as a sequence:
Task
→ Agent reads relevant files and drafts a plan
→ [PLAN GATE] human reviews approach
→ Agent explores codebase in depth
→ [FINDINGS GATE] human reviews discoveries
→ Agent implements
→ [DIFF GATE] human reviews actual code
→ Commit
Each gate answers a different question at a different moment:
| Gate | When it fires | Question it answers |
|---|---|---|
| Plan review | After an initial read, before any writes | Is this the right approach? |
| Findings review | After exploration, before changes | Does what the agent found change the plan? |
| Diff review | After implementation, before commit | Is the actual code what I expected? |
Complete CLAUDE.md template
Copy this into your project's CLAUDE.md:
## Agent Workflow Protocol
This agent follows a 3-gate workflow for all non-trivial tasks.
### Gate 1: Plan Review
Before writing any code:
1. Read relevant files and understand the task
2. Write a plan: exact files to change, approach, decisions you can't make alone
3. Output: `PLAN READY — waiting for approval`
4. Wait for explicit approval before proceeding
### Gate 2: Findings Review
After reading the codebase, before making changes:
1. Summarize what you found
2. Flag anything that changes or complicates your original plan
3. Output: `FINDINGS READY — waiting for approval`
4. Wait for explicit approval before making changes
### Gate 3: Implementation Complete
When your implementation is done:
1. List all files you modified
2. Output: `IMPLEMENTATION COMPLETE — please review diff before committing`
3. Do not commit or push — wait for the human to run the diff gate
Verification: Is Your Gate Architecture Actually Working?
A gate architecture is working when:
- The agent actually stops — it pauses at each gate rather than proceeding through it
- The surface is useful — the plan, findings, and diff contain information that would have changed your decisions if you'd missed it
- The approval rate is high but not 100% — if you're approving every gate without reading, they've become noise; if you're frequently rejecting, something upstream is broken
A 25-agent constitutional system shared in r/ClaudeCode — where agents deliberate in PR comments and a human provides final approval — found their approval rate was "mostly approve." That's the right signal. Gates should rarely need to block, but when they do, the block should matter.
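If you'd rather measure that than eyeball it, log each gate decision. A sketch, with the log path and one-word-per-line format as assumptions; recordDecision would be called from whatever approval prompt you use:

```typescript
import { appendFileSync, readFileSync, existsSync } from "node:fs";

const LOG = ".agent-gates.log"; // assumed location: one "approve"/"reject" per line

// Call this after each gate decision, e.g. from promptHumanApproval.
export function recordDecision(approved: boolean): void {
  appendFileSync(LOG, approved ? "approve\n" : "reject\n");
}

// Approval rate across all recorded gate decisions.
export function approvalRate(): number {
  if (!existsSync(LOG)) return NaN;
  const lines = readFileSync(LOG, "utf8").split("\n").filter(Boolean);
  return lines.filter((line) => line === "approve").length / lines.length;
}
```

A rate stuck at exactly 1.0 over many tasks is the rubber-stamp signal described above.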
Elementum AI's analysis of agentic governance reinforces where this pattern fits: "anything that can materially impact quality assurance or production should pass through human review and an auditable approval process." The three gates cover exactly that surface.
Troubleshooting: Common Gate Failures
The agent skips the gate and proceeds anyway
Gate instructions buried deep in CLAUDE.md get de-prioritized after many tool calls. Move gate instructions to the top of the file. This isn't a configuration issue — why your Claude agent ignores rules past ~15 tool calls explains the context drift mechanics. For enforcement-critical gates, use canUseTool callbacks at the SDK level rather than relying on prompt compliance; the SDK-level gate cannot be bypassed by the model.
The plan or findings output is too vague to be useful
Tighten the prompt. Require specific structured outputs: "List the exact file paths you plan to modify" rather than "describe your plan." The more constrained the required output format, the more extractable the signal.
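For example, a loosely specified plan requirement versus a constrained one:

```
Instead of:
  "Describe your implementation plan."

Require:
  "List the exact file paths you will create, modify, or delete, one per
   line, each followed by a one-sentence justification. Plans without
   concrete file paths will be rejected."
```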
You're approving too fast without reading
Add explicit friction to the approval step — require typing approve rather than pressing Enter, or surface the plan in a formatted block before showing the prompt. If gates have become rubber stamps, they're firing at the wrong granularity.
Gates work locally but block in CI
CI pipelines need non-interactive approval flows. Use an environment variable to auto-approve in automated contexts while preserving interactivity locally:
# CI — skip gates (non-interactive print mode)
AGENT_GATE_MODE=auto claude -p "$TASK"
# Local — interactive gates
AGENT_GATE_MODE=interactive claude "$TASK"
Have your CLAUDE.md preamble instruct the agent to check AGENT_GATE_MODE and branch its behavior accordingly.
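A minimal preamble sketch for that branch; it assumes the agent can read the variable through its shell tool:

```
## Gate Mode
At the start of every session, run `echo "$AGENT_GATE_MODE"` via Bash.
- If it prints `auto`: skip every "wait for approval" step below and proceed
  through all gates without stopping.
- Otherwise: follow the gate protocols and stop at each sentinel until you
  receive explicit approval.
```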
How Grass Makes This Workflow Better
The 3-gate framework works tool-agnostically on any machine. But there's an operational gap it doesn't address: what happens when your agent hits Gate 1 and you're not at your desk?
If the agent runs on your laptop and pauses at a plan review gate while you're in a meeting, you have two bad choices: let the session idle until you return, or skip the gate to keep momentum. Neither preserves the value of the gate.
Grass solves this with mobile approval forwarding. Your agent runs on an always-on cloud VM, and permission requests — including gate pauses — forward to your phone in real time via native modals.
How the 3-gate workflow runs with Grass:
- Fire off a task from your phone or laptop — the agent starts on the cloud VM
- The agent hits Gate 1 and outputs its plan — Grass surfaces this in the mobile app
- You tap Allow or Deny from your phone — the agent continues on the VM without waiting for you to return to a desk
- The findings gate fires mid-session — another notification, another tap
- The diff gate fires at completion — you review the full diff in the built-in diff viewer: syntax highlighted, color-coded additions and deletions, file-by-file
The practical result: multi-hour tasks with full gate coverage, all approval checkpoints handled from your phone. The VM stays alive throughout — sessions survive disconnects and reconnect picks up exactly where you left off.
Setup:
npm install -g @grass-ai/ide
cd ~/your-project
grass start
# Scan the QR code with the Grass iOS app
Your gate-enabled CLAUDE.md works unchanged. Grass wraps the workflow at the infrastructure layer — agents use your own API key (BYOK, never touches Grass), run in your project directory, and forward permission prompts to your phone. Free tier is 10 hours, no credit card required → codeongrass.com.
FAQ
How many approval gates does an AI coding agent workflow actually need?
Three is the practical minimum for meaningful coverage without significant overhead: plan review before execution, findings review after exploration, and diff review before committing. More than three usually means gates are firing at the wrong granularity — per-tool-call approval almost always defeats the speed benefit of using an agent in the first place.
What should I look for at the plan review gate?
Three things: (1) whether the approach is correct, (2) whether the agent flagged any architectural decisions it can't make alone, and (3) whether the scope is right — the agent might plan to change more or fewer files than you intended. The plan review gate is the cheapest possible moment to redirect a task; catch it here rather than after hours of execution.
What is the difference between a plan review gate and a findings review gate?
The plan review gate fires after the agent has read the relevant files but before it writes anything: it approves the intended approach. The findings review gate fires after the agent has explored the codebase in depth but before it makes changes. The findings gate catches situations where exploration revealed something that changes the plan: an undocumented dependency, a function with unexpected callers, a required migration that wasn't in scope. Without the findings gate, the agent proceeds on its original plan even when the codebase contradicts it.
How do I prevent my agent from bypassing approval gates?
Put gate instructions at the top of your CLAUDE.md — not buried in a section the agent reads once and effectively forgets. Use explicit sentinel phrases (PLAN READY — waiting for approval) as required outputs. For enforcement-critical gates, use canUseTool callbacks in the SDK rather than relying on prompt compliance; the SDK-level gate cannot be bypassed by the model regardless of context length.
Does adding three gates meaningfully slow down an AI coding agent workflow?
Not in practice. A plan review takes under 60 seconds to read and approve on a typical task. The findings review is comparable. The diff review scales with the size of the change but is usually under two minutes. Total gate overhead on a multi-hour agent task is rarely more than five minutes — and catching a wrong approach at the plan gate saves hours of execution time and the reversal cost.
Next Steps
- Copy the 3-gate CLAUDE.md template above into one project and run a real task through it — time the actual gate overhead to build a baseline
- For SDK-driven workflows, implement the `canUseTool` enforcement pattern for Gate 1 so gate compliance is guaranteed, not prompt-dependent
- If you want full gate coverage while away from your desk — without letting sessions idle — set up Grass for mobile approval forwarding at codeongrass.com
For the technical enforcement mechanics underneath the gate patterns — PreToolUse hooks, ThumbGate blocklists, and SDK-level gating in depth — see How to Build Human-in-the-Loop Approval Gates for AI Coding Agents.