How to Review AI-Generated Code That Ships Faster Than You Can Read
AI agents write code faster than you can read it. Here's the four-checkpoint workflow — scope bounds, approval gates, diff review, test verification — that keeps you genuinely in control without killing the speed.
AI coding agents like Claude Code, Codex, and Open Code generate code faster than any developer can review line by line — and that speed gap is where real risk lives. The practical solution isn't to review less; it's to review at the right moments. A four-checkpoint workflow — scope bounding before the run, approval gates during the run, a diff gate after the run, and test verification before merging — keeps you genuinely in control without turning review into a bottleneck.
TL;DR
Stop trying to read every line an AI agent writes. Use four checkpoints instead: (1) constrain what the agent can touch before it starts, (2) use the approve-with-comments gate to intercept high-impact operations mid-run, (3) run `git diff HEAD` after every session to see exactly what changed, and (4) verify your tests pass before you merge. Each step takes under two minutes. Together they close the trust gap.
Why Line-by-Line Review Breaks Down with AI Coding Agents
A live r/ClaudeCode thread asking "are you reviewing Claude's code or just trusting it?" surfaced the problem bluntly: developers are openly uncertain how to handle output they can't fully read before it ships. The same week, a thread asking "how are you folks doing code review now?" drew dozens of responses with no settled consensus — a community working out the problem in real time.
The core tension is real. Traditional line-by-line review is impractical when an agent writes 400 lines in five minutes. But blind trust is genuinely dangerous. As one developer in that thread put it: "a risk exists when a user trusts the output without a detailed investigation." This isn't hypothetical: AI-generated code introduces measurably more bugs and technical debt than human-authored code when review gates are absent — not because the models are bad, but because developers skip steps they'd never skip on a human engineer's PR.
The workflow below solves this without making review a bottleneck.
What You'll Accomplish
By the end of this guide, you'll have a repeatable four-step review workflow that covers the full lifecycle of any AI coding agent session: before the run, during the run, after the run, and before merge. The workflow works with any agent — Claude Code, Codex, Open Code — and requires no special tooling beyond git and a test suite. You'll never need to wonder "what did the agent actually touch?" again.
Prerequisites
- Claude Code, Codex, or Open Code installed and authenticated in a project
- Git initialized in the project (`git init` if not already done)
- A test suite or test framework in place — or you're writing tests as part of Step 4
- Recommended: Grass for mobile approval forwarding and async diff review when you're away from your laptop (not required for the core workflow)
Step 1: Bound Scope Before the Run
The highest-leverage thing you can do to make AI-generated code reviewable is to constrain what the agent is allowed to touch before it starts. When an agent receives a vague directive — "improve the auth module" — it may refactor functions you didn't ask to change, add dependencies, or reorganize files. These out-of-scope changes are the hardest to catch in review, and they compound silently across sessions.
Before every agent session, add a scope directive to your prompt:
```
Task: Refactor `validateToken` in src/auth/token.ts to handle expired tokens gracefully.

Scope:
- MAY edit: src/auth/token.ts, src/auth/token.test.ts
- MAY NOT edit: any file outside src/auth/, package.json, tsconfig.json
- Do NOT add new dependencies
- Do NOT rename or remove existing exports
```
This isn't just documentation — it gives the agent explicit rules and gives you an unambiguous checklist for diff review. If the diff shows edits outside the declared scope, that's an immediate flag.
For persistent enforcement across sessions, add a scope policy to a CLAUDE.md file in your project root. Claude Code reads this file as context at startup:
```markdown
## Agent Scope Policy

Do not edit files outside the directory explicitly named in the task prompt.
Do not add or remove dependencies unless the task explicitly includes them.
Do not rename or remove existing exports without explicit instruction.
```
A community-built "meta-cognition" hook takes this further: it intercepts high-impact mutations and forces the agent to reason through the blast radius before executing. For critical codepaths, that structured pause is worth the latency.
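If you want to enforce scope mechanically rather than by prompt alone, a hook can reject out-of-scope writes before they execute. Below is a minimal sketch, assuming Claude Code's hook convention (the tool call arrives as JSON, and exit status 2 blocks the call while feeding stderr back to the agent); verify that contract against your version. The `src/auth/` prefix and the naive sed-based JSON parse are illustrative only, not a hardened implementation.

```shell
#!/bin/sh
# Sketch of a PreToolUse-style hook that rejects writes outside a declared scope.
# Assumed contract (check your Claude Code version): the hook receives the
# tool call as JSON, and exit status 2 blocks the call.
ALLOWED_PREFIX="src/auth/"   # hypothetical scope for this task

# check_tool_call JSON: returns 0 to allow, 2 to block
check_tool_call() {
  # naive extraction of tool_input.file_path (a real hook would use jq)
  file_path=$(printf '%s' "$1" |
    sed -n 's/.*"file_path"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
  [ -z "$file_path" ] && return 0          # no file path: not a file write, allow
  case "$file_path" in
    "$ALLOWED_PREFIX"*) return 0 ;;        # inside the allowlist
    *)
      echo "Blocked: $file_path is outside $ALLOWED_PREFIX" >&2
      return 2 ;;
  esac
}

# When wired up as the hook script itself, read the JSON from stdin:
#   check_tool_call "$(cat)"; exit $?
```

A denied write surfaces the stderr message to the agent, which typically redirects it back inside the declared scope without aborting the session.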
Step 2: Use the Approve-with-Comments Loop During the Run
An approval gate (also called a permission gate) is a point in an AI coding agent's task where it pauses and waits for confirmation before executing a tool call — a file write, a bash command, a file deletion. Claude Code's default permission mode presents each of these as an explicit approval request before execution.
This is the mechanism behind what developers call the approve-with-comments loop: you see the exact operation the agent wants to perform, and you can approve it, deny it, or approve it with a comment that redirects the agent mid-task without aborting the session. A developer migrating away from another tool cited this loop explicitly as a dealbreaker: "this workflow guarantees me being in the loop, fully understanding the changes, spotting issues early."
The comment mechanism is underused. Approving a file write with the comment "use the existing parseDate utility instead of writing a new one" steers the agent without breaking its context. This is faster than denying, explaining, and re-prompting.
What to watch for at each approval gate:
| Tool call type | Red flags to act on |
|---|---|
| File write / edit | Path is outside the declared scope |
| Bash command | Package installs, git commits, network calls you didn't ask for |
| File deletion | Any deletion not explicitly requested |
| Directory operations | Reorganizing files or creating new directories outside scope |
Avoid running with `--dangerously-skip-permissions` unless you've explicitly pre-reviewed the task and are confident the scope is fully constrained. Skipping permissions removes your only in-flight intervention point — after that, you're back to post-hoc diff review as your only gate.
For a detailed breakdown of how Claude Code's permission modes work and how to configure auto-approval for low-risk tool types, see Claude Code Keeps Asking for Permission — How to Handle It.
Step 3: Run a Diff Gate After Every Session
After the agent run completes, run `git diff HEAD` before doing anything else. The diff gate — a mandatory review of everything the agent changed — is your structured checkpoint between "agent wrote code" and "code exists in my branch."
```shell
git diff HEAD               # full diff of all changes
git diff HEAD --stat        # file-level summary first — read this before the full diff
git diff HEAD -- src/auth/  # scoped to a specific directory
git diff HEAD --word-diff   # word-level diff for small targeted changes
```
The goal at this stage isn't to read every line — it's to answer four questions in under two minutes:
- Scope compliance: Did the agent edit only the files in the declared scope?
- Structural changes: Any unexpected new files, deleted files, or renamed exports?
- Surprising logic: Does anything look materially different from what you expected?
- Size check: Is the diff significantly larger than expected? More than 200 lines for a "small fix" is a warning sign.
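The size check is easy to script. A minimal sketch: it sums the per-file add/delete counts that `git diff HEAD --numstat` prints and fails when the total exceeds a threshold (200 lines here, matching the rule of thumb above); the function name and default are illustrative.

```shell
# total_changed [LIMIT]: reads `git diff HEAD --numstat` output on stdin
# (added<TAB>deleted<TAB>path per line), prints the total line count, and
# returns 1 when the total exceeds LIMIT (default 200).
total_changed() {
  awk -v limit="${1:-200}" '
    { total += $1 + $2 }   # binary files show "-", which awk treats as 0
    END {
      printf "%d lines changed\n", total
      if (total > limit) { print "WARNING: diff larger than limit"; exit 1 }
    }'
}

# Usage, right after a session:
#   git diff HEAD --numstat | total_changed 200
```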
If the diff shows scope violations, revert the specific files and restart with a tighter scope directive:
```shell
git checkout -- src/some/unexpected/file.ts  # revert a specific file
git restore .    # revert everything if the session went badly off-track
```
Building automated quality gates into CI — like a check that fails when the diff touches files outside a declared allowlist — catches scope creep automatically on shared repositories without requiring manual review of every session.
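A minimal version of that allowlist check works locally as well as in CI. The sketch below reads the output of `git diff HEAD --name-only` and fails when any changed path falls outside the declared scope prefix; the `src/auth/` scope is a hypothetical example.

```shell
# scope_violations PREFIX: reads changed file paths on stdin (one per line,
# the format `git diff HEAD --name-only` prints), echoes any path outside
# PREFIX, and returns 1 when violations exist.
scope_violations() {
  out=$(grep -v "^$1" || true)   # -v keeps only the out-of-scope paths
  if [ -n "$out" ]; then
    printf '%s\n' "$out"
    return 1
  fi
  return 0
}

# Local or CI usage:
#   git diff HEAD --name-only | scope_violations "src/auth/" \
#     || { echo "scope violation — review before merging"; exit 1; }
```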
Step 4: Verify with Tests Before Merging
Tests are the fastest path to behavioral confidence in AI-generated code. The most reliable pattern is test-first: write or confirm tests exist before the agent run, then verify they pass after. This turns the test suite from a post-hoc checker into a specification the agent wrote code against.
```shell
# Before the run: confirm tests exist and pass
npm test -- --testPathPattern=src/auth/token

# Start the agent session...
# Agent run completes.

# After the run: verify tests still pass
npm test -- --testPathPattern=src/auth/token

# Check what tests the agent added or modified
git diff HEAD -- "*.test.*"
git diff HEAD -- "*.spec.*"

# Run the full suite to catch regressions in adjacent modules
npm test
```
Three patterns that sharpen this step:
Review test changes as carefully as implementation changes. Agents sometimes write tests that verify their own implementation rather than the intended behavior. A test that mocks the function it's testing is not a useful test.
Run the full suite, not just the relevant file. Agents occasionally introduce regressions in adjacent modules that only surface in a full run. A clean targeted test alongside a broken integration test is still a broken build.
Check test coverage for new code. If the agent added a new function or branch, verify there's a test path through it. Untested code from an agent is indistinguishable from untested code from a developer — it's where subtle bugs accumulate.

ShiftAsia's complete guide to reviewing AI-generated code covers additional patterns for type checking, linting gates, and security-focused review that complement the test-first approach.
How Do You Know the Workflow Is Working?
The workflow is functioning when:
- Your diffs are consistently scoped to the files declared before the run
- You're catching issues at the approval gate or diff review stage — not after merge
- Test failures after agent runs are rare, and when they happen, they're fast to diagnose
- You can answer "what did the agent touch in this session?" without opening git
A useful self-check: after a session, read the diff without any agent context. Would you understand and trust these changes if a junior engineer submitted them in a PR? If yes, the workflow is working. If not, identify which checkpoint the gap slipped through and tighten that step.
Troubleshooting Common Issues
The agent edits files outside the declared scope despite the prompt directive.
Move the scope policy to CLAUDE.md in the project root. Agents read this file as persistent context at session start, so the constraint is reinforced without relying on you to include it in every prompt.
The diff is too large to review meaningfully in one session. Break the task into smaller units and ask the agent to commit after each logical sub-task. Review and verify incrementally. A 50-line diff is reviewable in two minutes; a 600-line diff rarely is, even if it's all correct.
Tests pass but the implementation logic still looks wrong. Your test suite has a coverage gap for the specific behavior in question. Add tests that exercise the suspicious code paths, then re-run the agent if needed. Treat test-writing as a specification tool, not just a verification tool.
Approval gates are slowing down long sessions. Configure auto-approval for tool calls that are consistently low-risk in your workflow — file reads and lint runs rarely need manual approval. Reserve manual gates for writes, deletions, and bash commands with side effects. See What is an agent approval gate? for a breakdown of what each gate type actually enforces.
You missed a gate because you weren't at your laptop. If you run unattended sessions, you need a way to handle approval requests asynchronously. The next section covers this.
How Grass Makes This Workflow Better
The four steps above work entirely without Grass — they're complete as described. But there's a practical gap when your agent is running in the background: approval gates block progress until you're at your laptop, and the diff review waits until you sit back down.
Grass solves both without changing the workflow.
Approval forwarding to your phone. When Claude Code or Open Code hits an approval gate, Grass surfaces the request as a native modal on your phone — showing the exact tool name and input, syntax-highlighted if it's a file edit or bash command. You tap Allow or Deny from wherever you are. The session doesn't block while you're away from your desk; you don't miss the gate. This is what makes long background sessions and overnight runs viable without skipping permissions entirely. Full details: How to Approve or Deny a Coding Agent Action from Your Phone.
Mobile diff review. After a session completes, Grass's diff viewer shows `git diff HEAD` output parsed into per-file views — additions in teal, deletions in red, file status badges for modified, new, deleted, and renamed files. Step 3 of this workflow — the diff gate — runs from your phone during a commute, in a meeting, between calls. You don't need your laptop open to know whether the agent stayed in scope.
Session persistence. Grass runs on an always-on cloud VM. The agent session and its diff are waiting for you whenever you're ready to review, whether that's 20 minutes or 8 hours later. Your laptop sleeping doesn't kill the session or the diff.
To use this with your existing workflow: `npm install -g @grass-ai/ide` → `grass start` in your project directory → scan the QR code with the Grass iOS app. Your approval gates forward to your phone immediately; the diff viewer is one tap away after any session. See Getting Started with Grass in 5 Minutes for the complete setup walkthrough.
FAQ
How do I review AI-generated code without reading every line?
Use four checkpoints: constrain scope before the run so the agent can't wander, use the approve-with-comments gate to catch high-risk operations during the run, run `git diff HEAD --stat` after the run to verify file-level scope compliance, and run your test suite to verify behavior. You only need to read lines closely when one of these checkpoints raises a flag.
What is the approve-with-comments loop in Claude Code?
It's Claude Code's default permission mode in practice. Before each tool call — file write, bash command, file deletion — the agent pauses and presents the operation as an approval request. You can approve it, deny it, or approve it with a text comment that redirects the agent mid-task without aborting the session. One developer described it as the feature that "guarantees me being in the loop, fully understanding the changes, spotting issues early."
How do I stop Claude Code from editing files outside the task scope?
Add a scope directive to your prompt listing which files the agent may and may not touch. For persistent enforcement, write the policy to a CLAUDE.md file in the project root — Claude Code reads this as session context at startup. You can also combine this with PreToolUse hooks that intercept writes to specific paths.
Should I write tests before or after an AI agent session?
Before. Tests written before the run act as a specification — the agent writes code against a defined expected behavior. Tests written after the run are post-hoc and can accidentally verify the agent's implementation rather than the intended behavior. Run the full test suite after the run to verify correctness and catch regressions.
When is it safe to skip the diff review step?
When three conditions hold simultaneously: the scope was fully constrained to a single file, the complete test suite passes with no failures, and the session was short enough that you watched every approval gate in real time. For any session over 20 minutes or touching more than two files, the diff gate is not optional — it's the only comprehensive view of what actually changed.
Next Steps
The four-step workflow above works for any agent, on any machine, today. To extend it to long sessions, background runs, and review without a laptop:
- Set up Grass for mobile approval and diff review: `npm install -g @grass-ai/ide` → `grass start` → scan QR → approval gates and diffs are on your phone. Getting Started with Grass in 5 Minutes
- Review every file an agent touched from your phone: How to Review Your Agent's Code Changes from Your Phone
- Run agents unattended without skipping gates: How to Run Claude Code Unattended
This post is published by Grass — a machine built for AI coding agents that gives your agent a dedicated always-on cloud VM, accessible and controllable from your phone. Works with Claude Code and Open Code.