When Should Your Agent Ask Before Acting? A 3-Tier Risk Framework
You're choosing between step-by-step approval and full autonomy — but that's the wrong binary. Here's the 3-tier risk framework that matches oversight to operation blast radius, not agent preference.
Every developer running AI coding agents eventually hits the same wall: the agent does something destructive without asking, or it interrupts flow by asking for approval on every file read. The debate plays out publicly as a Codex vs. Claude Code argument — Codex keeps you in the loop with per-step TAB acceptance; Claude Code executes autonomously across multiple files and calls. But that's the wrong frame. The real question isn't which agent to choose — it's which operations warrant which level of oversight. The answer is a three-tier risk classification: autonomous for read-only and reversible work, checkpoint-based for feature development, and step-by-step for auth, infrastructure, and any irreversible destructive operation.
TL;DR: Codex's per-step approval model and Claude Code's autonomous execution are both correct — for different operation types. Classify operations by blast radius: Tier 1 (read-only, reversible) → run autonomously; Tier 2 (feature work, non-destructive writes) → checkpoint at plan and diff; Tier 3 (auth, infra, deletes) → step-by-step approval before each action. Match oversight to risk, and you stop choosing between speed and safety.
Why the Codex vs. Claude Code Approval Debate Is Asking the Wrong Question
The Codex vs. Claude Code control philosophy thread shows developers explicitly choosing Codex for production work because per-step human approval keeps a human in the loop at all times. The critique of Claude Code's autonomous mode: multi-file changes can propagate what amounts to "hallucination debt" — a sequence of plausible-looking edits that collectively break something — before any human review happens.
The counter-position, from a thread on what differentiates agents that actually ship real work, is stated plainly: agents that stay inside the approval loop ship real work; agents that operate outside it "attempt anything, fail silently, hand you back something." Neither characterization is wrong. They describe different risk profiles, not different agent quality.
The incidents anchoring this debate have real stakes. In the PocketOS incident, a Claude agent wiped a production database and all backups in 9 seconds — no approval gate on destructive operations. Separately, a developer reported their agent rewrote their entire auth system overnight without a single checkpoint, breaking 200 user logins. Six hours to undo 40 seconds of agent work. The developer's post-incident conclusion: "Never giving AI write access to auth again, read-only from now on." That's a Tier 3 boundary, drawn the hard way.
The mistake these incidents share isn't using an autonomous agent — it's applying the autonomous model to operations that warranted explicit human approval. The fix isn't switching agents; it's switching approval models for specific operation types.
How the Two Approval Models Work (and What Each Costs)
The Codex model keeps the user as pilot at all times. Every code suggestion requires explicit TAB acceptance before it applies. This creates a tight feedback loop: review, approve, proceed. The cost is velocity — for complex multi-step autonomous tasks, per-suggestion approval defeats the purpose of delegation. Our comparison of Claude Code vs. Codex for heavy users maps this in detail across different workflow types.
The Claude Code model lets the agent execute autonomously across multiple files, calling tools in sequence without pausing. Speed is real. The failure mode is also real: by the time you notice the agent went sideways, it may have touched a dozen files, and unwinding that is nontrivial.
Both are correct design choices for their intended context. The mistake is treating either as a universal default.
| Model | Approval granularity | Speed | Safety floor | Best applied to |
|---|---|---|---|---|
| Codex (step-by-step) | Each suggestion | Low | High | Any operation |
| Claude Code autonomous | None | High | Low | Read-only / reversible |
| Checkpoint-based | Plan + diff review | Medium | Medium | Feature work |
| Configured step-by-step | Per-tool-type | Low | High | Auth, infra, destructive ops |
The 3-Tier Risk Classification
The framework has three tiers, each defined by one question: what is the blast radius if this operation goes wrong, and is it reversible?
| Tier | Approval model | Blast radius | Reversibility | Example operations |
|---|---|---|---|---|
| 1 | Autonomous | Low | Complete | File reads, test runs, linting, doc generation, new file creation |
| 2 | Checkpoint | Medium | Git-reversible | Feature code, refactors, API additions, staging migrations |
| 3 | Step-by-step | High | Low or none | Auth logic, env vars, production DB, DELETE/DROP, CI/CD config |
Tier assignment is operation-specific, not agent-specific. You can run Claude Code in fully autonomous mode for Tier 1 work, checkpoint mode for Tier 2, and step-by-step for Tier 3 — within the same session on the same codebase.
Tier 1: Run Autonomously — Read-Only and Reversible Operations
Tier 1 operations are safe to run without any human in the loop because recovery is trivial if something goes wrong.
Operations that belong here: reading files, running grep/find searches, executing test suites, running linters, generating documentation, browsing directory trees, fetching public URLs. New file creation typically belongs in Tier 1 — a new file can be deleted. The test: if the agent produces garbage output, can you recover with git checkout or rm? If yes, it's Tier 1.
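The reversibility test is concrete enough to demonstrate. A minimal sketch (the repo, paths, and file contents are made up for illustration):

```shell
# Sketch of the Tier-1 test: a bad edit to a tracked file is one command from recovery.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
echo "original" > app.txt
git add app.txt
git -c user.email=demo@example.com -c user.name=demo commit -qm "baseline"

echo "agent garbage" >> app.txt   # simulated bad Tier-1 output on a tracked file
git checkout -- app.txt           # trivial recovery, so the operation is Tier 1
cat app.txt                       # prints "original"
```

The same logic covers new files: an untracked file the agent creates is one rm away from gone, which is why new file creation usually sits in Tier 1.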
The risk of over-gating Tier 1 work is real. Requiring human approval on every cat and ls command adds friction without adding safety. Worse, approval prompt fatigue sets in — developers start reflexively approving everything, including the Tier 3 operations that actually warrant scrutiny. This is the failure mode of applying the Codex model universally.
For Claude Code, Tier 1 sessions can use --permission-mode bypassPermissions scoped to a read-only task, or a settings.json tool allowlist that auto-approves Read, LS, Glob, and Grep without prompting.
Tier 2: Checkpoint-Based — Feature Development and Non-Destructive Changes
Tier 2 covers the bulk of normal agent work: writing new features, refactoring existing code, adding API endpoints, running database migrations in staging, modifying test suites. These operations have meaningful blast radius — a bad refactor can cascade across dependent modules — but they're reversible via git. The blast radius is bounded by version control.
The checkpoint model applies two human decision points: one at the plan (before the agent touches any files) and one at the diff (before you merge or push). The CORE agentic workflow covers this two-checkpoint pattern in detail. The key insight: you're not reviewing every tool call — you're reviewing intent and outcome, which is where human judgment actually adds value.
For Claude Code: run the agent in plan mode first (--permission-mode plan), review the generated plan, then approve it to let the agent execute. Gate the final output at git diff HEAD before pushing.
The operational friction with Tier 2 checkpoints is timing. If the plan checkpoint surfaces while you're away from your desk, the agent stalls — or you skip the review. Both outcomes undermine the model. This is addressed in more detail in the Grass section below.
Tier 3: Step-by-Step — Auth, Infrastructure, and Irreversible Operations
Tier 3 operations warrant per-step human approval because they're irreversible, their blast radius extends beyond your codebase, or both.
Operations that belong here:
- Any modification to authentication or authorization logic
- Environment variable changes and secrets management
- Database schema changes on production
- DELETE, DROP, or TRUNCATE statements
- Infrastructure-as-code modifications (Terraform, Pulumi, CloudFormation)
- CI/CD pipeline configuration changes
- Dependency additions that expand the security surface
The PocketOS incident is a textbook Tier 3 failure: a Claude agent with database credentials and no approval gate on destructive operations wiped a production database and all backups in 9 seconds. The agent executed correctly against its instructions — the problem was that a human never explicitly approved a Tier 3 operation. The operation was irreversible.
The community has synthesized a broader guardrails framework from incidents like this: snapshot before sessions, pause before irreversible operations, apply principle of least privilege. That last point matters for Tier 3: approval gates are a process control, not a permissions control. Defense-in-depth means both — require explicit approval and restrict credentials to the minimum scope needed for the task.
For Tier 3, the Codex approval model is structurally correct. The question is whether step-by-step approval requires you to be physically present at a terminal.
The Operation Classification Decision Matrix
Use this table to assign a tier before starting any agent session. When uncertain, default to the next tier up.
| Operation | Tier | Approval model |
|---|---|---|
| Read files, list directories | 1 | Autonomous |
| Run test suite | 1 | Autonomous |
| Run linter | 1 | Autonomous |
| Generate or update documentation | 1 | Autonomous |
| Create new files | 1 | Autonomous |
| Refactor existing module | 2 | Checkpoint |
| Add new API endpoint | 2 | Checkpoint |
| Staging database migration | 2 | Checkpoint |
| Modify non-auth business logic | 2 | Checkpoint |
| Modify authentication or authorization logic | 3 | Step-by-step |
| Change environment variables | 3 | Step-by-step |
| Production database schema change | 3 | Step-by-step |
| Any DELETE / DROP / TRUNCATE statement | 3 | Step-by-step |
| CI/CD pipeline configuration | 3 | Step-by-step |
| Add dependency with elevated permissions | 3 | Step-by-step |
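The matrix can be sketched as a pre-flight classifier. This is an illustrative shell function, not a complete classifier; the patterns are deliberately coarse and should be extended (and biased upward) for a real stack:

```shell
# Hypothetical tier classifier sketching the matrix above.
# Tier 3 patterns are checked first; unmatched work falls through to Tier 2.
classify_op() {
  case "$1" in
    *DROP*|*DELETE*|*TRUNCATE*|*auth*|*secrets*|*.env*|*terraform*|*pipeline*)
      echo 3 ;;   # irreversible, or blast radius beyond the codebase
    *read*|*grep*|*test*|*lint*|*doc*)
      echo 1 ;;   # read-only or trivially reversible
    *)
      echo 2 ;;   # uncertain: default to the next tier up (checkpointed)
  esac
}
```

A session wrapper could call this on the task description and refuse to launch an autonomous session for anything that classifies above Tier 1.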
Configuring Claude Code for Each Tier
The implementation mechanics of approval gates — PreToolUse hooks, ThumbGate blocklists, and permission mode configuration — are covered in the guide to building human-in-the-loop approval gates. That post covers the how; this one covers the which operations and at what granularity. At the configuration level, the mapping looks like this:
Tier 1: Configure a settings.json tool allowlist that auto-approves Read, LS, Glob, and Grep without prompting. Or use --permission-mode bypassPermissions scoped to a read-only session.
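A minimal Tier 1 allowlist might look like the following sketch; the tool names follow Claude Code's permission syntax, but verify the exact schema against your installed version:

```json
{
  "permissions": {
    "allow": ["Read", "LS", "Glob", "Grep"]
  }
}
```

With this in the project's settings.json, the listed read-only tools run without prompting while everything else still hits the default permission flow.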
Tier 2: Use Claude Code's plan mode: start with --permission-mode plan to generate a plan for review, then approve the plan to let the agent execute. Review git diff HEAD before merging.
Tier 3: Leave the default permission mode active. Configure a PreToolUse hook or a blocklist to require explicit approval on any tool matching Tier 3 patterns — bash commands containing delete or drop, file writes to auth-adjacent paths, env var modifications.
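One way to wire that gate, assuming Claude Code's hooks schema (matcher plus command); the guard script path is hypothetical:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "./scripts/tier3-guard.sh" }
        ]
      }
    ]
  }
}
```

The hypothetical tier3-guard.sh would read the tool input from stdin and exit with a blocking status when the command matches Tier 3 patterns (DROP/DELETE/TRUNCATE, auth-adjacent paths, env var writes), forcing the request back to a human.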
One important caveat from the analysis of PreToolUse hook bypass patterns: hooks can be bypassed in certain configurations. For Tier 3 operations, treat approval gates as one layer of a defense-in-depth stack — not the only layer. Least-privilege credentials are the second layer.
How Grass Makes Tier-2 Checkpoints Practical
The primary operational friction with the checkpoint model is presence: you have to be somewhere responsive when the checkpoint fires. If a Tier 2 plan checkpoint surfaces during a meeting or a commute, the agent stalls — or you skip the review. Both outcomes break the model.
Grass solves this with mobile permission forwarding. When your agent hits a permission request — whether a Tier 2 plan checkpoint or a Tier 3 per-step approval — the request surfaces immediately on your phone as a native modal. The modal shows the tool name, a syntax-highlighted preview of the command or file change that would execute, and two buttons: Allow and Deny. One tap, haptic confirmation, the agent proceeds.
Setup takes under two minutes:
```shell
npm install -g @grass-ai/ide
cd ~/your-project
grass start
```
Scan the QR code with the Grass mobile app. Any permission request from Claude Code or OpenCode running in that session routes to your phone instead of blocking at the terminal.
For Tier 3 operations, this changes the operational calculus significantly. Step-by-step approval no longer requires physical presence at a terminal. An agent modifying a staging database schema pauses at each migration step, forwards the ALTER TABLE statement to your phone for review, and proceeds only after you tap Allow — from wherever you are.
For Tier 2 checkpoint workflows, the Grass diff viewer lets you review the full git diff HEAD output on your phone before approving the completion checkpoint. Every file touched, color-coded additions and deletions, before the agent's changes land in your branch.
Grass also runs agents on an always-on cloud VM, which means a Tier 2 task that runs for two hours doesn't die when your laptop sleeps mid-session. The checkpoint surfaces on your phone when the work is done — not when your laptop comes back online.
Try it free at codeongrass.com — 10 hours, no credit card required.
The Verdict
The Codex vs. Claude Code debate is a useful proxy for surfacing the real question, but using it as a binary agent-selection decision misses the underlying framework:
- Tier 1 work — Claude Code autonomous mode is appropriate. Blast radius is low; speed gain is real.
- Tier 2 work — Checkpoint model. Approve the plan, review the diff. Two human decision points, not per-operation overhead.
- Tier 3 work — Codex's per-step approval model, or Claude Code configured with step-by-step gates. The blast radius justifies the overhead.
Agents that ship real work stay inside the approval loop — but "inside the approval loop" should mean the right loop for the right operation, not the same loop for everything.
FAQ
When should my AI coding agent ask for approval before acting?
An agent should ask before any operation with high blast radius or low reversibility. Read-only and easily reversible operations (file reads, test runs, linting) can run autonomously. Feature work that's reversible via git warrants checkpoint approval — once at the plan, once at the diff. Auth logic, infrastructure changes, production database operations, and any irreversible destructive action require step-by-step approval before each individual operation.
What is the difference between Codex and Claude Code approval models?
Codex keeps the user as pilot at all times — every code suggestion requires explicit TAB acceptance. Claude Code's default mode runs autonomously across multiple files and tool calls without pausing. Neither is universally correct: Codex's model is appropriate for high-risk Tier 3 operations; Claude Code's autonomous mode is appropriate for low-risk Tier 1 read-only work. The right choice depends on what the agent is doing in the session, not which agent you prefer.
What operations should never be run autonomously by an AI coding agent?
Tier 3 operations should always require step-by-step approval: modifications to authentication or authorization logic, environment variable and secrets changes, database schema changes on production, any DELETE/DROP/TRUNCATE statements, CI/CD pipeline configuration, and infrastructure-as-code modifications. These are either irreversible or have blast radius beyond your local codebase.
How is this different from the post on building human-in-the-loop approval gates?
The implementation post covers mechanics: how to configure PreToolUse hooks, ThumbGate blocklists, and mobile approval forwarding. This post covers the prior strategic question: which operations should be gated at all, and at what granularity. Read this framework first to decide what to build; read the implementation post to build it.
Why do agents inside the approval loop ship real work while autonomous agents often fail silently?
The approval loop is also a steering channel. When you approve or deny an agent action mid-session, you provide real-time feedback that keeps the agent aligned with your actual intent. An autonomous agent that can't receive corrections during execution "attempts anything, fails silently, and hands you back something" — there's no mechanism for the human to course-correct before the task completes. The loop isn't just a safety gate; it's how humans maintain effective control over a long-running task without reviewing every tool call. For the operational setup that makes this practical without keeping you at your desk, see how to approve or deny a coding agent action from your phone.