agent-oversight

Why Claude Code PreToolUse Hooks Can Still Be Bypassed

Q: Does --dangerously-skip-permissions disable PreToolUse hooks?

No — `--dangerously-skip-permissions` disables the interactive approval prompts (the Allow/Deny dialogs for specific built-in tool calls) but PreToolUse hooks configured in `.claude/settings.json` are a separate mechanism and continue to run. However, removing the interactive prompts changes how the agent plans: it may take more aggressive actions that it would have decomposed differently when operating under the approval regime.

Q: How do I prevent Claude Code from reading my .env file?

The most reliable approach is to not expose the `.env` file to the agent at all — run the agent in a devcontainer or isolated VM where the file doesn't exist and credentials are injected as opaque handles by a broker. As a secondary measure, add `PreToolUse` hooks on `Read`, `Bash`, and `Edit` that reject operations targeting `*.env`, `.env.*`, and common credential file patterns. Both layers together are significantly more reliable than either alone.

Your Claude Code hooks can block `cat .env` and still leak your secrets. Here's exactly why — and the four-layer stack that actually bounds blast radius.

Sahil Kathpal

24 Apr 2026 • 12 min read

Claude Code's PreToolUse hooks give you a programmatic interception point before any tool executes — write a hook that exits non-zero and the tool call is blocked. That's the theory. In practice, a reproducible proof-of-concept shared in r/ClaudeCode demonstrated that even after building comprehensive PreToolUse hooks designed to protect a .env file, the agent was still able to make its contents accessible. Understanding why requires a clearer mental model of what hooks can and cannot protect — and what actually limits an agent's blast radius.

TL;DR: PreToolUse hooks intercept individual tool calls, but they cannot constrain what the agent has already loaded into its context window or anticipate every exfiltration path. Real blast-radius containment requires layering hooks with devcontainer isolation, opaque secret brokers, and structured reasoning gates. Defense in depth — not a single hook — is what actually works.

What Does a PreToolUse Hook Actually Do?

A PreToolUse hook (also called an agent approval gate) is a shell process that Claude Code invokes before executing a tool call. If the hook exits non-zero, the tool call is blocked and Claude Code surfaces an error to the agent.

A typical configuration in .claude/settings.json:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "bash ~/.claude/hooks/check-dangerous-commands.sh"
          }
        ]
      }
    ]
  }
}

And a hook script that tries to block dangerous operations:

#!/bin/bash
TOOL_INPUT=$(cat)
COMMAND=$(echo "$TOOL_INPUT" | jq -r '.command // ""')

BLOCKED_PATTERNS=("rm -rf" "cat .env" "curl.*secrets" "wget.*credentials")

for pattern in "${BLOCKED_PATTERNS[@]}"; do
  if echo "$COMMAND" | grep -qE "$pattern"; then
    echo "Blocked: $pattern detected"
    exit 1
  fi
done

This will block cat .env. But it won't block everything — and that's where the mental model breaks down.

As Penligent's analysis of Claude Code's architecture puts it: PreToolUse gives you "a native interception point before the tool runs" — but that's a point in the execution flow, not a semantic constraint on what the agent knows or intends.

The .env Bypass: What the Proof-of-Concept Shows

The r/ClaudeCode post walked through a specific scenario with a reproducible result: comprehensive PreToolUse hooks in place, and the agent still made .env contents accessible. The mechanism is not arcane — it follows directly from how agents plan and execute.

Consider the tool execution lifecycle:

The agent reads .env using the Read tool — your hook only patterns on Bash and Write
The file's contents are now in the agent's context window; no hook fired
The agent references those contents in a subsequent Bash command you didn't anticipate
Or writes them to a log file with a name your pattern-matching didn't cover
Or echoes them as part of a "here's what I found in your config" status message

Your hooks were correctly implemented for the vectors you anticipated. The agent simply used a different route.

This is the core problem: hooks are a denylist operating at the tool-call level. You have to enumerate every possible exfiltration path and block each one explicitly. The agent only needs to find one vector you missed.

The claude-code-safety-net project on GitHub was built for exactly this reason. Its README notes that the team "learned the hard way" after Claude Code silently wiped out hours of work with a git checkout -- that no instructional guardrail caught: "Soft rules in a CLAUDE.md or AGENTS.md file cannot replace hard technical constraints." And as this bypass demonstrates, hard technical constraints at the hook level still don't enumerate every dangerous path.

Root Cause: Hooks Enforce Execution Policy, Not Semantic Constraints

The fundamental issue is a layer boundary mismatch. Hooks operate at the execution layer — they see individual tool calls in isolation. The agent operates at the semantic layer — it has a goal, a plan, and a context window full of information, and it constructs tool calls to achieve that goal.

A hook that blocks cat .env prevents one specific action. It does nothing about the agent having already read .env contents via a prior Read call, nothing about the agent encoding those contents in base64 and writing them to a temp file, and nothing about the agent echoing them as part of a diagnostic step it considered benign.

NIST's guidance on AI agent security (as laid out in Blake Crosley's NIST RFI submission) recommends classifying every agent action as local, shared, or external — with escalating authorization requirements for each tier. Most hook implementations don't approach this coverage level. They protect a few obvious vectors and leave the remainder unaddressed.

What --dangerously-skip-permissions Actually Reveals About the Gate

A thread in r/ClaudeAI surfaced something important: when running Claude Code with --dangerously-skip-permissions, the agent plans differently. Without the approval gates, it takes actions more aggressively. With them, it sometimes decomposes tasks to avoid triggering prompts.

The agent is aware of the gate and accounts for it during planning — which means the gate is influencing behavior, but an agent that knows about the gate could, in principle, route around it by choosing tool calls that appear lower-risk while achieving the same outcome.

--dangerously-skip-permissions doesn't just remove the approval UI. It removes a constraint that was shaping how the agent planned. Using it on unattended runs (as covered in our guide to running Claude Code unattended) removes the one mechanism that required human judgment before execution. The blast radius of any mistake grows immediately.

What Is Blast Radius for an AI Coding Agent?

Blast radius (in the context of AI coding agents) is the maximum damage an agent can cause if it misbehaves, misunderstands instructions, or is manipulated by a prompt injection. It's a function of what the agent can read, what it can write, what commands it can execute, and what external services it can reach — not a function of what you told it to do.

A minimal-blast-radius agent:

Reads only files in the current project directory
Writes only files it was explicitly asked to modify
Cannot execute arbitrary shell commands
Has no access to credentials beyond what the task requires
Cannot make outbound network calls to arbitrary endpoints

Most real Claude Code sessions are far from this. The agent has shell access, can read any file the process user can read (including ~/.aws/credentials, ~/.ssh/id_rsa, .env), and can make network calls via bash. Hooks reduce the blast radius by blocking specific actions. But they don't define the blast radius — the underlying process permissions do.

Four Layers That Actually Contain Blast Radius

The answer isn't to write better hooks, though that helps. It's to use hooks as one layer in a defense-in-depth stack. Here are four layers, ordered from most to least fundamental.

Layer 1: Devcontainer Isolation

devcontainer-mcp was built specifically because "AI agents were installing random crap on the host." The solution: run the agent inside a devcontainer where it can't touch the host filesystem, host credentials, or host network directly.

A devcontainer enforces:

Filesystem isolation — the agent sees only the mounted project directory
Network isolation — egress can be restricted to specific endpoints
No host credential access — ~/.aws, ~/.ssh, .env files outside the mount point are invisible to the agent

This is the most fundamental containment layer because it's enforced by the OS, not by the agent's cooperation. The agent cannot break out of a properly configured container through a clever tool call.

Layer 2: Opaque Secret Brokers

Even inside a container, secrets still need to flow somewhere. The Agent Secrets Pattern addresses this: instead of giving the agent actual credentials, give it opaque handles that a broker resolves at call time.

devcontainer-mcp implements this directly — it has a "built-in auth broker so the agent never sees your actual tokens (it gets opaque handles)." The agent can make authenticated API calls, but the raw credential string never appears in its context window.

# Instead of: ANTHROPIC_API_KEY=sk-ant-... in the environment
# The agent gets: ANTHROPIC_API_KEY_HANDLE=handle-xyz
# The broker resolves handle-xyz → actual key only at the call boundary

Cymulate's research on configuration-based sandbox escape in AI coding tools shows why this matters: even when tool execution is contained, the agent's configuration environment can be an exfiltration vector. Opaque handles remove the credential from the exfiltrable surface entirely.

Layer 3: Meta-Cognition Gates for Destructive Operations

A file-system meta-cognition hook built by a developer in r/ClaudeCode takes a different approach: before any high-impact mutation, the hook forces the agent to produce a structured reasoning output — explicitly mapping the blast radius of the intended change before execution is permitted.

#!/bin/bash
# meta-cognition-gate.sh — forces structured reasoning before core mutations
TOOL_INPUT=$(cat)
FILE_PATH=$(echo "$TOOL_INPUT" | jq -r '.file_path // ""')

# Gate on high-impact paths only
if echo "$FILE_PATH" | grep -qE "(src/core|lib/auth|config/prod)"; then
  ASSESSMENT=$(echo "$TOOL_INPUT" | \
    claude -p "List every file and service that depends on $FILE_PATH. \
    Rate the blast radius: low/medium/high. \
    Output JSON: {blast_radius, dependents[], rationale}")

  LEVEL=$(echo "$ASSESSMENT" | jq -r '.blast_radius')
  if [ "$LEVEL" = "high" ]; then
    echo "High blast radius detected. Human approval required."
    exit 1
  fi
fi

This won't stop all damage. But it catches the cases where an agent is about to modify a core file without recognizing that three other services depend on it — the scenario where well-intentioned agents cause unexpected cascading failures.

Layer 4: File Ownership as Containment

Dotzlaw's defense-in-depth analysis describes file ownership boundaries as a containment strategy: each agent gets a defined territory and a PreToolUse hook validates every Write and Edit against an ownership map. A frontend agent cannot touch api/ even if a prompt injection tells it to.

{
  "agent_territories": {
    "frontend-agent": ["frontend/src/", "frontend/tests/"],
    "backend-agent": ["api/", "services/"],
    "docs-agent": ["docs/", "README.md"]
  }
}

This doesn't stop a single agent from damaging its own territory. But it limits the blast radius of any one agent or prompt injection to a bounded slice of the codebase — the compromise can't propagate laterally.

How to Verify Your Blast Radius Is Actually Bounded

Testing hook coverage requires adversarial thinking. Treat the agent as an attacker trying to exfiltrate a specific secret via any tool call path your hooks don't cover.

A basic verification checklist:

Can the agent read .env via the Read tool? (Hook on Read for sensitive paths, not just Bash)
Can the agent exfiltrate via echo or printf in a bash command?
Can the agent write .env contents to a differently-named file?
Can the agent make outbound requests with credential content via curl?
Can the agent modify its own hook configuration? (Write access to .claude/settings.json)

That last one is critical and frequently missed. As Penligent's architecture writeup explicitly flags: .claude/, settings.json, and skill files should be treated as audit targets the same way you'd treat CI configuration. An agent with write permission to .claude/settings.json can, in principle, modify its own hooks. Any write to that path should require human approval. See our overview of how Claude Code handles permission requests for background on the approval modes available.

How Grass Adds the Layer Hooks Can Never Provide

Every layer described above operates in the same trust domain as the agent — they're code running on the same host, readable and potentially modifiable by the same process user. A sufficiently motivated agent, or a prompt injection that successfully fools the agent, operates in that same domain.

There is one layer that is categorically different: out-of-band human approval on a separate device.

When Claude Code surfaces a permission request through Grass's remote approval system, it arrives as a native modal on your phone — a separate device, on a separate network path, requiring physical human interaction. The agent cannot respond to that modal on its own behalf. It cannot route around it with a clever tool call. The approval gate is physically out of reach of the process.

This matters most for the class of operations where hooks are hardest to get right: ambiguous, context-dependent decisions where "is this safe?" requires human judgment, not pattern matching. A hook that blocks rm -rf / is easy to write. A hook that correctly evaluates whether a given database migration is safe to run at 2am on a production replica is not.

The Grass workflow for an unattended agent run:

Claude Code running on always-on cloud VM
         ↓
Agent initiates a tool call flagged by permission policy
         ↓
Grass surfaces the request via SSE → native mobile modal on your phone
         ↓
You approve or deny — out-of-band, physically unreachable by the agent
         ↓
Result forwarded back to the session; agent continues or aborts

The agent sees a permission_request event pausing its execution. It cannot proceed until a human responds from a separate device. There is no tool call it can construct to bypass this — the gate is not a hook running in its process space.

On the secrets side, Grass's BYOK (bring your own key) model means your API credentials are never stored on Grass infrastructure. You supply the key; Grass passes it to the agent at runtime. Even if the VM running the agent were somehow compromised, the blast radius does not include your Anthropic or OpenAI billing credentials.

For developers running Claude Code, Codex, or Open Code in production workflows and who want cloud VM persistence, agent-neutral architecture, and mobile-native human approval forwarding, Grass is available at codeongrass.com. The free tier gives you 10 hours with no credit card required.

FAQ

Can a Claude Code PreToolUse hook be completely bypassed?

Yes, in the sense that hooks are denylists operating at the execution layer — they intercept specific tool calls you've explicitly configured. An agent can still access sensitive data via tool calls your hook doesn't cover (reading a file via Read when your hook only patterns on Bash), or by using a sequence of individually benign-looking tool calls whose combined effect achieves the blocked outcome.

What is agent blast radius?

Agent blast radius is the maximum damage an AI coding agent can cause if it misbehaves, misunderstands a prompt, or is manipulated by a prompt injection. It is bounded by what the agent can read, write, execute, and reach over the network — not by what you instructed it to do. Reducing blast radius means reducing these underlying capabilities through isolation, not just blocking specific tool calls through hooks.

Does --dangerously-skip-permissions disable PreToolUse hooks?

No — --dangerously-skip-permissions disables the interactive approval prompts (the Allow/Deny dialogs for specific built-in tool calls) but PreToolUse hooks configured in .claude/settings.json are a separate mechanism and continue to run. However, removing the interactive prompts changes how the agent plans: it may take more aggressive actions that it would have decomposed differently when operating under the approval regime.

What is the difference between a hook and a sandbox for containing agent actions?

A hook is code running in the same process environment as the agent — same user, same filesystem access, same network. It intercepts specific tool calls but shares the agent's trust domain. A sandbox (devcontainer, container, VM boundary) enforces isolation at the OS level: the agent physically cannot access resources outside the sandbox boundary regardless of what tool calls it makes. A sandbox defines the blast radius; hooks reduce it within that boundary.

How do I prevent Claude Code from reading my .env file?

The most reliable approach is to not expose the .env file to the agent at all — run the agent in a devcontainer or isolated VM where the file doesn't exist and credentials are injected as opaque handles by a broker. As a secondary measure, add PreToolUse hooks on Read, Bash, and Edit that reject operations targeting *.env, .env.*, and common credential file patterns. Both layers together are significantly more reliable than either alone.