Why Your Claude Agent Ignores Rules Past ~15 Tool Calls
Your Claude agent follows its system prompt for the first dozen tool calls. Then it stops — no error, no warning, the constraint still present, the model no longer honoring it. Here's what's happening and how to fix it architecturally.
System prompt constraints in Claude agents measurably degrade at high context depth. JSON format requirements get ignored. Autonomy rules get bypassed. Approval gate logic becomes unreliable. And there is no visible failure signal when any of this happens — the instruction is still in your system prompt, word for word, and the model just stopped following it. An API-layer proxy that enforces behavioral rules before the model sees the request is the architectural fix, and a 700+ star open-source tool already implements this pattern.
TL;DR: Claude agents reliably break system prompt constraints past ~15 tool calls due to attention dilution at high context depth. Prompt-level fixes (stronger wording, more emphasis) don't solve it because they're subject to the same attention dynamics. The durable fix is API-layer proxy enforcement — rules evaluated outside the model's context window, applied before the request reaches the model and validated before the response reaches your application.
What Is Constraint Degradation in AI Coding Agents?
Most developers notice this as a vague unreliability problem. The agent was following its output format rules, then it wasn't. It was respecting its autonomy boundary, then it stopped. Nothing in the logs flagged it.
Two failure modes appear consistently:
Format violations. An agent required to return structured JSON outputs starts returning plain text at high context depth. The JSON requirement is still in the system prompt — but context pressure has eroded its effective weight.
Behavioral rule bypass. An agent instructed not to take certain actions autonomously (e.g., "always confirm before running destructive commands") starts skipping confirmation past a certain session length. The rule is unchanged. The agent stops treating it as binding.
As one developer documented in a thread measuring Claude's behavior across long agent sessions, constraint adherence measurably weakens at high token depth. The degradation isn't random — it's reproducible and tied to how deep the context window has grown.
Why does this happen? (The root cause)
The model attends to all tokens in the context, but attention weights are not uniform. They're shaped by position, recency, and the density of signal around a token.
A system prompt instruction written once at the start of the context competes for attention against an increasingly large sequence of tool calls, results, and assistant responses. At low context depth, the instruction is relatively prominent. At high context depth — around 15 or more tool calls — the context window is dominated by tool call history, and the effective attention weight of the original system prompt constraint drops.
Anthropic's own engineering guidance on effective context engineering for AI agents describes how agents assemble understanding layer by layer, maintaining only what's necessary in working memory. Declarative rules written once at the top of the context are not continuously re-attended to. They lose salience as the conversation grows.
This is also identified in architectural research on AI agent harnesses as one of the core structural weaknesses in agent systems: the assumption that a system prompt instruction declared once will be honored throughout an arbitrarily long session is architecturally fragile. Context window management — specifically, how constraints survive context growth — is a first-class design decision.
The reproducible test
You can verify this yourself:
SYSTEM_PROMPT = """
You are a coding agent. You MUST always respond in valid JSON.
Never respond with plain text. Every response must be a JSON object
with keys: status, files_modified, and summary.
"""
# Run your agent through 15+ tool calls:
# file reads, searches, bash commands, edit operations
# Then prompt the agent with a question that invites a conversational answer
# What you will observe:
# Tool calls 1-10: consistent JSON compliance
# Tool calls 11-15: degrading compliance, occasional plain text
# Tool calls 16+: consistent format violations
The same test works for behavioral constraints. Set an autonomy rule ("always confirm before running any shell command that modifies the filesystem"), run 15+ tool calls, then trigger the constrained action. Past a certain context depth, the confirmation step disappears.
The threshold isn't exact — it varies with the verbosity of each tool call result. A session with terse tool outputs may hold constraints longer; one with verbose bash output may degrade faster. But the direction and pattern are reproducible.
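If you want to run the measurement yourself, here is a minimal harness sketch. It reuses the SYSTEM_PROMPT above and approximates a long session by padding the context with tool-style filler text instead of real tool calls, so the absolute threshold it shows will differ from a real agent's; the model ID and filler size are placeholders.
import json
from anthropic import Anthropic

client = Anthropic()

def json_compliant(text: str) -> bool:
    # A response counts as compliant if it parses as a JSON object
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False

messages = []
for depth in range(1, 26):
    # Stand-in for a real tool call: a user turn carrying verbose tool-style output
    messages.append({
        "role": "user",
        "content": f"[tool result {depth}]\n" + ("placeholder output line\n" * 120)
    })
    response = client.messages.create(
        model="claude-opus-4-1",   # placeholder: use whichever model your agent runs
        max_tokens=1024,
        system=SYSTEM_PROMPT,      # the JSON-only prompt defined above
        messages=messages,
    )
    text = response.content[0].text
    messages.append({"role": "assistant", "content": text})
    print(f"depth={depth} compliant={json_compliant(text)}")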
Why "stronger instructions" doesn't fix this
The intuitive response is to write more emphatic instructions:
SYSTEM: You MUST ALWAYS respond in JSON. THIS IS CRITICAL.
DO NOT IGNORE THIS UNDER ANY CIRCUMSTANCES. ALWAYS JSON.
This doesn't work. You're still writing a single instruction that competes against the same growing context. Emphasis and capitalization don't change attention mechanics. You're applying a content solution to a structural problem.
Adding more constraints to the system prompt doesn't help either — it adds more text for the model to deprioritize as context grows.
What actually survives context pressure: examples over rules
One consistent finding from the community research: examples beat rules at high context depth. A concrete input/output pair demonstrating the desired behavior survives attention dilution better than a declarative rule.
# Weaker (declarative rule — degrades under context pressure):
SYSTEM = "Always respond in JSON format."
# Stronger (example — more resistant to context pressure):
SYSTEM = """
Always respond in JSON format.
Example of a CORRECT response:
{"status": "complete", "files_modified": ["src/auth.py"], "summary": "Added token validation"}
Example of an INCORRECT response (never do this):
I've completed the task. I modified src/auth.py to add token validation.
"""
Examples are semantically richer signal. They encode both the rule and its expected output pattern, giving the model more to attend to even as the declarative instruction loses weight. Including examples measurably extends how long constraints hold — but it does not provide a hard enforcement guarantee. It's a mitigation, not a fix.
The architectural fix: API-layer proxy enforcement
The root issue is that prompt-level constraints are enforced by the model — which means they're subject to the same attention dynamics that cause degradation. A guarantee that holds regardless of context depth requires enforcement that happens before the model sees the request.
That's what an API-layer proxy does. Instead of your agent calling the Claude API directly, requests route through a proxy that intercepts them, applies behavioral rules, and validates outputs before they reach your application:
Agent → Proxy (rule enforcement) → Claude API
↑
Rules defined in plain markdown
Applied before model sees request
Output validated before returned to app
The rules live outside the model's context window entirely. They're evaluated by the proxy's enforcement logic, not by the model's attention over the conversation. Context depth becomes irrelevant to rule compliance.
A community-shared open-source proxy for Claude agent rule enforcement has accumulated 700+ GitHub stars — a strong adoption signal indicating practitioners are actively solving this problem. The design principle: rules are defined in plain markdown and enforced before the model ever sees the request.
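To make "rules in plain markdown" concrete, here is a minimal sketch of loading a rules file into the kind of configuration dict an enforcement layer consumes. The file layout shown is hypothetical, not the open-source tool's actual schema.
from pathlib import Path

# rules.md (hypothetical layout):
#
# ## output-format
# Respond only in valid JSON with keys: status, files_modified, summary.
#
# ## autonomy
# Always confirm before running destructive shell commands.

def load_rules(path: str) -> dict:
    # Parse each "## name" section of the markdown file into a named rule
    rules, current = {}, None
    for line in Path(path).read_text().splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            rules[current] = ""
        elif current is not None:
            rules[current] += line.strip() + " "
    return {name: text.strip() for name, text in rules.items()}

rules = load_rules("rules.md")
# The parsed sections then feed whatever rules config your enforcement layer uses,
# e.g. the reminder text and validation flags in the proxy example below.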
What API-layer enforcement looks like in implementation
A proxy-based enforcement layer intercepts at two points:
Pre-request: Inject constraints as fresh, recency-boosted context before sending to the model. This doesn't eliminate context pressure, but it ensures constraints appear adjacent to the model's response rather than buried at the start of a long context.
Post-response validation: Validate the model's output before returning it to the application. Non-compliant responses trigger a retry with an explicit correction, or raise an exception for your application to handle.
import json
from anthropic import Anthropic

class ConstraintViolation(Exception):
    """Raised when a response still violates the rules after retrying."""

class AgentProxy:
    def __init__(self, rules: dict):
        self.rules = rules
        self.client = Anthropic()

    def call(self, messages: list, system: str, max_retries: int = 1, **kwargs) -> dict:
        # Pre-request: inject rules as the most recent message, adjacent to the response
        augmented = messages + [
            {
                "role": "user",
                "content": f"[Constraint reminder: {self.rules['reminder']}]"
            }
        ]
        for attempt in range(max_retries + 1):
            response = self.client.messages.create(
                system=system,
                messages=augmented,
                **kwargs
            )
            # Post-response: validate against rules
            content = response.content[0].text
            if not self.rules.get("require_json"):
                return response
            try:
                json.loads(content)
                return response
            except json.JSONDecodeError:
                # Retry with an explicit correction appended to the conversation
                augmented = augmented + [
                    {"role": "assistant", "content": content},
                    {"role": "user",
                     "content": "[Constraint violation: respond again as a single valid JSON object only.]"}
                ]
        raise ConstraintViolation(
            f"Response violated JSON requirement after {max_retries + 1} attempts: {content[:200]}"
        )

# Usage:
proxy = AgentProxy(rules={
    "reminder": "Respond only in valid JSON with keys: status, files_modified, summary.",
    "require_json": True
})
response = proxy.call(
    messages=session_messages,
    system=SYSTEM_PROMPT,
    model="claude-opus-4-1",   # use whichever Claude model your agent runs
    max_tokens=4096
)
The key design principle: rules live in a configuration object outside the conversation, injected fresh at every request rather than declared once at the start of the context. Retry logic in the proxy means constraint violations never silently reach your application.
Verification: confirming your proxy is working
After implementing proxy enforcement, run the same reproducible test that revealed the degradation:
- Start a long agent session with 20+ tool calls
- Apply the same behavioral constraint — JSON-only output, or an autonomy boundary
- Monitor compliance status across tool call depth
Without proxy enforcement: violations start appearing around tool call 15 and become consistent past 20. With proxy enforcement: compliance rate should be constant regardless of session length, because validation happens outside the model's attention dynamics.
A more rigorous verification: log each response's compliance status alongside context token count. Plot compliance vs. token depth. Without proxy enforcement, you'll see a degradation curve. With proxy enforcement, the curve flattens.
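A minimal logging sketch for that measurement, assuming a compliance check like the json_compliant helper from the harness above and reading context size from the API's reported token usage:
import csv

def log_compliance(depth: int, context_tokens: int, compliant: bool,
                   path: str = "compliance_log.csv") -> None:
    # One row per model response; plot compliant against context_tokens afterwards
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([depth, context_tokens, int(compliant)])

# Inside the agent loop, after each model response:
# log_compliance(
#     depth=tool_call_count,                        # however your loop counts tool calls
#     context_tokens=response.usage.input_tokens,   # prompt size reported by the API
#     compliant=json_compliant(response.content[0].text),
# )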
How to Enforce Constraints Without Implementing a Full Proxy
If implementing a full proxy isn't immediately feasible, there's a meaningful partial mitigation: repeat your most critical constraints in the final user message of each turn, not only in the system prompt.
def build_message(user_request: str, active_constraints: list[str]) -> dict:
    reminder = " ".join(active_constraints)
    return {
        "role": "user",
        "content": f"{user_request}\n\n[Active constraints: {reminder}]"
    }

# Instead of:
messages.append({"role": "user", "content": user_request})

# Use:
messages.append(build_message(
    user_request,
    active_constraints=["Respond in JSON only.", "Confirm before any destructive file operation."]
))
The final user message is adjacent to the model's response in the context window, which means it receives higher effective attention than instructions written at the start of a long session. This measurably improves constraint adherence for time-sensitive rules — including approval gate logic, where the constraint needs to hold precisely at the moment the agent would otherwise act.
This approach is closely related to the patterns covered in building reliable human-in-the-loop approval gates for AI coding agents, where the reliability of gate-triggering constraints is the difference between an agent that pauses for approval and one that silently proceeds.
The broader architectural lesson
Constraint degradation is a specific instance of a more general design error: treating prompt-level instructions and API-layer enforcement as equivalent. They're not. They have fundamentally different reliability guarantees.
Securing AI agents against systems you can't fully control requires accepting that behavioral guarantees can't rest entirely inside the model's attention. The attention-based architecture that makes large language models capable is the same architecture that makes prompt-level constraints unreliable over long sessions.
The most robust production agent systems separate three concerns: what the model is asked to do (prompts and instructions), what it's allowed to do (tool permissions and sandboxing), and what it must do (enforced constraints that don't degrade). As covered in the permission layer architecture for production agents, treating all three as "stuff in the system prompt" is the architectural assumption that constraint degradation exposes.
Proxy enforcement handles the third category. It moves behavioral guarantees out of the model's context window and into infrastructure that doesn't degrade as context grows. The 700+ star adoption signal on the open-source proxy tool is practitioners reaching the same architectural conclusion independently.
If your current agent architecture puts all three layers in the system prompt, you already have the degradation problem. You just may not have observed it yet — because you haven't run sessions long enough, or because your approval gate has never fired on the wrong side of the 15-tool-call threshold.
FAQ
Why does my Claude agent stop following its system prompt after many tool calls?
System prompt instructions lose effective attention weight as the context window grows. Claude attends to all tokens in context, but attention is not uniform — at high context depth (~15+ tool calls under typical tool verbosity), the growing volume of tool call history dominates the context, and the relative weight of system prompt instructions drops. This is a structural property of attention-based models, not a bug specific to Claude.
How do I enforce JSON output format in a long-running Claude agent?
The most robust approach is proxy-layer validation: route your agent through a middleware layer that validates response format before returning it to your application, retrying or rejecting non-compliant outputs. A meaningful partial mitigation is to repeat the JSON requirement in the final user message of each turn, placing it adjacent to the model's response where it gets higher effective attention than instructions at the start of a long context.
What is the ~15 tool call threshold for constraint degradation?
The number is approximate and varies with tool output verbosity — agents with terse tool outputs may hold constraints longer, while agents with verbose bash or file read output may degrade faster. The threshold is a reproducible pattern under typical tool verbosity, not a hard limit. Monitor actual compliance rates in your production sessions rather than relying on a fixed number.
Do examples in the system prompt help more than declarative rules?
Yes, measurably. Examples — concrete input/output pairs demonstrating desired behavior — survive context pressure better than declarative rules because they encode richer semantic signal. Including both a rule and an example of correct versus incorrect behavior extends how long constraints hold. But examples provide no hard enforcement guarantee; they're a mitigation that extends constraint lifetime, not a fix that eliminates degradation.
Can Claude Code hooks replace a proxy for constraint enforcement?
Hooks (PreToolUse/PostToolUse) can intercept tool calls but operate differently from a request-level proxy — they run in the agent's process rather than at the API boundary, and they fire on specific tool invocations rather than on every model response. Hooks are useful for tool-specific control and approval gate triggering. A proxy validates every response regardless of which tools were called. The two are complementary: hooks for tool-level enforcement, proxy for response-level validation.
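As a sketch of the tool-level side, a PreToolUse hook can be a small script. The example below assumes it is registered for the Bash tool in your Claude Code hook settings; it reads the tool call Claude Code passes as JSON on stdin and blocks by exiting with code 2. The specific command check is illustrative.
#!/usr/bin/env python3
# Minimal PreToolUse hook sketch for tool-level enforcement, complementary to the proxy.
import json
import sys

payload = json.load(sys.stdin)                       # hook input passed by Claude Code
command = payload.get("tool_input", {}).get("command", "")

if "rm -rf" in command:
    # Exit code 2 blocks the tool call; the stderr message is surfaced to the model
    print("Blocked: destructive command requires explicit approval.", file=sys.stderr)
    sys.exit(2)

sys.exit(0)                                          # anything else proceeds normally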
This post is published by Grass — a VM-first compute platform that gives your coding agent a dedicated virtual machine, accessible and controllable from your phone. Works with Claude Code and OpenCode.