Why You Keep Hitting Claude Code's Output Limit—And How to Fix It
You're paying €100/month for Claude Max and hitting the output limit every other session. It's not a billing problem — it's how you're structuring your sessions. Here's the five-part workflow fix.
Repeatedly hitting Claude Code's output limit at €100+/month isn't a sign you need a higher tier — it's a signal that your session structure is compounding token waste. Every retry, file read, and iterative fix accumulates in the same context window, and you're re-sending that accumulated weight with every new message. The fix is a workflow design problem: scope your sessions to atomic tasks, enforce context hygiene between steps, and reuse persistent environments instead of rebuilding them from scratch each time.
TL;DR: Claude Code output limits hit hardest during iterative builds because conversation history, tool results, and file reads all accumulate in context and get re-sent with every message. The five-part fix: (1) one session per atomic task, (2) explicit file scoping in prompts, (3) a prompt template that constrains scope and provides exit criteria, (4) mid-session compaction before the context cliff, (5) persistent environments that eliminate cold-start token overhead. Apply all five and most developers cut token consumption significantly without slowing down output.
Why Claude Code's Output Limit Keeps Hitting Mid-Session
As one developer put it in a thread on r/microsaas — paying €100+/month for Claude Max and hitting limits every other session:
"You start a session → iterate → fix → retry → expand context → boom: limit hit. Half the time it feels like you're not even progressing, just feeding the machine."
The underlying mechanism is straightforward. Claude Code's context window holds everything: the system prompt from your CLAUDE.md, every user message, every assistant response, every tool invocation result, and every file read. By the time you're three iterations deep on a tricky bug, you're re-sending tens of thousands of tokens of conversation history with every new message — just to ask the agent to try one more approach.
Agentic behavior amplifies this dramatically. Developers auditing AI agent activity have found untracked token spend reaching $10K+/month — not from intentional usage, but from 279 agent loops rewriting the same file, each iteration adding to an already-bloated context, each carrying the full accumulated weight of everything before it.
The output limit isn't random. It's the predictable result of unbounded sessions on iterative tasks.
What You'll Need
- Claude Code installed and authenticated (the `claude` CLI)
- Any Claude subscription tier (the fixes apply equally; the payoff scales with usage)
- A `CLAUDE.md` file in your project root (we'll configure it in Fixes 1 and 2)
- Optional: a Daytona account for persistent sandbox environments (Fix 5)
- Optional: Grass for always-on cloud VM access from any surface
Fix 1: Scope Every Session to One Atomic Task
The single highest-leverage change: treat Claude Code sessions like Git commits — one session, one logical unit of work.
Most token waste happens in multi-task sessions. "Fix the auth bug, then add the email field to the user form, then update the tests." Each task expansion drags the entire prior conversation into the new context. By the third task in a four-task chain, you're re-sending 60,000+ tokens to establish what's already been done — before writing a line of new code.
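The compounding is easy to model. In a rough sketch (the 5,000-tokens-per-turn figure is an assumption for illustration, not a measured value), every message re-sends the full prior history, so re-sent tokens grow roughly quadratically with session length:

```python
# Rough model of history re-sent over a session. Assumes each turn adds
# ~5,000 tokens of prompt, response, and tool results (illustrative number).
TOKENS_PER_TURN = 5_000

def resent_tokens(turns: int, tokens_per_turn: int = TOKENS_PER_TURN) -> int:
    """Total history tokens re-sent across a session of `turns` messages."""
    # On turn k, the model re-reads all (k - 1) prior turns.
    return sum((k - 1) * tokens_per_turn for k in range(1, turns + 1))

# One 12-turn multi-task session vs. three scoped 4-turn sessions
# covering the same work:
print(resent_tokens(12))      # 330000
print(3 * resent_tokens(4))   # 90000
```

Under these assumptions, splitting one long session into three scoped ones cuts re-sent history by more than two-thirds for the same amount of work.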
Define your session boundary before you open Claude Code. The test: can you describe the session goal in a single sentence without an "and"? If not, split it.
Add this contract to your CLAUDE.md:
## Session Contract
Each session handles ONE task. When that task is complete, stop.
Do not:
- Fix bugs unrelated to the current task
- Refactor code outside the task scope
- Read files not needed for the stated goal
- Install packages unless the task explicitly requires it
For the broader workflow discipline this enables — including the plan-before-execute checkpoint that prevents scope creep mid-session — see The CORE Agentic Workflow: Task → Plan Review → Approve → PR.
Fix 2: Enforce Context Hygiene
Context hygiene means actively controlling what enters the session window — not just what you explicitly include, but what Claude reads without being asked.
The biggest culprits:
- Undirected file reads: Telling Claude to "look at the codebase and figure out X" causes it to read 20 files when 3 would do. Each read stays in context forever.
- Accumulated tool results: Every bash command output, every file read, every error message persists in the window for the remainder of the session.
- Implicit exploration: Asking broad early-session questions causes Claude to explore widely and cache that exploration — including everything it didn't need.
Use /clear strategically. After a major subtask completes but before you switch direction, clear the conversation. The files still exist. Your CLAUDE.md still loads. You start the next subtask without carrying tens of thousands of tokens of stale context.
Reinforce hygiene in CLAUDE.md:
## File access rules
Only read files that are explicitly named in the current task description.
Do not read adjacent files "for context" unless explicitly asked.
Do not read test files unless the task involves tests.
Maximum files in context at once: 8.
If you need a file not on the list, ask before reading it.
Fix 3: Write Token-Efficient Prompts
An underspecified prompt forces the agent to explore. An overspecified prompt gives it exactly what it needs. The token cost of exploration is always higher than the token cost of a detailed prompt.
Use this template to open every session:
## Task
[Single sentence — what exactly needs to change and where]
## Files to read
- src/auth/login.ts
- src/auth/types.ts
## Files to avoid
Everything not listed above. Ask before reading anything else.
## Success criteria
- [Specific, verifiable outcome]
- [Test command that should pass]
## Out of scope
- [Explicit list of what NOT to do in this session]
The success criteria line is critical: it gives Claude a verifiable exit condition so it stops when the task is done rather than continuing to "improve" adjacent code. The out-of-scope line prevents the default thorough behavior — without it, Claude will notice and fix related issues, burning tokens on work you didn't ask for.
This constraint matters more as sessions age. The rule-following degradation that emerges past ~15 tool calls means your system prompt constraints weaken mid-session. Architectural constraints via prompt scoping are more reliable than relying on CLAUDE.md alone.
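A filled-in version of the template makes the pattern concrete (the file names, endpoint, and test command below are hypothetical, not from any real project):

```
## Task
Add rate limiting to the login endpoint in src/auth/login.ts.

## Files to read
- src/auth/login.ts
- src/auth/types.ts

## Files to avoid
Everything not listed above. Ask before reading anything else.

## Success criteria
- Login attempts beyond 5 per minute per IP return HTTP 429
- `npm test -- auth` passes

## Out of scope
- Refactoring existing auth helpers
- Rate limiting any other endpoint
```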
Fix 4: Compact Mid-Session Before You Hit the Cliff
By 40,000–60,000 tokens into a session, context quality degrades. The model starts re-processing things it's already handled, loses track of earlier constraints, and takes longer paths to simple answers. This is the silent efficiency killer — you're paying for tokens that don't move the work forward.
The fix is explicit compaction. Before you hit the limit, ask Claude to summarize what's been accomplished, then clear and continue with the summary as the new foundation.
Use this prompt when a session starts feeling repetitive or when response quality drops:
Before we continue:
1. Write a 150-word summary of what we've accomplished in this session
2. Include the current state of each file we've modified (filename + what changed)
3. List any remaining work with specific next steps
I'll use this summary to start a fresh context.
After Claude responds, copy the summary, run /clear, and paste it as the first message of the new context. You continue the work without the accumulated weight of every prior iteration. The code changes persist — only the conversational overhead is gone.
This step eliminates 60–80% of accumulated context while preserving the semantic value of what was done. Think of it as a checkpoint commit for your session.
Fix 5: Stop Rebuilding Your Environment Every Session
Every time you start a Claude Code session from a cold environment, there's overhead: reading configuration files, understanding project structure, potentially running install commands. This isn't just slow — it's token-expensive. A session that starts with dependency installation and an orientation pass burns 10,000–15,000 tokens before any substantive work begins.
The calculation compounds quickly: five sessions per day at 12,000 tokens of cold-start overhead each is 60,000 tokens daily — a meaningful fraction of your Claude Max allocation spent entirely on setup.
Persistent environments solve this. Instead of a fresh environment per session, you maintain a warm environment with dependencies installed, environment variables set, and project state intact. Each session starts focused on the actual task.
Daytona's sandboxes are purpose-built for this pattern. Daytona provides secure infrastructure for running AI-generated code — persistent, programmatically manageable environments that don't reset between sessions. You provision the environment once, install your stack, and every subsequent Claude Code session starts from that warm state with full filesystem and process continuity.
For a complete setup walkthrough, see How to Set Up Claude Code on Daytona.
How Do You Know It's Working?
Three signals that your token efficiency has improved:
Cost badge trend. Claude Code shows cost per response. In a well-scoped session, individual response costs stay relatively flat. If costs climb steeply mid-session, context is accumulating faster than it's producing useful output.
Session completion rate. Efficient sessions reach the stated goal and stop. If you're regularly hitting output limits before the task is done, the task scope was too large for one session. Track how often you complete vs. cap out.
Re-read frequency. Watch how often Claude re-reads the same file in a single session. Each re-read is a token expenditure indicating the prior read didn't stick. Good context hygiene reduces re-reads to near zero within a session.
For auditing what the agent actually consumed after a session completes, the post-run drift audit workflow gives you a concrete checklist for reviewing what was read, written, and run — useful for diagnosing where tokens went in sessions that still hit limits.
Troubleshooting
"I scope sessions tightly but still hit limits on single tasks."
The task itself is too large for one pass. Split it into a plan phase and an execution phase. In the plan phase, prompt Claude to write a detailed step-by-step implementation plan without writing any code. Review and edit the plan. Then start a fresh session with the plan as context and ask for implementation. The planning pass is cheap in tokens; the execution pass starts focused.
"My CLAUDE.md session contract isn't being followed mid-session."
This is the rule-following degradation past ~15 tool calls. Your system prompt constraints weaken as the session extends. The fix is architectural: enforce constraints through prompt structure (explicit file lists, success criteria, out-of-scope sections) rather than CLAUDE.md directives alone. Directives in CLAUDE.md matter most at session start; they degrade under load.
"Compaction loses important state."
If Claude's 150-word summary is missing something critical, the task was underspecified. A well-scoped task has finite, describable state — if the summary can't capture what's been done in 150 words, the task was too broad. Tighten the session scope and try again.
"I'm paying for Daytona on top of Claude Max — is it worth it?"
Run the calculation: (daily sessions) × (cold-start token cost per session) × (cost per token) × 30 days. For most heavy Claude Code users running 5+ sessions per day, the persistent environment pays for itself in token savings within the first month — before accounting for eliminated setup time.
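That formula is simple enough to run directly. Every number below is an assumption — substitute your own measured session counts and your model's actual per-token pricing:

```python
# The ROI formula from above, as runnable arithmetic.
daily_sessions = 5
cold_start_tokens = 12_000        # assumption: tokens burned on setup per cold session
cost_per_million_tokens = 15.0    # assumption: USD, varies by model and plan
days_per_month = 30

monthly_waste_tokens = daily_sessions * cold_start_tokens * days_per_month
monthly_waste_usd = monthly_waste_tokens / 1_000_000 * cost_per_million_tokens

print(monthly_waste_tokens)          # 1800000
print(round(monthly_waste_usd, 2))   # 27.0
```

If the monthly token waste priced this way exceeds the persistent environment's cost, the environment pays for itself — before counting the eliminated setup time.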
How Grass Makes This Workflow Better
The five fixes above work without Grass. But they assume you're at your laptop when sessions complete, when limits hit, and when the next session needs to be scoped and dispatched. That assumption breaks constantly.
Grass is a machine built for AI coding agents — an always-on cloud VM where Claude Code, Codex, and Open Code run persistently, accessible from any surface. The cloud VM is powered by Daytona, which means Fix 5 (environment reuse) becomes the default behavior rather than something you configure. Your Daytona sandbox is always warm, always ready, and doesn't reset when your laptop closes.
Cold starts drop to near zero. Because Grass maintains a persistent Daytona VM, your project environment is already configured when any new session starts. No reinstalling dependencies, no re-reading boilerplate config. Your token budget goes to actual work immediately.
Hitting a limit doesn't strand you. When you cap out mid-task on a remote VM, the environment stays alive. You scope the next session from wherever you are, dispatch it, and the agent picks up on a warm environment with the full codebase intact. On a laptop, hitting a limit means either staying at your desk to restart or losing in-progress state.
Dispatch from any surface. Grass's multi-surface access means you can fire off a scoped, well-formed prompt from your commute, between meetings, or when an idea strikes away from your desk. Because you're dispatching to a warm environment, the session starts immediately without setup overhead. The structured prompt templates from Fix 3 work especially well here: compose the prompt with explicit file scoping before you get back to your desk, dispatch it, and check progress when you arrive.
Agent-agnostic by design. Claude Code, Codex, and Open Code all run as first-class citizens on the same VM. If you're hitting Claude Code limits and want to test Codex on the same task — or run both in parallel on different repos — you don't rebuild your workflow. One surface handles every agent.
Grass uses BYOK authentication. Your Anthropic API key stays yours — it never touches Grass's infrastructure.
For a complete setup including Tailscale configuration for secure remote access, see Setting Up Grass with a Daytona Remote Server.
FAQ
Why do I keep hitting Claude Code's output limit even on Claude Max?
Claude Max gives you significantly more usage than Pro, but it still has a finite usage allocation per period. Heavy iterative development burns through it faster than expected because each session accumulates context — conversation history, tool results, file reads — that gets re-sent with every new message. The limit isn't a ceiling you gradually approach; it's a burn rate problem that compounds with unbounded sessions on iterative tasks. Structured sessions that stay under 20,000 tokens each let you run far more total sessions within the same plan quota.
How much context does a typical Claude Code session consume?
A cold-start session with dependency discovery and initial codebase orientation can consume 10,000–20,000 tokens before substantive work begins. A 3-iteration debug loop on a complex bug can burn 40,000–60,000 tokens. Properly scoped sessions using the prompt template from Fix 3 and explicit file lists typically stay under 15,000–20,000 tokens per completed task — a 50–60% reduction over unstructured sessions.
Does using /clear lose my code changes?
No. /clear clears the conversation history — the context Claude is carrying — but doesn't undo any file edits, git commits, or changes made during the session. Your code changes are on disk. What you lose is the agent's conversational memory of why it made those changes, which is why the compaction prompt (summarize before clearing) preserves the useful semantic state before the history is discarded.
What's the difference between Claude Code's context window and its output limit?
The context window (200,000 tokens on recent Claude models) is how much text the model can see in a single request. The output limit on subscription plans is a usage cap — the total tokens generated across all your sessions within a given period. Hitting the output limit doesn't mean your context window is full; it means you've consumed your plan's usage allocation for that period. Both problems are solved by more efficient session design, but they're distinct mechanisms.
Is Daytona the only option for persistent environments?
No. Persistent sessions via tmux on any server — a personal VPS, an EC2 instance, a long-running home machine — achieve the same core benefit of a warm environment. Daytona is purpose-built for this use case: sandboxes that are programmatically manageable and isolated per project. The principle is infrastructure-agnostic; Daytona and Grass remove the configuration overhead so the persistent environment is the default, not a weekend project.
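On a plain server, the tmux version of this pattern is a few commands (the session name and project path below are placeholders):

```shell
# Create a detached, persistent session that survives SSH disconnects.
tmux new-session -d -s claude

# Optionally seed it with your project directory (hypothetical path).
tmux send-keys -t claude 'cd ~/myproject' C-m

# Later, from any SSH connection, resume with full shell state intact:
tmux attach-session -t claude
```

Dependencies installed in that session stay installed; the next Claude Code run starts warm instead of cold.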
Next Steps
Start with Fix 1 and Fix 3 — they require no tooling changes and deliver immediate impact. Add the CLAUDE.md session contract today, write your first token-scoped prompt template, and track the cost badge trend across your next five sessions. Both of those take under 30 minutes to implement.
If you're running five or more Claude Code sessions per day and spending meaningful time on cold starts or repeated setup, add a persistent Daytona environment. If you want that environment accessible from any surface with zero configuration overhead — dispatching scoped sessions from your phone, reviewing diffs between meetings, handling permission gates without being at your desk — Grass gives you an always-on cloud VM with Claude Code pre-loaded. Free tier includes 10 hours with no credit card required.
One surface. Every agent. Always on.