Cursor vs. Claude Code vs. Codex in 2026: An Honest Breakdown

You're paying for Cursor, Claude Code, and Codex — and the comparison math isn't obvious. Here's where each tool actually wins, what the token economics look like at heavy usage, and what to cut.

Sahil Kathpal

03 May 2026

TL;DR: Microsoft canceled Claude Code licenses for thousands of engineers because costs hit $500–$2,000 per engineer per month. The tool did not fail: 84–95% of engineers used it every month, and token billing doesn't behave like software licensing. GitHub Copilot CLI went GA in February 2026 and now runs Claude Opus 4.6 and Sonnet 4.6 natively, which changes the three-way comparison. For enterprise teams over 100 engineers, Copilot CLI gives you Claude-model access through Microsoft's governance infrastructure. For heavy individual users, Claude Code direct remains the most capable option, provided you actively manage token spend. For batch-heavy, cost-sensitive workloads, Codex offers a claimed 2–3x better token efficiency.

The Cost Crisis That Changed Every Comparison

In December 2025, Microsoft gave approximately 12,000 engineers access to Claude Code. By June 2026, those licenses were canceled — not because engineering hated the tool, but because finance couldn't sustain the cost. Internal usage rates ran at 84–95% monthly. Token billing at that engagement level produced invoices of $500 to $2,000 per engineer per month — numbers that don't fit into any standard software procurement model.

Microsoft is not alone. Uber's CTO Praveen Neppalli Naga told The Information in April 2026: "I'm back to the drawing board, because the budget I thought I would need is blown away already." Uber's engineering team burned through its entire 2026 AI coding budget in four months.

The structural problem is not the tools themselves — it's how agentic coding billing works. Token-based billing for agentic workflows doesn't behave like per-seat licensing. The same engineer running the same tool on the same codebase can generate wildly different invoices depending on task complexity. An autonomous refactoring session over a large repo processes dramatically more tokens than a targeted bug fix. Finance teams cannot model this variance the way they model user counts. As The Next Web put it: "The forecast was wrong because the variable being forecast, token consumption, behaves nothing like the licences and seats that finance teams know how to model. Agentic coding makes the model think a lot."
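To make that variance concrete, here is a minimal sketch of metered billing. The per-million-token prices and both sessions' token counts are invented placeholders, not published rates or measured usage. Only the structure comes from the argument above: cost scales with tokens processed, not with seats.

```python
from dataclasses import dataclass

# Hypothetical per-million-token prices, for illustration only.
# Substitute the rates from your provider's current price list.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


@dataclass
class Session:
    name: str
    input_tokens: int
    output_tokens: int


def session_cost(s: Session) -> float:
    """Return the metered cost of one session in USD."""
    return (
        s.input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
        + s.output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK
    )


# Two made-up sessions by the same engineer on the same codebase.
bug_fix = Session("targeted bug fix", input_tokens=200_000, output_tokens=10_000)
refactor = Session("autonomous refactor", input_tokens=20_000_000, output_tokens=400_000)

for s in (bug_fix, refactor):
    print(f"{s.name}: ${session_cost(s):.2f}")
```

Under these assumed numbers the refactor costs dozens of times more than the bug fix, even though the seat count (one engineer, one tool) never changed.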

This context now defines every comparison between Claude Code, GitHub Copilot CLI, and Codex in 2026. The question is no longer just capability — it's which billing architecture your organization can govern.

The Three Options (And What Changed in Each)

What is Claude Code?

Claude Code is Anthropic's terminal-based agentic coding tool. It runs Claude Sonnet 4.6 by default — Opus 4.6, Sonnet 4.5, and Haiku 4.5 are also available — executes directly in your local environment with full file and shell access, and bills directly through Anthropic's API at per-token rates. Consumer plans: Claude Pro at $20/month (rate-limited), Claude Max at $100–200/month (higher limits). Anthropic's published benchmark is $13 per active developer day, translating to $150–250/month for an average user. That average is heavily skewed by light users — the P90 engineer running multi-hour autonomous sessions regularly hits $50–100/day.
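The monthly ranges follow from simple multiplication. The sketch below assumes 12 to 19 active days per month for an average user and 20 for a heavy user. Those day counts are assumptions chosen to reproduce the quoted ranges, not figures from Anthropic.

```python
BENCHMARK_USD_PER_ACTIVE_DAY = 13  # the published figure cited above


def monthly_cost(usd_per_active_day: float, active_days: int) -> float:
    """Monthly spend for a developer with the given number of active days."""
    return usd_per_active_day * active_days


# Assumed active-day counts; only the monthly range is given above.
for days in (12, 19):
    cost = monthly_cost(BENCHMARK_USD_PER_ACTIVE_DAY, days)
    print(f"{days} active days at $13/day: ${cost:.0f}")

# Heavy-user daily range from the paragraph above, assuming 20 active days.
heavy_low = monthly_cost(50, 20)
heavy_high = monthly_cost(100, 20)
print(f"Heavy user: ${heavy_low:.0f} to ${heavy_high:.0f} per month")
```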

What is GitHub Copilot CLI — and Why Does It Now Run Claude?

GitHub Copilot CLI went generally available on February 25, 2026. The change that materially alters the three-way comparison: Copilot CLI now runs Claude Opus 4.6, Sonnet 4.6, and Haiku 4.5 natively as selectable models, accessed through Microsoft's pricing and governance infrastructure rather than Anthropic's.

Microsoft's move away from Claude Code is therefore not a move away from Claude. It is a shift in the vendor relationship — from Anthropic's direct API billing to GitHub's pricing plane, where procurement has audit logs, per-team cost dashboards, SSO, and org-level usage controls. Same models, different governance.

One critical caveat: GitHub is transitioning to usage-based billing starting June 1, 2026. The current flat pricing ($10/month base, $39/month Pro+) is not permanent. Community-reported data shows at least one Pro+ user projecting a bill of about $942/month under the new usage model for an unchanged usage pattern. The "Copilot is cheaper" assumption has an expiration date, and teams evaluating Copilot CLI should model against their actual usage patterns before committing.

What is OpenAI Codex?

OpenAI Codex CLI is a terminal-based coding agent that runs in a cloud sandbox rather than your local environment. Its key differentiator: OpenAI claims 2–3x better token efficiency per task compared to Claude Code. The mechanism — Codex operates with persistent cached context in a remote sandbox, so it doesn't re-read the full codebase on every query. The trade-off is real-time interactivity: Codex is better suited for batch-style tasks you fire off and wait for completion rather than interactive back-and-forth sessions where you redirect the agent mid-execution. Pricing is usage-based against OpenAI API rates.

What $500–$2,000/Engineer/Month Actually Comes From

The $500–$2,000 figure is real but represents the high-usage tail, not the average. Here is how the billing math works across usage tiers:

Usage Tier	Claude Code (Direct API)	GitHub Copilot CLI	Codex (OpenAI API)
Light — 1–2 hrs/day, targeted fixes	$30–80/mo	$10–39/mo (flat, pre-June)	$20–60/mo
Medium — 3–5 hrs/day, mixed tasks	$100–250/mo	$39–150/mo (usage-based)	$60–180/mo
Heavy — 6+ hrs/day, agentic, large repos	$400–2,000/mo	$150–900/mo+ (post-June)	$150–600/mo
Enterprise avg — 100+ engineers, mixed	$150–400/engineer avg	Copilot Business $19/seat + usage	Custom; lower avg per engineer

Estimates based on published API pricing, Anthropic's $13/active-day benchmark, and community-reported usage. Individual costs vary significantly by task complexity and codebase size.

Why the distribution matters more than the average: Anthropic's $13/active-day figure is an enterprise-wide mean. A single developer running overnight autonomous refactoring sessions contributes dramatically more than that. At 100 engineers where 10 are heavy users, those 10 can consume more budget than the other 90 combined — exactly the dynamic that produced Uber's budget overrun. Teams should model against their actual P90 usage, not the published average.
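A short worked example shows how ten heavy users can outweigh ninety others. The team composition and per-engineer monthly costs below are hypothetical values picked from the ranges in the table above, not measured data.

```python
# Hypothetical team of 100 engineers. Per-engineer monthly costs are
# illustrative values taken from within the tier ranges in the table.
HEAVY_USERS = 10
HEAVY_USD_PER_MONTH = 1_500
OTHER_USERS = 90
OTHER_USD_PER_MONTH = 150

heavy_total = HEAVY_USERS * HEAVY_USD_PER_MONTH
other_total = OTHER_USERS * OTHER_USD_PER_MONTH
team_total = heavy_total + other_total

heavy_share = heavy_total / team_total
mean_per_engineer = team_total / (HEAVY_USERS + OTHER_USERS)

print(f"Heavy users: ${heavy_total:,} ({heavy_share:.0%} of spend)")
print(f"Everyone else: ${other_total:,}")
print(f"Mean per engineer: ${mean_per_engineer:,.0f}")
```

In this scenario the mean describes almost nobody: ninety engineers sit well below it and ten sit far above it, which is why a budget built on the mean alone hides where the money goes.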

The Copilot billing transition risk: Teams choosing Copilot CLI specifically for cost predictability face a June 2026 inflection point. One Copilot user in the r/ClaudeAI community reported: "Just checked GitHub's billing preview simulator, currently paying $39/month on Pro+ and happily within my included PRUs. Under the new usage-based billing starting June 1st, the same usage pattern would cost me $942.82/month." The risk profile is changing.

How Do the Capabilities Actually Compare?

How does Claude Code compare to GitHub Copilot CLI on autonomous multi-step tasks?

Claude Code is measurably ahead on autonomous, extended agentic tasks. Benchmark data shows Claude Code at approximately 80.8% on SWE-bench versus Copilot CLI at roughly 72.5%. More practically: Claude Code can maintain coherent context across long sessions, execute complex multi-tool chains, and handle ambiguous high-level instructions without constant hand-holding. Tasks like "refactor this module to async/await and update all downstream tests" run more reliably end-to-end in Claude Code's current agentic implementation.

GitHub Copilot CLI's advantages are in GitHub ecosystem integration — PR review automation, issue-to-code workflows, Copilot Workspace for scoped multi-file edits — and enterprise governance tooling that Claude Code doesn't have. For teams whose primary workflow is PR-centric, those integrations provide real leverage that autonomous session capability doesn't.

How does Codex compare on token efficiency for large-scale tasks?

Codex's claimed 2–3x token efficiency advantage is meaningful for teams running high-volume workloads. Developers who have analyzed Claude Code's heavy token usage know the tool reads entire files by default, which burns tokens rapidly on large codebases. Codex's cached sandbox context sidesteps much of that re-reading overhead. The cost: you lose the real-time interactivity that makes Claude Code's back-and-forth conversational workflow productive for complex reasoning tasks.

Teams that need both efficiency and interactivity often use Claude Code for exploratory and planning sessions, then hand off execution to Codex for large-scale batch operations — a workflow worth modeling against your actual task distribution before committing to a single tool.
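The hand-off rule can be sketched as a one-line routing decision. Everything here is hypothetical: the `Task` type, the `needs_mid_run_steering` flag, and the agent labels are illustrative names, not part of either tool's interface.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    # Will a human want to redirect the agent while it works?
    needs_mid_run_steering: bool


def choose_agent(task: Task) -> str:
    """Route interactive work to Claude Code and fire-and-forget work to Codex."""
    if task.needs_mid_run_steering:
        return "claude-code"
    return "codex"


tasks = [
    Task("explore auth module and draft a migration plan", True),
    Task("apply the approved rename across 400 files", False),
]
for t in tasks:
    print(f"{choose_agent(t)}: {t.description}")
```

A real policy would likely weigh more than one flag, such as repo size or expected run time, but classifying tasks this way is enough to estimate what share of your workload fits the batch model.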

Full Comparison Table

Dimension	Claude Code	GitHub Copilot CLI	Codex CLI
Default model	Claude Sonnet 4.6	Claude Sonnet 4.5	GPT-4o / o3 / o4-mini
Also offers	Opus 4.6, Sonnet 4.5, Haiku 4.5	Claude Opus 4.6, Sonnet 4.6, Haiku 4.5	Various OpenAI models
Execution environment	Local shell	Local + cloud agentic	Cloud sandbox
Agentic depth	High — multi-tool autonomous sessions	Medium — PR/code-gen focus	Medium — batch task execution
Base pricing	$20/mo (Pro), $100–200/mo (Max)	$10/mo, $39/mo (Pro+)	API usage-based
Heavy user cost (est.)	$500–2,000/mo/engineer	$150–900/mo+ post-June 2026	$150–600/mo
Enterprise governance	Limited	Strong — audit logs, SSO, dashboards	Moderate
Token efficiency	Baseline	Similar to Claude Code	2–3x better per task (claimed)
BYOK option	Yes	No — Microsoft controls billing	Yes
Session persistence	Local disk JSONL transcripts	Managed by GitHub	Cloud sandbox (persistent)
Best for	Heavy individual users, autonomous tasks	Enterprise governance, PR workflows	High-volume, batch, cost-sensitive

What Should Enterprise Teams Actually Do After the Microsoft/Uber Cancellations?

Should I switch from Claude Code to GitHub Copilot CLI?

The honest answer depends on whether your problem is capability or governance. If the problem is that finance can't predict or control costs, Copilot CLI addresses that by routing Claude model access through Microsoft's enterprise procurement controls. If the problem is that the tool doesn't perform well enough, Copilot CLI on its default Claude Sonnet 4.5 may actually perform slightly below Claude Code on its default Claude Sonnet 4.6, so the switch doesn't help.

For teams over 100 engineers: the governance infrastructure in Copilot CLI — audit logs, per-team usage dashboards, SSO, organizational controls — is genuinely better than what Claude Code's enterprise offering provides. The transition cost is real (60–90 days of productivity recovery is typical per EPC Group's enterprise consulting data) but pays back within a year at that scale.

For teams under 20 engineers: the governance overhead rarely justifies switching. The more effective response is implementing token usage monitoring and session management discipline within Claude Code. Understanding when to switch between Claude Code and Codex for different task types can keep heavy users in the medium cost tier rather than the high tier.

What about BYOK configurations — do they avoid the cost problem?

Bring-your-own-key (BYOK) configurations — where developers authenticate directly against the Anthropic or OpenAI API rather than through a managed subscription — expose you directly to raw token costs. That sounds worse, but it gives you direct access to per-session usage data and the ability to implement hard cost caps at the infrastructure level. The Microsoft and Uber scenarios involved managed subscriptions with limited per-user visibility into real-time token burn. BYOK setups with active monitoring can surface cost overruns before they become budget crises. Grass, for example, uses BYOK authentication — developers own their API keys, and the platform never proxies them — which means cost visibility lives with the developer, not a vendor intermediary.
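As a rough illustration of a hard cost cap, the sketch below tracks spend from per-request token counts and stops a session once a limit is passed. The `BudgetGuard` class, the prices, and the $1 cap are all hypothetical. No provider API is called here, and in practice the token counts would come from the usage data returned with each response.

```python
class BudgetExceeded(RuntimeError):
    """Raised when recorded spend passes the configured cap."""


class BudgetGuard:
    """Tracks spend from token counts and stops a session at a hard cap."""

    def __init__(self, cap_usd: float, input_usd_per_mtok: float,
                 output_usd_per_mtok: float):
        self.cap_usd = cap_usd
        self.input_usd_per_mtok = input_usd_per_mtok
        self.output_usd_per_mtok = output_usd_per_mtok
        self.spent_usd = 0.0

    def charge(self, input_tokens: int, output_tokens: int) -> float:
        """Record one request's usage; raise once the cap has been passed."""
        self.spent_usd += (
            input_tokens / 1_000_000 * self.input_usd_per_mtok
            + output_tokens / 1_000_000 * self.output_usd_per_mtok
        )
        if self.spent_usd > self.cap_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} of ${self.cap_usd:.2f} cap"
            )
        return self.spent_usd


# Placeholder prices and cap, for illustration only.
guard = BudgetGuard(cap_usd=1.00, input_usd_per_mtok=3.00, output_usd_per_mtok=15.00)
try:
    while True:
        # Stand-in for one agent request with fixed, made-up token counts.
        guard.charge(input_tokens=100_000, output_tokens=10_000)
except BudgetExceeded as err:
    print(f"Session stopped: {err}")
```

Because usage is recorded after each request completes, this design can overshoot the cap by up to one request's cost. A stricter variant would estimate the next request's cost and refuse it in advance.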

Where Does Grass Fit?

Grass is a machine built for AI coding agents — an always-on cloud VM where Claude Code, Codex, and Open Code run as first-class citizens, accessible from your laptop, your phone, or an automation. In the context of the cost and vendor-lock conversation: Grass is agent-agnostic by design, meaning teams aren't coupled to a single agent's billing architecture.

When Microsoft moved from Claude Code to Copilot CLI, teams running agent-agnostic infrastructure could change which agent they dispatched without rebuilding their workflow. Grass's BYOK approach also means that token costs flow directly through the developer's API account, with full visibility — no vendor markup, no managed subscription obscuring actual consumption.

Grass is one option in this landscape, not the solution to the cost problem. The cost problem is solved by active token management, not by switching infrastructure. But for teams already managing multiple agents across Claude Code, Codex, and Open Code, one surface reduces operational overhead without forcing an agent-specific commitment.

Verdict

Choose Claude Code direct if you're a heavy individual user or small team (under 20 engineers) prioritizing autonomous capability over governance, and you're willing to actively monitor token spend. Implement session limits and per-task cost awareness to avoid the Microsoft/Uber pattern.

Choose GitHub Copilot CLI if you're at enterprise scale (100+ engineers), need audit logs and SSO, and your workflow is PR-centric. You get Claude model access through Microsoft's governance infrastructure. Model your costs against the June 2026 usage-based billing transition before committing.

Choose Codex if you're running high-volume, batch-style workloads where token efficiency matters more than real-time interactivity. The claimed 2–3x efficiency advantage is meaningful at scale if your tasks fit the batch execution model.

For all three scenarios: Model AI coding agent costs as metered utilities, not software licenses. Budget against P90 usage, not the published average. The teams that get burned are the ones that sign a procurement deal based on a mean and discover their heaviest users are 10x the mean.
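To see how far a mean can sit from the heavy tail, here is a minimal sketch that computes both the mean and a nearest-rank P90 over per-engineer monthly spend. The 20 spend values are invented for illustration, and nearest-rank is only one of several percentile definitions.

```python
import math
from statistics import mean

# Hypothetical monthly spend per engineer (USD) for a 20-person team.
monthly_spend = [40, 50, 60, 60, 80, 90, 100, 110, 120, 140,
                 150, 170, 200, 220, 260, 300, 400, 700, 1200, 1900]


def percentile_nearest_rank(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


team_mean = mean(monthly_spend)
p90 = percentile_nearest_rank(monthly_spend, 90)

print(f"Mean: ${team_mean:,.2f}")
print(f"P90: ${p90:,}")
print(f"Heaviest user vs mean: {max(monthly_spend) / team_mean:.1f}x")
```

Running the same calculation on your own billing export, rather than on these made-up values, gives the P90 figure to budget against.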

Frequently Asked Questions

Why did Microsoft cancel Claude Code licenses in 2026?

Microsoft canceled Claude Code licenses for approximately 12,000 engineers because costs reached $500–$2,000 per engineer per month, driven by 84–95% monthly usage rates. The core issue was that token-based billing doesn't behave like per-seat software licensing — usage was high, variance was unpredictable, and costs weren't flowing through infrastructure Microsoft controlled. Microsoft directed engineers to GitHub Copilot CLI, which provides access to the same Claude models through Microsoft's own pricing and governance plane.

Is GitHub Copilot CLI cheaper than Claude Code?

In early 2026, GitHub Copilot CLI appeared significantly cheaper at $10–39/month flat. Starting June 1, 2026, GitHub is transitioning to usage-based billing. One community-reported projection shows a Pro+ user's bill rising from $39 to about $942 per month under the new model, roughly 24x the current flat rate. Whether Copilot CLI is cheaper than Claude Code now depends on your usage pattern, so model against your actual behavior using GitHub's billing preview simulator before concluding it's the cheaper option.

Does GitHub Copilot CLI use Claude models?

Yes. GitHub Copilot CLI, generally available since February 25, 2026, runs Claude Sonnet 4.5 by default and offers Claude Opus 4.6, Sonnet 4.6, and Haiku 4.5 as selectable models. Microsoft's transition away from Claude Code is not a transition away from Claude. It is a change in billing infrastructure and governance controls that retains access to Claude models.

What AI coding agent should I use after Microsoft canceled Claude Code enterprise licenses?

The right answer depends on your scale and use case. For enterprise teams needing governance: GitHub Copilot CLI with Claude models, verified against the new usage-based billing starting June 2026. For heavy individual users wanting maximum agentic capability: Claude Code direct with active token monitoring. For high-volume batch workloads: Codex CLI, which claims better token efficiency for non-interactive tasks. There is no single winner — the Microsoft/Uber situation illustrates that the correct answer is whichever billing architecture your team can actually govern.

How much does Claude Code cost per engineer for enterprise teams?

Anthropic's published benchmark is $13 per active developer day, translating to roughly $150–250/month for an average user. That figure is an enterprise mean across all usage levels. Engineers running multi-hour autonomous agentic sessions regularly hit $50–100/day, which at about 20 working days per month comes to $1,000–2,000/month, the figure behind the Microsoft and Uber budget crises. Enterprise teams should model against P90 usage for their most active developers, not the published average.

Published by Grass — a machine built for AI coding agents. Claude Code, Codex, and Open Code run on Grass's always-on cloud VM, accessible from any surface. codeongrass.com