
Argus vs. Coograph: Real-Time Observability for Claude Code

Your Claude Code agent ran for two hours. Now nobody understands what it built — and it never surfaced a single error. Here's the two-tool observability stack that catches drift before it compounds.

Sahil Kathpal

07 May 2026 • 13 min read

Two new open-source tools address the Claude Code observability gap from opposite ends of the problem: Argus hooks into VSCode and surfaces every tool call as the agent executes; Coograph builds a dependency graph of your repo and constrains what the agent reads before it acts. Used together, they give you a lightweight real-time observability stack that catches drift before it compounds into something you can't easily reverse. This post walks through installing and wiring up both.

TL;DR: Install Argus for VSCode-native action tracking (passive, low-overhead, immediate). Install Coograph for dependency-graph pre-read (active, reduces wrong-context decisions on large codebases). If you run concurrent sessions or need approval-gate visibility away from your desk, Grass's /permissions/events SSE stream fills the remaining gap. Argus and Coograph are free and open-source; Grass has a free tier.

The Failure Pattern That Made Observability Necessary

One developer's account on r/ClaudeCode is now the canonical description of what happens when you delegate a large build with no visibility: "We've been building real SaaS for the past month with Claude Code... Nobody, including me, fully understands what we built." The agent built the product. The agent is also the only entity that understood it — and it's already gone.

A related failure pattern is subtler and more dangerous: agents that execute confidently while making wrong decisions. "System didn't fail loudly, it kept executing incorrectly." No error. No warning. Just confident, silent wrongness.

Both failures share a root cause: no visibility into agent behavior at the time decisions are being made. You find out after.

What Is Agent Observability?

Agent observability (for coding agents) is the ability to see, in real time, which tools an agent is calling, which files it's reading, and what context it's operating on — before those actions compound into irreversible state changes.

This is different from post-run auditing, which tells you what happened after the session. Observability is about having signal during execution — close enough to the decision point that you can intervene. As Apiiro's framework for AI agent monitoring notes, addressing these gaps requires blending code, runtime, and AI-level monitoring — post-hoc logs cover only one layer.

The Two Layers of the Observability Gap

Before picking a tool, it helps to be precise about what you're trying to observe:

Layer 1 — Action visibility: What tool calls is the agent making, in what order, against what files?

Layer 2 — Context quality: Is the agent reading the right files to make this decision? Is its working context focused or diluted?

These are different problems with different failure modes. An agent can be fully visible at Layer 1 — you see every tool call in real time — and still be broken at Layer 2 because it's reading 35 files to answer a question about 3, making decisions from the wrong area of the repo.

Argus addresses Layer 1. Coograph addresses Layer 2.

Prerequisites

Before setting up either tool, confirm:

Claude Code installed (check with claude --version) and authenticated
Node.js 18+
Git-initialized project directory
VSCode (for Argus; Coograph is editor-agnostic)

Optional: Grass for mobile approval-gate visibility (covered in its own section below).

How to Set Up Argus: VSCode-Native Action Tracking

Source: github.com/yessGlory17/argus

Argus is a lightweight VSCode extension that hooks into Claude Code and surfaces real-time agent action tracking and tool call visibility inside the editor. You don't leave your editor. You don't switch to a separate dashboard. The tool call log appears in a panel as the agent works.

Install:

Search Argus in the VSCode Marketplace (publisher: yessGlory17), or find the current install link in the GitHub repo. The generic name "argus" may not resolve with code --install-extension — use the marketplace search to get the correct extension ID.

What Argus surfaces per tool call:

Tool name: Read, Write, Edit, Bash, WebSearch
File path or command argument
Call timestamp and duration
Sequential order across the session

What a session looks like in Argus:

[10:24:01] Read    → src/auth/middleware.ts       (212ms)
[10:24:03] Read    → src/auth/session.ts          (88ms)
[10:24:05] Edit    → src/auth/middleware.ts       (active)
[10:24:09] Bash    → npm run test:auth            (pending)

That log is available within 1–2 seconds of each tool call. If the agent reads .env during a task that shouldn't require it, you see it immediately — not after the session ends.
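If you would rather get an alert than watch the panel, the same tripwire can be scripted. The sketch below is a minimal illustration, assuming you can get the tool-call log as text lines in the format shown above; whether Argus exports its log this way is an assumption, and `parseLine`, `flagSensitiveCalls`, and `SENSITIVE_PATTERNS` are hypothetical names.

```typescript
// Hypothetical parser for tool-call lines shaped like the example log above,
// e.g. "[10:24:01] Read    → src/auth/middleware.ts       (212ms)".
// Whether Argus can export its log as text lines is an assumption.
interface ToolCall {
  time: string;   // "10:24:01"
  tool: string;   // "Read", "Edit", "Bash", ...
  target: string; // file path or command argument
  status: string; // duration like "212ms", or "active" / "pending"
}

// Paths an agent rarely needs to touch; adjust for your project.
// The .env pattern also catches commands such as "cat .env".
const SENSITIVE_PATTERNS: RegExp[] = [
  /(^|[\s\/])\.env(\.|\s|$)/,
  /(^|[\s\/])id_(rsa|ed25519)/,
  /\.pem$/,
];

const LINE_RE = /^\[(\d{2}:\d{2}:\d{2})\]\s+(\w+)\s+→\s+(.+?)\s+\(([^)]+)\)\s*$/;

function parseLine(line: string): ToolCall | null {
  const m = LINE_RE.exec(line);
  if (!m) return null; // not a tool-call line
  return { time: m[1], tool: m[2], target: m[3], status: m[4] };
}

// Returns every call whose target matches one of SENSITIVE_PATTERNS.
function flagSensitiveCalls(lines: string[]): ToolCall[] {
  return lines
    .map(parseLine)
    .filter((c): c is ToolCall => c !== null)
    .filter((c) => SENSITIVE_PATTERNS.some((p) => p.test(c.target)));
}
```

Run it over lines as they arrive and alert on any non-empty result. Path matching is crude (it will miss a secret read through an indirect command), so treat it as a tripwire rather than a guarantee.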

For teams running more comprehensive session tracking, the Claude Code Agent Monitor provides a full-featured alternative: a WebSocket-backed dashboard with SQLite session persistence, Kanban status board, and subagent orchestration tracking. Argus is the lightweight on-ramp; the Agent Monitor is the full-featured dashboard version.

Argus limitations:

Passive observation only — it does not constrain what the agent does
Scoped to a single VSCode session per window
No multi-session aggregation

How to Set Up Coograph: Dependency Graph Pre-Read

Source: github.com/paullukic/coograph

Coograph takes a different approach. Instead of observing what the agent does, it changes what the agent reads before any action. Coograph indexes your repo into a dependency graph and makes it queryable — so the agent asks "which files are relevant to this?" before opening files. This cuts uncontrolled file reads down to the 3–5 files that actually matter for a given query.
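To make "queryable dependency graph" concrete, here is a minimal sketch of graph-based file selection. It is not Coograph's implementation (the post does not describe its internals); it assumes an import graph already extracted into a map from each file to the files it imports, plus seed files matched from the query, and returns everything within `maxDepth` hops in either direction. `relevantFiles`, `reverseEdges`, and the graph shape are hypothetical.

```typescript
// Adjacency list: file -> files it imports.
type ImportGraph = Map<string, string[]>;

// Build reverse edges (file -> files that import it) so traversal can follow
// both "what does this file use" and "what uses this file".
function reverseEdges(graph: ImportGraph): ImportGraph {
  const rev: ImportGraph = new Map();
  for (const [from, targets] of graph) {
    for (const to of targets) {
      if (!rev.has(to)) rev.set(to, []);
      rev.get(to)!.push(from);
    }
  }
  return rev;
}

// Breadth-first search from the seed files, bounded by maxDepth hops.
function relevantFiles(graph: ImportGraph, seeds: string[], maxDepth = 1): string[] {
  const rev = reverseEdges(graph);
  const depth = new Map<string, number>();
  const queue: string[] = [];
  for (const s of seeds) {
    if (!depth.has(s)) {
      depth.set(s, 0);
      queue.push(s);
    }
  }
  while (queue.length > 0) {
    const file = queue.shift()!;
    const d = depth.get(file)!;
    if (d === maxDepth) continue; // don't expand past the depth limit
    const neighbors = [...(graph.get(file) ?? []), ...(rev.get(file) ?? [])];
    for (const n of neighbors) {
      if (!depth.has(n)) {
        depth.set(n, d + 1);
        queue.push(n);
      }
    }
  }
  return [...depth.keys()].sort();
}
```

The depth limit is the main design lever. In a small payments graph where a shared logger is imported by both src/payments/processor.ts and src/auth/session.ts, depth 1 from the processor stays inside the payment flow, while depth 2 already pulls src/auth/session.ts in through the logger, which is exactly the kind of over-inclusion that dilutes context.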

Install:

npm install -g coograph

Index your repository:

cd your-project
coograph index

Indexing is a one-time operation with incremental updates as files change. On a medium-sized codebase (500–2,000 files), initial indexing typically completes in under 2 minutes.

Verify the graph:

coograph query "how does the payment flow work?"
# → Returns: src/payments/processor.ts, src/payments/webhook.ts,
#            src/api/checkout.ts, src/models/order.ts

If the output is 3–7 files, the graph is working. If it returns 30+ files, your codebase may need more explicit module boundaries (check for circular dependencies or a flat src/ structure).

Wire Coograph into Claude Code via CLAUDE.md:

Add this to your project's CLAUDE.md file (in the repo root):

## File Reading Protocol

Before reading files to understand any area of this codebase, first run:
  coograph query "<your question about the area>"

Read only the files returned by that query. Do not open files not in that list
unless a file you've already read explicitly imports or references them.
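The protocol reduces to an allow-list rule: a read is in bounds if the query returned the file, or if a file already read imports it. A sketch of that rule as a checker you could run over a session's read sequence, assuming you have the query result and an import map as inputs (both hypothetical; Coograph's own output format is not specified here). `checkReads` is a hypothetical name.

```typescript
// Hypothetical import map: file -> files it imports.
type ImportMap = Map<string, string[]>;

interface ReadVerdict {
  file: string;
  allowed: boolean;
}

// Replays a session's reads in order and marks each as in or out of bounds
// under the CLAUDE.md rule above.
function checkReads(queryResult: string[], imports: ImportMap, reads: string[]): ReadVerdict[] {
  const allowed = new Set(queryResult);
  const verdicts: ReadVerdict[] = [];
  for (const file of reads) {
    const ok = allowed.has(file);
    verdicts.push({ file, allowed: ok });
    if (ok) {
      // A legitimately read file unlocks the files it imports.
      for (const dep of imports.get(file) ?? []) allowed.add(dep);
    }
  }
  return verdicts;
}
```

Order matters: reading an imported file before its importer is flagged, even though the same read would be fine afterwards. This sketch also takes the stricter reading that an out-of-bounds read does not unlock its own imports.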

Alternative: system prompt flag:

claude --append-system-prompt "Before reading any files to answer questions about this codebase, run 'coograph query <question>' and limit reads to the returned file list."

What Coograph solves:

Without Coograph, an agent investigating a payment bug might read src/models/user.ts, src/models/order.ts, src/api/checkout.ts, src/payments/processor.ts, src/payments/webhook.ts, src/auth/session.ts, and src/utils/logger.ts, plus 28 more files from adjacent modules that happen to share similar naming. With Coograph, it queries the graph first and opens only the files the dependency graph identifies as relevant to "payment flow". The agent makes better decisions from focused context than from a diluted mixture of 35 files, many from the wrong subsystem.

This directly addresses the failure mode where the agent "would look at one part and assume the whole thing worked the same way" — applying patterns from one module incorrectly to another because both ended up in the same context window.

Argus vs. Coograph: Direct Comparison

Dimension	Argus	Coograph
When it acts	After each tool call (observation)	Before file reads (constraint)
Observability layer	Layer 1: action visibility	Layer 2: context quality
Editor requirement	VSCode required	Editor-agnostic
Installation	VSCode extension	CLI + npm package
Setup effort	Low — install and start	Medium — index + CLAUDE.md wiring
Runtime overhead	Passive, near-zero	Adds a pre-read query step (~100–300ms)
Prevents bad decisions?	No — shows them as they happen	Partially — limits wrong-context reads
Replayable history?	Session-scoped, in-editor	No — query-time only
Multi-session support	One session per VSCode window	Per-repo shared graph
Best for	Audit trail, unexpected file access	Large codebases, token control

Verdict: These tools are complementary, not competing. Argus gives you a real-time action log with no configuration overhead. Coograph constrains context quality before the agent reads anything. Neither addresses approval-gate visibility or multi-session monitoring — that's the remaining gap.

What Neither Tool Covers: Approval Gates Across Concurrent Sessions

The inside-the-loop vs. outside-the-loop framing matters here. Argus and Coograph are both inside-the-loop tools — they operate within a single active session. The observability gap they don't address:

You're running 3 Claude Code sessions across different repos
Each session generates permission requests: Bash, Write, file edits
You leave your desk

When a session hits a tool call that requires approval and you're not there, the session blocks silently. You come back to stalled agents, no audit trail of what was pending, and no easy way to tell which sessions were launched with --dangerously-skip-permissions (and so never paused for approval) and which are still waiting.

Anthropic's own research on measuring agent autonomy identifies real-time steering as a core investment — the ability to intervene mid-session, not just observe after. Enterprise monitoring tools like Dynatrace, which recently expanded to cover Claude Code and Gemini agents, confirm the industry is moving toward this model. The gap in the indie/solo developer toolchain is that nothing provides this at low overhead for concurrent sessions.

How Grass Makes This Observability Stack Complete

Grass provides what Argus and Coograph don't: a global permissions event stream and SSE session replay that aggregate across all running sessions and work from any device.

The `/permissions/events` SSE Stream

Grass exposes a global SSE endpoint that surfaces every pending permission request across all active sessions in one stream:

curl -N http://localhost:32100/permissions/events

Each event payload:

{
  "type": "permissions",
  "permissions": [
    {
      "sessionId": "abc123",
      "agent": "claude-code",
      "repoPath": "/projects/payments-api",
      "repoName": "payments-api",
      "toolUseID": "tool_xyz789",
      "toolName": "Bash",
      "input": { "command": "psql $DATABASE_URL -c 'DROP TABLE sessions;'" }
    }
  ]
}

You can subscribe to this from any client — a CLI watcher, a custom dashboard, or the Grass mobile app — and see every pending approval request across every concurrent session from a single stream. No per-session polling required.
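If you write your own watcher, the payload above maps onto a small set of types. This sketch is based only on the single example event shown; Grass may send additional fields or event types, and `parsePermissionsEvent`, `groupByRepo`, and `describe` are hypothetical names.

```typescript
// Shapes inferred from the example payload above; Grass may send more fields.
interface PendingPermission {
  sessionId: string;
  agent: string;
  repoPath: string;
  repoName: string;
  toolUseID: string;
  toolName: string;
  input: Record<string, unknown>;
}

interface PermissionsEvent {
  type: "permissions";
  permissions: PendingPermission[];
}

// Decode one SSE data payload; returns null for anything that isn't a
// permissions event. Throws on invalid JSON; field-level checks are skipped.
function parsePermissionsEvent(data: string): PermissionsEvent | null {
  const parsed: unknown = JSON.parse(data);
  if (typeof parsed !== "object" || parsed === null) return null;
  const { type, permissions } = parsed as { type?: unknown; permissions?: unknown };
  if (type !== "permissions" || !Array.isArray(permissions)) return null;
  return parsed as PermissionsEvent;
}

// Group pending requests by repo so one stream reads like a per-project queue.
function groupByRepo(event: PermissionsEvent): Map<string, PendingPermission[]> {
  const groups = new Map<string, PendingPermission[]>();
  for (const p of event.permissions) {
    const list = groups.get(p.repoName) ?? [];
    list.push(p);
    groups.set(p.repoName, list);
  }
  return groups;
}

// One-line summary for a terminal watcher: "[repo] Tool: command-or-input".
function describe(p: PendingPermission): string {
  const cmd = p.input["command"];
  return `[${p.repoName}] ${p.toolName}: ${typeof cmd === "string" ? cmd : JSON.stringify(p.input)}`;
}
```

For the example event, `describe` yields a line such as `[payments-api] Bash: psql $DATABASE_URL -c 'DROP TABLE sessions;'`, which makes a destructive command easy to spot before you approve it.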

Session Replay via `Last-Event-ID`

Every event in Grass's SSE streams carries a seq field and an id: line for ordered replay. If your monitoring client disconnects, reconnect with the Last-Event-ID header set to the last id you received, and all missed events are replayed:

# Reconnect, passing the last event ID received (42), to replay missed events
curl -N -H "Last-Event-ID: 42" \
  "http://localhost:32100/events?sessionId=abc123"

This gives you a replayable audit trail of everything the agent did in a session — not just what was visible when you happened to be watching.
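A watcher that survives disconnects only needs to remember the last id: it saw and send it back as Last-Event-ID. Below is a minimal sketch using Node 18's built-in `fetch`, assuming the endpoints and replay behavior described above; the parser handles only the `id:`, `event:`, and `data:` fields of the SSE format, and `SseParser` and `watch` are hypothetical names.

```typescript
// Minimal Server-Sent Events client with resume, sketched against the Grass
// endpoints described above. Handles only id:, event:, and data: fields and
// assumes \n or \r\n line endings.
interface SseEvent {
  id?: string;
  event?: string;
  data: string;
}

class SseParser {
  private buffer = "";
  lastEventId: string | undefined;

  // Drop any half-received event, e.g. after a dropped connection.
  reset(): void {
    this.buffer = "";
  }

  // Feed decoded text; returns the events completed by this chunk.
  feed(chunk: string): SseEvent[] {
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, "\n");
    const events: SseEvent[] = [];
    let sep = this.buffer.indexOf("\n\n"); // a blank line ends an event
    while (sep !== -1) {
      const block = this.buffer.slice(0, sep);
      this.buffer = this.buffer.slice(sep + 2);
      const ev: SseEvent = { data: "" };
      const dataLines: string[] = [];
      for (const line of block.split("\n")) {
        if (line === "" || line.startsWith(":")) continue; // comments / keep-alives
        const colon = line.indexOf(":");
        const field = colon === -1 ? line : line.slice(0, colon);
        let value = colon === -1 ? "" : line.slice(colon + 1);
        if (value.startsWith(" ")) value = value.slice(1);
        if (field === "data") dataLines.push(value);
        else if (field === "id") ev.id = value;
        else if (field === "event") ev.event = value;
      }
      if (ev.id !== undefined) this.lastEventId = ev.id;
      if (dataLines.length > 0) {
        ev.data = dataLines.join("\n");
        events.push(ev);
      }
      sep = this.buffer.indexOf("\n\n");
    }
    return events;
  }
}

// Streams events indefinitely; after a disconnect, waits briefly and reconnects
// with Last-Event-ID so the server can replay what was missed.
async function watch(url: string, onEvent: (ev: SseEvent) => void): Promise<void> {
  const parser = new SseParser();
  for (;;) {
    try {
      const headers: Record<string, string> = { Accept: "text/event-stream" };
      if (parser.lastEventId !== undefined) headers["Last-Event-ID"] = parser.lastEventId;
      const res = await fetch(url, { headers });
      if (!res.ok || !res.body) throw new Error(`HTTP ${res.status}`);
      const reader = res.body.getReader();
      const decoder = new TextDecoder();
      for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        for (const ev of parser.feed(decoder.decode(value, { stream: true }))) onEvent(ev);
      }
    } catch (err) {
      console.error("stream error:", err);
    }
    parser.reset();
    await new Promise((resolve) => setTimeout(resolve, 2000)); // back off, then reconnect
  }
}
```

Usage: `watch("http://localhost:32100/events?sessionId=abc123", (ev) => console.log(ev.id, ev.data));`. Keeping the parser separate from the network loop is what lets `lastEventId` survive reconnects.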

Mobile Approval Forwarding

When Argus shows the agent touched .env 90 seconds ago, that's after the fact. When Grass routes a pending Bash permission to your phone before the command executes, you still have time to approve or deny it from wherever you are.

The Grass app surfaces each permission request as a native modal with a syntax-highlighted preview of the command or file edit. Allow or Deny — the session doesn't execute until you respond.

Setting Up Grass Alongside Argus and Coograph

npm install -g @grass-ai/ide
cd ~/projects
grass start

Scan the QR code on your phone. From that point, every permission request across all active sessions appears on your phone as they come in, every session stream is replayable, and the diff viewer shows you every file the agent touched — all from mobile.

Grass doesn't replace Argus or Coograph. Argus gives you VSCode-native tool-call visibility during a session. Coograph constrains what the agent reads. Grass handles approval gates and multi-session monitoring when you step away.

For remote sessions or cross-network access, add Tailscale to the setup and use grass start --network tailscale instead.

Verifying the Stack Is Working

Argus: Start a Claude Code session. Tell it to read any file in your project. The Argus panel in VSCode should update within 1–2 seconds, showing the Read tool call and the file path. If the panel is still empty after 10 seconds, reload VSCode with Developer: Reload Window from the command palette.

Coograph: Run a test query before opening a session:

coograph query "how does authentication work?"

Expected output: 3–7 file paths. If you get 30+, re-index with coograph index and check for circular imports or a flat src/ structure. Separately, confirm the agent actually follows the CLAUDE.md protocol: its first tool call after you open a session should be a Bash call running coograph query, not a direct Read.

Grass: Check the health endpoint:

curl http://localhost:32100/health
# → { "status": "ok", "cwd": "/projects", "serverVersion": "1.7.0" }

Open a session that will require a Bash command (e.g., ask the agent to run tests). The permission request should appear in the SSE stream before the command executes, and on your phone if the app is connected.
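If you want the health check in a script (for example, before launching sessions), the sketch below mirrors the curl call above. It assumes the health payload shape shown in the example response and Node 18+ for global `fetch`; `isHealthy` and `checkGrass` are hypothetical helpers.

```typescript
// Mirrors `curl http://localhost:32100/health`; the payload shape is taken
// from the example response above.
interface HealthResponse {
  status: string;
  cwd?: string;
  serverVersion?: string;
}

// Pure check, kept separate from the network call so it is easy to test.
function isHealthy(body: unknown): body is HealthResponse {
  return typeof body === "object" && body !== null && (body as { status?: unknown }).status === "ok";
}

// Resolves to true only if the server answers within 3 s with status "ok".
async function checkGrass(baseUrl = "http://localhost:32100"): Promise<boolean> {
  try {
    const res = await fetch(`${baseUrl}/health`, { signal: AbortSignal.timeout(3000) });
    return res.ok && isHealthy(await res.json());
  } catch {
    return false; // connection refused, timeout, or a non-JSON body
  }
}
```

Usage: `checkGrass().then((ok) => process.exit(ok ? 0 : 1));` gives you an exit code a shell script can branch on.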

Troubleshooting Common Issues

Argus shows no events after installing:

Confirm Claude Code is running inside the same VSCode workspace, not a separate terminal window
Check the Argus output panel for connection errors (View → Output → Argus)
Start a fresh Claude Code session after installing — the extension hooks session start

Coograph query returns irrelevant or too many files:

Run coograph index again if significant files have been added or reorganized
Use domain-specific terminology from the codebase in your query, not generic descriptions ("payment processor" rather than "payment code")
Check for circular imports — they can cause the graph to over-include files from unrelated modules
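To locate those circular imports, a depth-first search over the import graph is enough. A sketch, assuming you have already extracted an import map (file to imported files) by some means; Coograph's own index format is not described in this post, and `findCycle` is a hypothetical name.

```typescript
// Hypothetical import map: file -> files it imports.
type Graph = Map<string, string[]>;

// Returns one import cycle as a list of files (first file repeated at the end),
// or null if the graph is acyclic. Classic three-color DFS: GRAY marks files on
// the current path, so reaching a GRAY file again means a back edge (a cycle).
function findCycle(graph: Graph): string[] | null {
  const WHITE = 0, GRAY = 1, BLACK = 2;
  const color = new Map<string, number>();
  const stack: string[] = [];

  function visit(node: string): string[] | null {
    color.set(node, GRAY);
    stack.push(node);
    for (const next of graph.get(node) ?? []) {
      const c = color.get(next) ?? WHITE;
      if (c === GRAY) {
        // The cycle is the path from `next` to the current node, closed by `next`.
        return [...stack.slice(stack.indexOf(next)), next];
      }
      if (c === WHITE) {
        const found = visit(next);
        if (found) return found;
      }
    }
    stack.pop();
    color.set(node, BLACK);
    return null;
  }

  for (const node of graph.keys()) {
    if ((color.get(node) ?? WHITE) === WHITE) {
      const found = visit(node);
      if (found) return found;
    }
  }
  return null;
}
```

This reports one cycle at a time: break it, re-run, and repeat. The recursion is fine for typical repos; very deep import chains would call for an iterative version.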

Agent ignores the CLAUDE.md Coograph protocol mid-session:

CLAUDE.md is read at session start; changes don't apply to active sessions
For long sessions, remind the agent explicitly: "Before reading files, run coograph query first"
Consider a PreToolUse hook that fires a warning when Read is called without a preceding coograph query in the recent tool history
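The heart of such a hook is a small check over recent tool calls. Claude Code passes a PreToolUse hook the pending call as JSON on stdin (including `tool_name` and `tool_input`, per the hooks documentation), but the hook has to persist earlier calls itself, for example in a per-session file. The sketch below covers only the decision logic, assuming the history arrives as (tool name, input) pairs; `shouldWarn`, `isCoographQuery`, and `WINDOW` are hypothetical, and how a warning is surfaced (stderr, exit code) should be checked against the hooks reference.

```typescript
// Hypothetical record of one tool call, normalized from the hook's JSON input.
interface ToolCall {
  toolName: string;               // e.g. "Read", "Bash"
  input: Record<string, unknown>; // e.g. { command: "coograph query ..." }
}

// How many recent calls count as "recent"; tune to taste.
const WINDOW = 10;

function isCoographQuery(call: ToolCall): boolean {
  const cmd = call.input["command"];
  return call.toolName === "Bash" && typeof cmd === "string" && /\bcoograph\s+query\b/.test(cmd);
}

// Returns a warning if `next` is a Read with no coograph query among the last
// WINDOW calls in `history`; otherwise null. Read's input carries file_path.
function shouldWarn(history: ToolCall[], next: ToolCall): string | null {
  if (next.toolName !== "Read") return null;
  const recent = history.slice(-WINDOW);
  if (recent.some(isCoographQuery)) return null;
  const path = next.input["file_path"];
  return `Read of ${String(path)} without a coograph query in the last ${WINDOW} tool calls`;
}
```

A sliding window rather than "any query this session" is the design choice here: in a long session, an early query about one subsystem should not license reads in another.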

Grass permission events not reaching mobile:

Confirm phone and server are on the same WiFi network
Check grass start output for the server IP shown in the QR code — that's the address the phone connects to
For cross-network access: grass start --network tailscale requires Tailscale running on both machines

FAQ

What is the difference between Argus and Coograph for Claude Code observability?

Argus is a VSCode extension that logs every agent tool call in real time after it executes — giving you a sequential action log during a session. Coograph is a CLI tool that builds a dependency graph of your repo and constrains which files the agent reads before it acts. Argus is post-call visibility; Coograph is pre-read constraint. They address different layers of the observability problem and are designed to be used together.

How do I see what Claude Code is doing in real time?

Install Argus as a VSCode extension: it surfaces every tool call (reads, writes, bash commands, edits) in a panel as the agent executes. For multi-session visibility or monitoring from outside VSCode, Grass's SSE stream (GET /events?sessionId=<id>) streams every agent action in real time and supports replay via the Last-Event-ID header.

Why does Claude Code make bad decisions on large codebases?

Large codebases create a context-quality problem: when asked to understand a subsystem, the agent opens files from adjacent areas and forms a diluted, sometimes contradictory picture. Coograph addresses this by pre-indexing your repo into a dependency graph. Before the agent opens any files, it queries the graph to identify the 3–5 files actually relevant to the task — improving decision quality and reducing the token spend on files that don't matter.

How do I handle Claude Code approval prompts when I'm not at my desk?

Without a forwarding layer, Claude Code blocks on permission requests until someone types y in the terminal. Grass routes every pending approval request to your phone as a native modal before the tool call executes — you can approve or deny bash commands, file writes, and edits from anywhere. Sessions don't silently skip approvals; they wait for your response.

Can I use Argus, Coograph, and Grass at the same time?

Yes, and they don't conflict — they address different parts of the problem. Argus handles VSCode-native action logging per session. Coograph handles pre-read context control per repo. Grass handles multi-session approval forwarding and mobile monitoring when you're away from your desk. All three run independently.

Next Steps

Install Argus — github.com/yessGlory17/argus — search the VSCode Marketplace; active in under 2 minutes
Install Coograph — github.com/paullukic/coograph — run npm install -g coograph && coograph index in your largest active project, then add the CLAUDE.md protocol
Add Grass for mobile approval forwarding — npm install -g @grass-ai/ide && grass start — free tier includes 10 hours, no credit card required; scan the QR code and your phone becomes your approval gate

The opacity failure that left a team with a SaaS product nobody could explain is a tooling gap, not an AI limitation. The tooling now exists to close it. All three tools are available today.

This post is published by Grass — a machine built for AI coding agents that gives your Claude Code and Codex sessions an always-on cloud VM, accessible and controllable from your phone. Works with Claude Code and OpenCode.