Capability separation for coding agents: combine network sandboxing and egress DLP to contain compromised prompts

A coding agent is useful because it has capabilities: it can read a repo, run tests, call tools, and access the network. A prompt injection is dangerous for the same reason. Capability separation reduces blast radius by ensuring no single component has both unrestricted access to secrets and unrestricted egress.

The short version

  • Do not rely on prompts to protect secrets from tools.
  • Run agents in a sandbox with constrained filesystem and network access.
  • Route allowed egress through a proxy that performs data loss prevention (DLP) and enforces destination policy.
  • Keep credentials scoped, short-lived, and separated from the network path when possible.
  • Assume any content the agent reads can be hostile.

The core pattern

Separate three capabilities:

  1. Reading workspace data and secrets.
  2. Making network requests.
  3. Performing side effects such as writes, deploys, and messages.

A safe architecture tries not to place all three in the same unchecked process. For example:

  • The agent can read the repo but has no direct internet access.
  • The egress proxy can reach the internet but does not hold developer secrets.
  • A credential gateway can use real API keys but only for scoped, authorized calls.
  • Write tools require policy checks or human approval.

If the model is tricked, its actions still cross boundaries that can deny, redact, or require approval.
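One way to make this invariant checkable is to declare which capabilities each component holds and flag any component that holds all three. The sketch below is only an illustration: the Capability enum, the component names, and the capability assignments are hypothetical and should be replaced with an inventory of your own setup.

```python
from enum import Flag, auto


class Capability(Flag):
    """The three capabilities the pattern tries to keep apart."""
    READ_SECRETS = auto()   # workspace data and secrets
    NETWORK = auto()        # outbound network requests
    SIDE_EFFECTS = auto()   # writes, deploys, messages


ALL_THREE = Capability.READ_SECRETS | Capability.NETWORK | Capability.SIDE_EFFECTS


def overpowered(components: dict[str, Capability]) -> list[str]:
    """Return the names of components that hold all three capabilities."""
    return sorted(
        name for name, caps in components.items()
        if (caps & ALL_THREE) == ALL_THREE
    )


# Hypothetical layout loosely mirroring the bullets above.
layout = {
    "agent": Capability.READ_SECRETS,
    "egress_proxy": Capability.NETWORK,
    "credential_gateway": Capability.READ_SECRETS | Capability.NETWORK,
    "write_tool": Capability.SIDE_EFFECTS,
}

if __name__ == "__main__":
    print(overpowered(layout))
```

A check like this could run in CI against a declared inventory, so that a configuration change which quietly gives one process everything fails review.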

Sandbox first

Start by limiting where the agent runs.

For local development, use a container, VM, or network namespace. Mount only the project directory, not the whole home directory. Avoid passing through ~/.ssh, cloud credentials, package registry tokens, and browser profiles unless the task absolutely requires them.
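As a rough sketch of the container option, the helper below builds a docker run command line that exposes only the project directory. The function name, the image name, the /workspace mount point, and the network name agent-internal are assumptions for illustration; the network is assumed to have been created beforehand with docker network create --internal so that the container has no route to the outside.

```python
from pathlib import Path


def sandbox_argv(project_dir: str, image: str, network: str = "agent-internal") -> list[str]:
    """Build a `docker run` argv that mounts only the project directory."""
    project = Path(project_dir).resolve()
    return [
        "docker", "run", "--rm",
        "--network", network,                 # assumed internal-only Docker network
        "--read-only",                        # container root filesystem is read-only
        "--cap-drop", "ALL",                  # drop all Linux capabilities
        "--security-opt", "no-new-privileges:true",
        "--tmpfs", "/tmp",                    # scratch space that is not on the host
        "-v", f"{project}:/workspace",        # the only host path exposed
        "-w", "/workspace",
        image,
    ]


if __name__ == "__main__":
    # Print the command instead of running it; "agent-image:latest" is a placeholder.
    print(" ".join(sandbox_argv(".", "agent-image:latest")))
```

Nothing from the home directory is mounted, so ~/.ssh, cloud credentials, and browser profiles are simply absent inside the container.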

For CI, run the agent in an isolated job with minimal secrets. Prefer short-lived OIDC-issued cloud credentials over long-lived static keys. Restrict repository permissions: read-only for analysis jobs, write only for jobs that must open pull requests.

Filesystem rules should answer:

  • Which paths can the agent read?
  • Which paths can it write?
  • Can it access dotfiles and parent directories?
  • Can it execute downloaded binaries?
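The first three questions can be answered in code with a small path check. The sketch below assumes Python 3.9 or later (for Path.is_relative_to) and a hypothetical /workspace root; the function name and the dotfile rule are illustrative choices, not a standard API.

```python
from pathlib import Path


def can_read(path: str, read_roots: list[str], allow_dotfiles: bool = False) -> bool:
    """True if `path` stays inside an allowed root after `..` and symlinks are resolved."""
    resolved = Path(path).resolve()
    for root in read_roots:
        root_resolved = Path(root).resolve()
        if resolved.is_relative_to(root_resolved):
            relative = resolved.relative_to(root_resolved)
            # Deny .env, .git, .ssh and similar unless explicitly allowed.
            if not allow_dotfiles and any(part.startswith(".") for part in relative.parts):
                return False
            return True
    return False


if __name__ == "__main__":
    roots = ["/workspace"]
    for candidate in ("/workspace/src/main.py", "/workspace/../etc/passwd", "/workspace/.env"):
        print(candidate, can_read(candidate, roots))
```

Resolving the path before comparing is the important step: a string prefix check alone would accept /workspace/../etc/passwd.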

Network rules should answer:

  • Can the agent reach the public internet directly?
  • Which domains are allowed?
  • Are private CIDRs and metadata services blocked?
  • Are DNS queries controlled?

Do not forget cloud instance metadata endpoints such as 169.254.169.254, which can hand out instance credentials to any process that can reach them. Many cloud credential theft paths start there.
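A destination check along these lines can be sketched with the standard ipaddress module. The allowlist contents and function names below are assumptions for illustration. The caller is expected to pass the IP addresses the proxy will actually connect to, so that an allowed hostname resolving to an internal address is still refused.

```python
import ipaddress

# Hypothetical per-task allowlist.
ALLOWED_DOMAINS = {"pypi.org", "files.pythonhosted.org", "api.github.com"}


def ip_is_blocked(ip: str) -> bool:
    """True for private, loopback, link-local (metadata), and other non-public addresses."""
    addr = ipaddress.ip_address(ip)
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local      # covers 169.254.169.254
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )


def destination_allowed(hostname: str, resolved_ips: list[str]) -> bool:
    """Allow only allowlisted hostnames whose resolved addresses are all public."""
    host = hostname.lower().rstrip(".")
    if host not in ALLOWED_DOMAINS:
        return False
    return not any(ip_is_blocked(ip) for ip in resolved_ips)


if __name__ == "__main__":
    print(destination_allowed("pypi.org", ["8.8.8.8"]))
    print(destination_allowed("pypi.org", ["169.254.169.254"]))
```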

Add egress DLP

Sandboxing controls where traffic can go. Egress DLP controls what leaves.

Route allowed HTTP(S), WebSocket, and MCP traffic through a proxy that can inspect:

  • URLs and hostnames.
  • Headers.
  • Request bodies.
  • MCP tool arguments.
  • Tool responses before they return to the model.
  • DNS or hostname patterns where possible.

The proxy should block obvious credential classes: private keys, cloud access keys, provider API keys, OAuth tokens, SSH material, and high-entropy strings that look like tokens. It should also block requests to destinations outside the task’s allowlist, since those are the likely exfiltration targets.
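A minimal sketch of the content check, assuming a few well-known token shapes and a Shannon-entropy fallback. The pattern set, the 32-character minimum, and the 4.5 bits-per-character threshold are illustrative assumptions; a real deployment would use a maintained rule set and tune the threshold, because entropy checks also fire on hashes and other random-looking identifiers.

```python
import math
import re
from collections import Counter

# Illustrative patterns only; not a complete rule set.
PATTERNS = {
    "private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----"),
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
}
# Long runs of token-like characters are candidates for the entropy check.
TOKEN = re.compile(r"[A-Za-z0-9+/=_\-]{32,}")


def shannon_entropy(s: str) -> float:
    """Entropy in bits per character of the string's character distribution."""
    counts = Counter(s)
    return -sum((n / len(s)) * math.log2(n / len(s)) for n in counts.values())


def scan(text: str, entropy_threshold: float = 4.5) -> list[str]:
    """Return the names of the credential classes found in `text`."""
    findings = [name for name, pattern in PATTERNS.items() if pattern.search(text)]
    if any(shannon_entropy(tok) >= entropy_threshold for tok in TOKEN.findall(text)):
        findings.append("high_entropy")
    return findings


if __name__ == "__main__":
    print(scan("aws key: AKIAIOSFODNN7EXAMPLE"))
    print(scan("please summarise the README"))
```

The proxy would call scan on URLs, headers, and bodies, and deny or redact the request when the result is non-empty.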

For MCP tools, scan both directions. Outbound tools/call arguments can leak secrets; inbound tool responses can inject the next malicious instruction.
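The two directions can be sketched as a pair of checks on JSON-RPC messages. The message shapes follow the MCP tools specification (a tools/call request with params.arguments, and a result carrying a content list of typed items); the two regular expressions and the function names are deliberately simplistic assumptions, and a phrase match is nowhere near a complete injection detector.

```python
import json
import re

# Illustrative detectors only.
SECRET = re.compile(
    r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----|\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"
)
INJECTION = re.compile(
    r"ignore (?:all |any )?(?:previous|prior) instructions", re.IGNORECASE
)


def check_outbound(message: dict) -> bool:
    """True if a request may be forwarded to the MCP server."""
    if message.get("method") != "tools/call":
        return True
    arguments = message.get("params", {}).get("arguments", {})
    # Serialise so nested argument values are scanned too.
    return SECRET.search(json.dumps(arguments)) is None


def check_inbound(message: dict) -> bool:
    """True if a tool result may be returned to the model."""
    content = message.get("result", {}).get("content", [])
    texts = [item.get("text", "") for item in content if item.get("type") == "text"]
    return not any(INJECTION.search(t) or SECRET.search(t) for t in texts)


if __name__ == "__main__":
    request = {
        "jsonrpc": "2.0", "id": 1, "method": "tools/call",
        "params": {"name": "http_post", "arguments": {"body": "AKIAIOSFODNN7EXAMPLE"}},
    }
    print(check_outbound(request))
```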

Example local shape

┌──────────────┐        only proxy egress        ┌──────────────┐
│ agent sandbox│ ──────────────────────────────▶ │ egress proxy │
│ repo mounted │                                 │ DLP + policy │
│ no raw net   │ ◀────────────────────────────── │ no secrets   │
└──────┬───────┘                                 └──────┬───────┘
       │ mediated MCP                                    │ approved net
       ▼                                                 ▼
┌──────────────┐                                 ┌───────────────┐
│ MCP gateway  │                                 │ upstream APIs │
│ call policy  │                                 │ and websites  │
└──────────────┘                                 └───────────────┘

The implementation can be Docker plus firewall rules, Kubernetes NetworkPolicy, a local proxy, or a purpose-built agent sandbox. The exact tool matters less than the invariant: direct egress is closed.

Side effects need a separate gate

DLP does not know whether a deployment is wise. A prompt-injected agent can cause harm without leaking secrets if it has permission to delete resources or push code.

Put write actions behind policy:

  • Allow read-only tools by default, but only within allowed paths and services.
  • Require approval for writes to protected branches.
  • Require approval for deploys, deletes, payments, emails, and signing.
  • Bind approval to exact arguments and expire it quickly.
  • Log every decision.

This is where MCP-aware gateways help because they can evaluate tool name and structured arguments before execution.
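The rule about binding approval to exact arguments and expiring it quickly can be sketched as a digest over the tool name and canonicalised arguments plus an expiry time. The names Approval, approve, and is_authorized and the 120-second default are hypothetical. A real gateway would also need to store or sign approvals on its own side so that the agent cannot mint them.

```python
import hashlib
import json
import time
from dataclasses import dataclass
from typing import Optional


def call_digest(tool: str, arguments: dict) -> str:
    """Hash the tool name and arguments in a key-order-independent form."""
    canonical = json.dumps(
        {"tool": tool, "arguments": arguments}, sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Approval:
    digest: str        # what exactly was approved
    expires_at: float  # Unix timestamp after which the approval is void


def approve(tool: str, arguments: dict, ttl_seconds: float = 120.0,
            now: Optional[float] = None) -> Approval:
    issued = time.time() if now is None else now
    return Approval(call_digest(tool, arguments), issued + ttl_seconds)


def is_authorized(approval: Approval, tool: str, arguments: dict,
                  now: Optional[float] = None) -> bool:
    """True only for the exact approved call, and only before expiry."""
    current = time.time() if now is None else now
    return current < approval.expires_at and approval.digest == call_digest(tool, arguments)


if __name__ == "__main__":
    ticket = approve("deploy", {"env": "staging", "sha": "abc123"})
    print(is_authorized(ticket, "deploy", {"env": "staging", "sha": "abc123"}))
    print(is_authorized(ticket, "deploy", {"env": "production", "sha": "abc123"}))
```

Because the digest covers the arguments, an approval for a staging deploy cannot be replayed against production, and an old approval stops working once it expires.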

What this does not solve

Capability separation is containment, not perfect prevention. If the agent is allowed to send a summary to a trusted ticket system, it may still leak sensitive content into that system unless DLP catches it. If a user approves a dangerous write, the system may perform it. If a malicious package runs outside the sandbox, all bets are off.

It also adds friction. Developers will bypass controls that break normal workflows. Provide paved paths: preconfigured sandbox images, standard proxy settings, documented allowlist requests, and clear denial messages.

Rollout checklist

  1. Inventory agent secrets and network destinations.
  2. Remove unnecessary home-directory mounts.
  3. Block direct egress from the agent environment.
  4. Allow egress only through an inspecting proxy.
  5. Block private network ranges and metadata endpoints unless required.
  6. Wrap MCP servers with a gateway or policy proxy.
  7. Scope credentials per agent and task.
  8. Require approval for side effects.
  9. Export audit logs and traces.
  10. Test with canary secrets and prompt-injection fixtures.
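For step 10, a canary test might look like the sketch below: plant a unique fake secret in the workspace, run the agent against a prompt-injection fixture, then check the traffic the proxy let through. The helper names are hypothetical, and the captured requests are assumed to be available as plain strings from the proxy's logs. The encoded variants only catch the canary when it was encoded on its own, so this is a smoke test and not proof that nothing leaked.

```python
import base64
import secrets


def make_canary(prefix: str = "CANARY") -> str:
    """Generate a unique fake secret to plant in the sandboxed workspace."""
    return f"{prefix}_{secrets.token_hex(16)}"


def variants(canary: str) -> set[str]:
    """The canary as-is, plus its base64 and hex encodings."""
    raw = canary.encode("utf-8")
    return {canary, base64.b64encode(raw).decode("ascii"), raw.hex()}


def leaked(canary: str, captured_requests: list[str]) -> list[int]:
    """Return indexes of captured outbound requests that contain the canary."""
    needles = variants(canary)
    return [
        index for index, body in enumerate(captured_requests)
        if any(needle in body for needle in needles)
    ]


if __name__ == "__main__":
    canary = make_canary()
    captured = ["GET /simple/requests/", f"POST /collect body={canary}"]
    print(leaked(canary, captured))
```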

That idea is also why we have been building Grass: agent execution should happen somewhere intentional, not necessarily inside your daily shell. You can run coding agents on a managed GrassVM or connect your own laptop/server, then control the work from your phone. Execution, files, git operations, and code storage stay on the selected machine; the mobile app is the controller.

That makes it a useful front end for this architecture: isolate the VM or self-hosted machine, route egress through DLP, keep approvals on for risky actions, and review diffs from mobile. Try it at https://codeongrass.com.

Conclusion

Coding agents should not run as all-powerful developer shells with unrestricted internet access. Combine sandboxing, egress DLP, MCP call policy, and scoped credentials so a compromised prompt hits multiple independent boundaries. The practical goal is simple: the agent may be fooled, but it should not be able to freely read secrets, choose a destination, and send them there.
