VellaVeto fail-closed policy mode: deny unsafe MCP tool calls by default and require approval for writes

Most agent safety controls are advisory until they sit on the tool-call path. VellaVeto is positioned as a runtime proxy for MCP tools: every call is evaluated against policy before execution, and policy failures deny the call. That fail-closed posture matters for agents that can read files, call SaaS APIs, or mutate production systems.

The short version

  • Put policy enforcement between the MCP client and server, not only in the system prompt.
  • Fail closed: no matching rule, missing context, parser error, or policy-engine failure should deny.
  • Treat writes, deletes, deploys, payments, and signatures as approval-required actions.
  • Be explicit about what the proxy cannot solve: it does not stop model jailbreaks, and it never sees malicious package installs that happen outside the mediated path.

Why fail-closed matters

A fail-open guardrail is convenient during development. If the scanner times out, the tool call continues. If a field is missing, the agent gets the benefit of the doubt. That is dangerous for autonomous workflows because an attacker only needs to trigger the error path to get a call through.

Fail-closed policy reverses the default. The proxy must positively prove that a call is allowed. Otherwise the answer is no.
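
A minimal Python sketch of that reversal. The `fail_closed` wrapper, the verdict strings, and the `evaluate` callable are hypothetical names chosen for illustration, not VellaVeto's API. The point is only that every failure mode collapses to a denial.

```python
from typing import Any, Callable, Mapping

ALLOW, DENY, REQUIRE_APPROVAL = "allow", "deny", "require_approval"
KNOWN_VERDICTS = {ALLOW, DENY, REQUIRE_APPROVAL}


def fail_closed(evaluate: Callable[[Mapping[str, Any]], Any],
                call: Mapping[str, Any]) -> str:
    """Return the policy verdict, or DENY on any failure or ambiguity."""
    try:
        verdict = evaluate(call)
        if verdict in KNOWN_VERDICTS:
            return verdict
    except Exception:
        # Parser errors, timeouts, and engine crashes all land here.
        pass
    # Missing, unknown, or malformed verdicts are never treated as allow.
    return DENY


def broken_engine(call: Mapping[str, Any]) -> str:
    raise TimeoutError("policy engine did not answer")


print(fail_closed(broken_engine, {"tool": "read_file"}))     # deny
print(fail_closed(lambda c: None, {"tool": "read_file"}))    # deny
print(fail_closed(lambda c: ALLOW, {"tool": "list_files"}))  # allow
```

The only way to get "allow" out of this function is for the engine to return it explicitly.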

For MCP, this maps well to the protocol. Tool invocation flows through tools/call with a tool name and structured arguments. A proxy can evaluate that request before forwarding it to the upstream server.
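
As a rough illustration, a proxy might extract the tool name and arguments from a tools/call request like this. The `parse_tool_call` helper and the `path` argument are assumptions made for the sketch. The `method`, `params.name`, and `params.arguments` fields follow the MCP tools specification.

```python
import json
from typing import Any, Dict, Tuple


def parse_tool_call(raw: str) -> Tuple[str, Dict[str, Any]]:
    """Return (tool_name, arguments); raise ValueError if the request is malformed."""
    msg = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(msg, dict) or msg.get("method") != "tools/call":
        raise ValueError("not a tools/call request")
    params = msg.get("params")
    if not isinstance(params, dict):
        raise ValueError("missing params")
    name = params.get("name")
    arguments = params.get("arguments", {})
    if not isinstance(name, str) or not name:
        raise ValueError("missing tool name")
    if not isinstance(arguments, dict):
        raise ValueError("arguments must be an object")
    return name, arguments


request = json.dumps({
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {"name": "read_file", "arguments": {"path": "/workspace/README.md"}},
})
print(parse_tool_call(request))  # ('read_file', {'path': '/workspace/README.md'})
```

Under a fail-closed posture, a ValueError raised here should end in a denial rather than a forwarded request. Other MCP methods need their own handling and are out of scope for this sketch.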

Policy shape

A useful baseline policy separates safe reads from side effects:

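# Anything that no rule matches is denied.
default: deny

rules:
  # Reads: directory listing is allowed only under /workspace.
  - id: allow-list-workspace
    match:
      server: filesystem
      tool: list_files
      args:
        path_prefix: /workspace
    effect: allow

  # Secret locations (.ssh, .aws, .config, *.env) get an explicit deny,
  # so the intent survives if broader read rules are added later.
  - id: deny-secret-paths
    match:
      server: filesystem
      tool: read_file
      args:
        path_regex: "(^|/)\\.(ssh|aws|config)|\\.env$"
    effect: deny

  # Any call classified as a write waits for human approval.
  - id: approve-writes
    match:
      side_effect: write
    effect: require_approval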

Exact VellaVeto syntax may differ by version; the important design is default-deny with specific allow rules and approval gates for writes.
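
To make the semantics concrete, here is a small Python sketch that evaluates the three rules above. It is not VellaVeto's engine. The `SIDE_EFFECTS` table, the `write_file` tool, and the "most restrictive matching effect wins" precedence are assumptions made for illustration.

```python
import posixpath
import re

RULES = [
    {"id": "allow-list-workspace", "server": "filesystem", "tool": "list_files",
     "path_prefix": "/workspace", "effect": "allow"},
    {"id": "deny-secret-paths", "server": "filesystem", "tool": "read_file",
     "path_regex": r"(^|/)\.(ssh|aws|config)|\.env$", "effect": "deny"},
    {"id": "approve-writes", "side_effect": "write", "effect": "require_approval"},
]
# Hypothetical classification of tools; unknown tools have no side-effect class.
SIDE_EFFECTS = {"list_files": "read", "read_file": "read", "write_file": "write"}


def matches(rule, server, tool, path):
    if "server" in rule and rule["server"] != server:
        return False
    if "tool" in rule and rule["tool"] != tool:
        return False
    if "side_effect" in rule and SIDE_EFFECTS.get(tool) != rule["side_effect"]:
        return False
    if "path_prefix" in rule:
        prefix = rule["path_prefix"]
        # Require a path-segment boundary so "/workspace2" does not match.
        if path != prefix and not path.startswith(prefix + "/"):
            return False
    if "path_regex" in rule and not re.search(rule["path_regex"], path):
        return False
    return True


def evaluate(server, tool, raw_path):
    # Normalize first so "/workspace/../etc" cannot pass a prefix check.
    path = posixpath.normpath(raw_path)
    effects = {r["effect"] for r in RULES if matches(r, server, tool, path)}
    # Assumed precedence: the most restrictive matching effect wins.
    for effect in ("deny", "require_approval", "allow"):
        if effect in effects:
            return effect
    return "deny"  # default: deny


print(evaluate("filesystem", "list_files", "/workspace/src"))      # allow
print(evaluate("filesystem", "list_files", "/workspace/../etc"))   # deny
print(evaluate("filesystem", "read_file", "/workspace/.env"))      # deny
print(evaluate("filesystem", "write_file", "/workspace/a.txt"))    # require_approval
print(evaluate("filesystem", "read_file", "/workspace/README.md")) # deny
```

The last call is denied only because no allow rule covers read_file, which is the default doing its job. Lexical normalization does not resolve symlinks, so a real deployment would still need filesystem-level controls.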

What counts as a write?

Do not limit “write” to filesystem writes. For agents, side effects include:

  • Creating or deleting files.
  • Sending email or chat messages.
  • Opening pull requests.
  • Pushing commits or tags.
  • Running shell commands.
  • Deploying infrastructure.
  • Updating tickets, calendars, CRM records, or databases.
  • Signing commits, artifacts, or transactions.
  • Calling payment, billing, or identity-management APIs.

If the action is hard to undo, require approval. If approval is needed, bind it to the specific call: tool name, arguments, session, user, and time. A generic “approve this server forever” button recreates the original risk.
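
One way to bind an approval to a specific call is to fingerprint the call and store the fingerprint with a timestamp. The sketch below assumes a hypothetical approval record with `fingerprint` and `approved_at` fields and an arbitrary five-minute freshness window. None of these names come from VellaVeto.

```python
import hashlib
import hmac
import json
import time

APPROVAL_TTL_SECONDS = 300  # hypothetical freshness window


def call_fingerprint(tool, arguments, session_id, user):
    """Hash the exact call so an approval cannot be reused for a different one."""
    canonical = json.dumps(
        {"tool": tool, "arguments": arguments, "session": session_id, "user": user},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def approval_is_valid(approval, tool, arguments, session_id, user, now=None):
    now = time.time() if now is None else now
    expected = call_fingerprint(tool, arguments, session_id, user)
    # Reject approvals that are stale or dated in the future.
    fresh = 0 <= now - approval["approved_at"] <= APPROVAL_TTL_SECONDS
    return fresh and hmac.compare_digest(approval["fingerprint"], expected)


args = {"path": "/workspace/notes.md", "content": "hello"}
approval = {
    "fingerprint": call_fingerprint("write_file", args, "sess-1", "alice"),
    "approved_at": 1000.0,
}
print(approval_is_valid(approval, "write_file", args, "sess-1", "alice", now=1060.0))     # True
changed = dict(args, path="/workspace/.env")
print(approval_is_valid(approval, "write_file", changed, "sess-1", "alice", now=1060.0))  # False
print(approval_is_valid(approval, "write_file", args, "sess-1", "alice", now=2000.0))     # False
```

Changing any argument, the session, or the user produces a different fingerprint, so the earlier approval no longer applies.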

Evaluate every call, not only discovery

Static MCP scanning is useful but insufficient. A server can expose innocent metadata during setup and later change behavior, schema, or outputs. Public VellaVeto descriptions emphasize runtime evaluation; that is the right boundary.

At invocation time, evaluate:

  • Tool identity and source server.
  • Arguments after JSON parsing and normalization.
  • Session identity.
  • User or repo context.
  • Requested path, host, branch, model, tenant, or resource ID.
  • Whether the action is read, write, or irreversible.
  • Whether a fresh approval exists.

Log the verdict before forwarding the call. For denied calls, return a clear error to the client without leaking sensitive policy internals.
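
A sketch of that ordering in Python, assuming a hypothetical `handle` function, logger name, and `forward` callback. The error code -32000 sits in the range JSON-RPC 2.0 reserves for implementation-defined server errors. Whether a denial should instead be returned as a tool result with isError set is a design choice this sketch does not settle.

```python
import json
import logging

logger = logging.getLogger("mcp_policy_proxy")  # hypothetical logger name


def denial_response(request_id, reference):
    """JSON-RPC 2.0 error that says the call was blocked, without saying why."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {
            "code": -32000,  # implementation-defined server error range
            "message": "Tool call denied by policy",
            "data": {"reference": reference},
        },
    }


def handle(request, verdict, rule_id, forward):
    request_id = request.get("id")
    tool = request.get("params", {}).get("name")
    # Record the decision first so a crash while forwarding still leaves a trail.
    logger.info(json.dumps({"id": request_id, "tool": tool,
                            "verdict": verdict, "rule": rule_id}))
    if verdict != "allow":
        # Approval flow omitted: anything other than allow is not forwarded.
        # The matched rule stays in the log; the client only gets a reference.
        return denial_response(request_id, reference=f"req-{request_id}")
    return forward(request)


req = {"jsonrpc": "2.0", "id": 3, "method": "tools/call",
       "params": {"name": "read_file", "arguments": {"path": "/home/u/.ssh/id_rsa"}}}
print(handle(req, "deny", "deny-secret-paths", forward=lambda r: None))
```

The reference lets an operator find the matching log entry without the client learning which rule fired.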

Rollout plan

Start in observe mode only if the environment is non-sensitive. Capture tool names, argument metadata, and intended verdicts. Use these records to build an allowlist from real workflows.

Then move high-risk categories to enforce mode:

  1. Secret paths: .env, .ssh, .aws, cloud config, package registry tokens.
  2. Network exfiltration: arbitrary webhooks, unknown domains, private metadata IPs.
  3. Shell execution: curl | sh, persistence, credential dumping, destructive filesystem commands.
  4. Writes: commits, deployments, database mutations, messages.

Finally, change the global default to deny. Keep emergency bypass procedures separate from the agent runtime and heavily audited.
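
The staged rollout can be pictured as a per-category mode switch. In this sketch the category names, the `CATEGORY_MODES` table, and the `enforce_all` flag are hypothetical. It assumes the policy engine has already produced an intended verdict, with "deny" for anything unmatched.

```python
ENFORCE, OBSERVE = "enforce", "observe"

# Hypothetical state midway through a rollout: two categories enforced so far.
CATEGORY_MODES = {
    "secret_paths": ENFORCE,
    "network_exfiltration": ENFORCE,
    "shell_execution": OBSERVE,
    "writes": OBSERVE,
}


def effective_verdict(category, intended, enforce_all=False):
    """Return the verdict to apply; the intended verdict should be logged either way."""
    if enforce_all or CATEGORY_MODES.get(category) == ENFORCE:
        return intended
    # Observe mode: record what would have happened, but do not block.
    return "allow"


print(effective_verdict("secret_paths", "deny"))                        # deny
print(effective_verdict("writes", "require_approval"))                  # allow
print(effective_verdict("writes", "require_approval", enforce_all=True))  # require_approval
print(effective_verdict("uncategorized", "deny", enforce_all=True))     # deny
```

Setting `enforce_all` corresponds to the final step: once it is on, nothing is waved through in observe mode, including calls that match no category.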

Gotchas

Fail-closed systems can break work. If your policy cannot express common legitimate actions, developers will bypass it. Invest in good error messages and quick policy updates.

A proxy only controls traffic that passes through it. If the agent can launch an unmediated subprocess with network access, policy is incomplete. Combine the proxy with network sandboxing, filesystem permissions, and scoped credentials.

Approvals can become rubber stamps. Show the exact arguments and risk category, not only the tool name. “Allow github.create_issue” is less useful than “Create public issue in org/repo with title and body preview.”
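
A small sketch of a reviewer-facing summary that lists every argument with a preview. The `approval_prompt` helper, the argument names, and the risk label are illustrative assumptions.

```python
def approval_prompt(tool, arguments, risk, max_preview=80):
    """Render one line per argument, truncating long values for display only."""
    lines = [f"{tool} [{risk}]"]
    for key in sorted(arguments):
        text = str(arguments[key])
        if len(text) > max_preview:
            text = text[:max_preview] + "..."
        lines.append(f"  {key}: {text}")
    return "\n".join(lines)


print(approval_prompt(
    "github.create_issue",
    {"repo": "org/repo", "title": "Rotate leaked token", "body": "Steps to reproduce " * 20},
    risk="public write",
))
```

Truncation here is a display convenience. A real approval screen should let the reviewer expand the full value, and the approval itself should be bound to the complete arguments, not to the preview.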

This is the kind of approval loop we have been building Grass around, with the goal of making review practical from a phone. Agents run on a GrassVM or your own machine; the iPhone/iPad app shows permission requests for actions like Bash, Write, Edit, Read, Glob, and Grep, with context and diffs before you approve.

If your policy requires approval for writes, Grass lets you get notified, review the change, and keep the session moving without sitting at your desk. Visit https://codeongrass.com.

Conclusion

VellaVeto’s fail-closed model is the right default for MCP tools with side effects. Let reads through only when they match narrow policy. Require bound approval for writes. Deny ambiguity. The goal is not to make prompt injection impossible; it is to ensure a compromised prompt cannot turn uncertainty into tool execution.

Sources

  • VellaVeto public launch discussions and crate listing
  • MCP official tools specification
  • TrueFoundry and Descope guidance on MCP gateway enforcement