Advanced
security, zero-trust, spiffe, identity, capability-tokens

Zero Trust & Identity-First Agent Security

An AI agent with a long-lived API key and no oversight is a serious security risk. This pattern removes standing credentials entirely and requires a fresh, single-use token for every action an agent takes.

Why long-lived credentials and autonomous agents are a dangerous combination

The "infinite session" problem, and how Zero Trust fixes it

The security model most teams start with looks like this: create a service account, generate an API key, and set it as an environment variable in the agent's runtime. The agent uses this key for everything (reading data, writing data, calling external APIs), and the key stays valid indefinitely unless someone remembers to rotate it.

This creates what security practitioners call the "infinite session" problem. A compromised or misbehaving agent has full access to everything that key grants for as long as the key exists. There is often no record of which specific actions the agent took, and no limit on what it can do next. If the key leaks (and long-lived keys have a way of showing up in logs, error messages, and version control), an attacker gains persistent access.

Zero Trust agent security is built on a different principle: no agent should have standing permission to do anything. Instead of a stored credential, each agent has a cryptographic identity: a verifiable proof of who it is, derived from its runtime environment rather than from a secret it carries. When it needs to take an action, it exchanges that identity for a token that is valid only for that one specific action and expires in seconds. A policy engine reviews every request. High-risk actions require a human to confirm before they proceed.

The result: a compromised agent's blast radius is limited to the single token it currently holds, which covers one action and expires in moments.

Cryptographic proof of who an agent is, with no stored secrets

Workload Identity with SPIFFE / SPIRE

The first step is giving each agent a verifiable identity: not a username and password that any process with the right environment variable can use, but a cryptographic credential tied to the specific workload running at a specific time in a specific place. SPIFFE (Secure Production Identity Framework for Everyone) is an open standard for this. Each agent gets a SPIFFE ID, a URI like `spiffe://company.org/agents/data-analyst`, backed by a short-lived X.509 certificate or JWT. Identity is verified from runtime signals, such as which Kubernetes pod the agent is running in or what hardware measurements the host machine produces. The agent holds no passwords, API keys, or other long-lived secrets that need managing. The certificate is rotated automatically before it expires, as often as every few minutes when it is configured with a short lifetime.
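
To make the structure of a SPIFFE ID concrete, here is a minimal sketch that splits one into its trust domain and workload path using only the Python standard library. The names `SpiffeId`, `parse_spiffe_id`, and `is_allowed_agent` are hypothetical, and the checks are deliberately simplified. In a real deployment the ID would be read from an SVID whose signature has already been verified, not from a bare string.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class SpiffeId:
    trust_domain: str
    path: str


def parse_spiffe_id(uri: str) -> SpiffeId:
    """Split a SPIFFE ID into its trust domain and workload path."""
    parts = urlsplit(uri)
    if parts.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID: {uri!r}")
    if not parts.netloc:
        raise ValueError("SPIFFE ID has no trust domain")
    # A SPIFFE ID carries only a trust domain and a path:
    # no query string, fragment, port, or user info.
    if parts.query or parts.fragment or "@" in parts.netloc or ":" in parts.netloc:
        raise ValueError("SPIFFE ID must contain only a trust domain and a path")
    return SpiffeId(trust_domain=parts.netloc, path=parts.path)


def is_allowed_agent(uri: str, trust_domain: str, path_prefix: str) -> bool:
    """Hypothetical check: same trust domain and an expected path prefix."""
    try:
        sid = parse_spiffe_id(uri)
    except ValueError:
        return False
    return sid.trust_domain == trust_domain and sid.path.startswith(path_prefix)


sid = parse_spiffe_id("spiffe://company.org/agents/data-analyst")
print(sid.trust_domain, sid.path)
```

Matching on both the trust domain and a path prefix is one simple way to express "any agent workload issued by our own SPIRE deployment"; stricter setups would match exact IDs.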

SPIRE Server

The certificate authority at the center of the identity system. It receives attestation evidence through the SPIRE Agents on each host and signs SPIFFE identity documents only for workloads that have been verified as what they claim to be.

SPIRE Agent

A daemon running on each host machine that collects attestation evidence (Kubernetes pod spec, TPM measurements, process metadata) and uses it to fetch identity documents on behalf of workloads on that host.

X.509 / JWT SVID

The identity document issued to an agent: a short-lived X.509 certificate or JWT containing the SPIFFE ID. It is rotated automatically, as often as every few minutes, so a leaked document stops being useful as soon as it expires.

Zero-Secret Container

The agent container contains no API keys, no passwords, and no environment variable secrets. All credentials are derived at runtime from the attested identity. There is nothing useful to steal from the container image or its configuration.

Single-use permissions that expire in seconds

Capability Tokens

Having an identity is not the same as having permission to act. An agent knowing that it is "data-analyst-42" does not mean it should be allowed to delete production data. Capability tokens separate the question of identity (who are you?) from authorization (what are you allowed to do right now?). When the agent needs to call a tool, it presents its SPIFFE identity to a token exchange service. The service checks whether this identity is allowed to perform this specific action on this specific resource at this moment. If yes, it issues a capability token: a signed JWT containing exactly those permissions, valid for seconds, bound to a single use. The tool accepts the token, verifies it, performs the action, and the token is consumed. Stolen tokens are useless almost immediately.
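
As an illustration of the exchange step, the sketch below builds the form-encoded request body an agent might send to a token exchange service that follows OAuth 2.0 Token Exchange (RFC 8693). The parameter names and the two URN constants come from that RFC. The function name `build_exchange_request`, the tool URL, the scope string, and the placeholder SVID are assumptions made up for this example, and the actual HTTP call is left out.

```python
from urllib.parse import parse_qs, urlencode

# Grant type and token type identifiers defined by RFC 8693.
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
JWT_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:jwt"


def build_exchange_request(jwt_svid: str, resource: str, scope: str) -> str:
    """Return the form-encoded body an agent would POST to the token endpoint."""
    return urlencode({
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": jwt_svid,             # the agent's identity document
        "subject_token_type": JWT_TOKEN_TYPE,
        "requested_token_type": JWT_TOKEN_TYPE,
        "resource": resource,                  # the one resource this token is for
        "scope": scope,                        # the one action being requested
    })


body = build_exchange_request(
    jwt_svid="<jwt-svid-placeholder>",                      # placeholder, not a real SVID
    resource="https://tools.example.internal/reports/q3",   # hypothetical tool URL
    scope="reports:read",                                   # hypothetical scope name
)

# Decode the body again to inspect what would be sent.
fields = parse_qs(body)
print(fields["grant_type"][0])
```

Requesting one resource and one scope per exchange is what keeps the resulting token narrow; the service is free to refuse or narrow the request further.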

Scoped Claims

The token's payload specifies exactly one action on exactly one resource, such as "read this specific file" or "call this API endpoint with this HTTP method". Nothing broader than what was explicitly requested.

โฑ๏ธ

Short TTL

Tokens expire after seconds to minutes. Even if one is intercepted in transit, the window for misuse is extremely short, which leaves an attacker little time to do anything harmful with it.

Token Exchange Broker

The service that receives a SPIFFE identity assertion and issues a scoped capability token in return, following the OAuth 2.0 Token Exchange standard (RFC 8693). It enforces which identities are allowed to request which capabilities.

Single-Use

After the token is used for the action it was issued for, it cannot be reused. The tool server marks it as consumed. Replay attacks, in which an attacker intercepts a valid token and tries to use it again, fail immediately.
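
The four properties above (scoped claims, short TTL, a signed token, single use) can be sketched together in a few lines. This is an illustration under loud assumptions, not a production design: it uses an HMAC-signed payload from the standard library as a stand-in for a real signed JWT, keeps the signing key and the set of consumed token IDs in process memory, and the names `issue_token` and `redeem_token` are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time

SIGNING_KEY = secrets.token_bytes(32)  # demo key; a real broker would use an asymmetric key pair
_consumed: set[str] = set()            # used token IDs; a real tool server needs shared storage


def issue_token(subject: str, action: str, resource: str, ttl_seconds: int = 30) -> str:
    """Mint a token scoped to one action on one resource, with a short expiry."""
    claims = {
        "sub": subject,
        "action": action,
        "resource": resource,
        "exp": time.time() + ttl_seconds,
        "jti": secrets.token_hex(16),  # unique ID used for single-use enforcement
    }
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode())
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{signature}"


def redeem_token(token: str, action: str, resource: str) -> bool:
    """Accept the token once, and only for the exact action and resource it names."""
    try:
        payload, signature = token.rsplit(".", 1)
    except ValueError:
        return False  # not in payload.signature form
    expected = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False  # forged or modified
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if time.time() >= claims["exp"]:
        return False  # expired
    if (claims["action"], claims["resource"]) != (action, resource):
        return False  # outside the token's scope
    if claims["jti"] in _consumed:
        return False  # replay of an already used token
    _consumed.add(claims["jti"])
    return True


token = issue_token("spiffe://company.org/agents/data-analyst", "read", "reports/q3")
print(redeem_token(token, "read", "reports/q3"))  # first use
print(redeem_token(token, "read", "reports/q3"))  # replay attempt
```

The signature is checked before any claim is read, so a tampered payload is rejected without being parsed. The consumed-ID set only needs to remember a token until its expiry time, which keeps it small when TTLs are short.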

Evaluating every request and gating the high-risk ones

Policy Engine & Approval Proxy

Capability tokens control what an agent is technically permitted to do. The policy engine adds a business-reasoning layer: should this agent be allowed to do this, given the current context? The difference matters. An agent might hold a valid token to write to a database, but a policy might say that write operations affecting more than 1,000 rows always need a human review. Or that certain operations are only allowed during business hours. These are rules that change over time and belong in a policy language, not hardcoded in agent logic. A policy engine evaluates these rules on every request, and an approval proxy inserts a human checkpoint for actions that exceed a risk threshold.
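
The two example rules in this paragraph can be written down as a small decision function. The sketch below uses plain Python purely to show the shape of the logic; in practice these rules would live in a policy language outside the application. The names `Request`, `Decision`, and `evaluate` are hypothetical, and the business-hours window (Monday to Friday, 09:00 to 17:00) is an assumption chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_APPROVAL = "needs_approval"


@dataclass(frozen=True)
class Request:
    action: str         # for example "read" or "write"
    rows_affected: int
    at: datetime        # when the request was made


def evaluate(req: Request) -> Decision:
    """Apply the two example rules in order: time window first, then row count."""
    # Assumed business hours: Monday to Friday, 09:00 to 17:00.
    in_business_hours = req.at.weekday() < 5 and 9 <= req.at.hour < 17
    if req.action == "write" and not in_business_hours:
        return Decision.DENY
    if req.action == "write" and req.rows_affected > 1000:
        return Decision.NEEDS_APPROVAL
    return Decision.ALLOW


small_write = Request("write", 50, datetime(2024, 1, 3, 10, 0))    # a Wednesday morning
large_write = Request("write", 5000, datetime(2024, 1, 3, 10, 0))
print(evaluate(small_write).value, evaluate(large_write).value)
```

Returning a third outcome besides allow and deny is what lets the proxy route a request to a human instead of failing it outright.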

OPA / Cedar

OPA (with its policy language, Rego) and Cedar let you write authorization rules declaratively in a structured, reviewable format, for example: "agents of type X can write to database Y unless the operation affects more than 1,000 rows." Rules live outside application code and can be updated without redeploying the agents.

Action Proxy

Every request from an agent to a tool passes through this proxy. It verifies the capability token, runs the policy evaluation, logs the action, and either forwards the request or blocks it with a reason.

Human Approver

For actions the policy marks as high-risk (deleting data, making financial transactions, calling external APIs), the proxy holds the request and sends an approval notification to a human. The action proceeds only when a human approves it.

Tamper-Evident Audit Log

Every action, whether allowed, blocked, or approved, is written to an append-only log with a cryptographic hash chain linking entries together. Past entries cannot be deleted or modified without the change being detected.
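
A hash chain of this kind can be sketched with the standard library alone. In the example below each entry stores the hash of the previous entry, and its own hash covers both that value and its content. The function names `append` and `verify`, the in-memory list, and the record fields are assumptions for illustration; a real log would be persisted and access-controlled.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with this record's content."""
    body = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


def append(log: list[dict], record: dict) -> None:
    """Add a record linked to the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash, "hash": _entry_hash(prev_hash, record)})


def verify(log: list[dict]) -> bool:
    """Recompute every link; an edited, reordered, or missing middle entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


log: list[dict] = []
append(log, {"agent": "data-analyst", "action": "read", "outcome": "allowed"})
append(log, {"agent": "data-analyst", "action": "delete", "outcome": "blocked"})
append(log, {"agent": "data-analyst", "action": "write", "outcome": "approved"})
print(verify(log))
```

One limitation of this sketch: dropping entries from the end of the log leaves a shorter but still valid chain. Detecting that requires recording the latest hash somewhere the log writer cannot change.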

Signs this is the right architecture for your situation

When to Use This Pattern

Agents operate across organizational or tenant boundaries where a credential leak would have cross-tenant consequences
Regulatory requirements demand a per-action audit trail showing exactly which agent did what, when, and with what authorization
You are replacing long-lived service account keys that cannot be narrowly scoped or rotated frequently
Multi-agent systems where parent agents spawn sub-agents and need to safely delegate a subset of their own authority
What you gain, and what it costs

Trade-offs

Benefit: No stored secrets means there are no long-lived credentials to leak, steal, or forget to rotate
Cost: Running SPIFFE/SPIRE adds operational complexity, including certificate rotation, attestation setup, and high availability for the SPIRE server

Benefit: Each action is scoped to exactly what was requested, so a compromised agent can do very little damage
Cost: Token exchange adds a network round-trip to every tool call, increasing latency on hot paths

Benefit: Cryptographic identity creates a trustworthy, verifiable audit trail that holds up to scrutiny
Cost: Tools that accept only static API keys need a compatibility shim before they can participate in this model

Benefit: The architecture works consistently across cloud providers and on-premises environments
Cost: Writing OPA or Cedar policies is a new skill, so expect a learning curve and an ongoing policy review process