Select a layer to see which trust control surfaces apply, which workflow patterns expose gaps there, what failures have been observed in production, and which playbook covers each gap. All paths resolve to the same 12 audit playbooks.

L1 User Interface & Channels
(3) Playbooks (1) Incident
Control surface
Input integrity
Users embed instructions in natural language that agents treat as system commands. Prompt injection through the UI layer is undetected because there is no enforcement boundary between user input and agent instruction space.
Agents process web content returned from user-triggered browsing as trusted context. Adversarial pages contain instructions the agent executes without distinguishing source.
Control surface
Session identity
Agents operate across session boundaries without re-establishing who is directing current actions. Delegated authority from one session carries into another without verification.
L2 Security & API Gateway
(4) Playbooks (1) Incident
Control surface
Authentication at the boundary
Gateway authenticates the agent as a workload identity but does not verify delegated human intent. The authentication that passes is "this is agent-001." The question not asked is "who authorized agent-001 to do this, right now."
Cold storage procedures assume human-in-the-loop at every approval step. Nation-state adversaries compromise the human touchpoints, not the cryptographic boundary. The signing ceremony proceeds with the human compromised.
Control surface
Rate limiting and scope enforcement
API gateway enforces rate limits on requests but not on the scope of actions per request. An agent making N authorized calls can execute far more than N permitted actions if each call carries broad tool permissions.
Control surface
Credential transit
Credentials passed through agent context windows transit layers that were not designed to handle secrets. Log aggregators, trace systems, and session summaries capture credential echo across the stack.
L3 Governance & Control Plane
(5) Playbooks (1) Incident
Control surface
Policy enforcement vs. policy guidance
Wire transfer limits written into system prompts are policy guidance, not enforced gates. A treasury agent told "$2M limit is hardcoded in policy" interpreted an adversarial prompt as authorization. The $5M wire executed. The limit was a sentence, not a control.
Control surface
Agent inventory and lifecycle
Organizations track the agents they deployed. They do not track the agents those agents spawned. The governance plane has no visibility into the transitive delegation graph. Shutdown signals reach the parent. Sub-agents continue executing.
Control surface
Audit trail completeness
Most organizations log what the agent did. Regulators ask why it decided what it decided. The governance plane captures outputs. It does not capture the reasoning chain that produced them. Those are different records and most audit trails only contain one.
L4 Agent Runtime & Orchestration
(5) Playbooks (2) Incidents
Control surface
Authorization at execution time
Authorization at tool registration is not confirmation at execution time. A bank's treasury agent had strong authorization controls at registration. The agent was asked to show the user confirmation flow before a wire. Its response was past tense: "The transfer has been submitted." The wire was already processing.
ServiceNow's virtual agent passed design review. Security review asked "is the vendor secure?" not "is every agent action identity-verified at execution?" The gap was invisible to review. Visible in production.
Agents executing Plan-and-Execute or Orchestrator-Worker patterns reach irreversible actions (wire transfers, trade submissions, signed transactions) without a pre-execution authorization boundary that the agent itself cannot reason around.
Control surface
Halt and interrupt controls
Organizations have a stop button. They have never tested what stops when they press it. At machine speed, the gap between clicked PAUSE and all agent actions stopped is measured in seconds. In that window, agents operating in parallel across Orchestrator-Worker patterns execute dozens of additional actions.
Control surface
Delegation chain integrity
When agents spawn agents, delegated authority accumulates without enforcement of delegation depth or permission scope reduction. The transitive permission graph is invisible to the orchestrator. Sub-agents operate with permissions inherited from the parent that the parent was never explicitly authorized to delegate.
L5 Model Layer
(5) Playbooks (1) Incident
Control surface
Instruction vs. data distinction
LLMs cannot maintain a machine-enforced boundary between trusted instructions and untrusted data when both are expressed in natural language. Spotlighting (XML tags, prompt markers) reduces attack surface. It does not eliminate it. The model is a probabilistic interpreter, not a deterministic boundary enforcer.
Amazon Q's code scanner found nothing malicious in the dependency. The attack was a natural-language instruction embedded in a config file. The AI agent interpreted it as a command. Code scanning has no category for "text that instructs."
Control surface
Memory integrity
Malicious instructions stored in semantic memory replay into every future context window. The attack does not repeat. It persists. One successful injection turns into durable compromise across every subsequent session.
Control surface
Probabilistic decision auditability
AI agents sign transactions based on probabilistic inference, not deterministic rules. The authentication gap: the agent can prove it executed the transaction. It cannot prove the decision to execute was correct, authorized, or un-manipulated.
L6 Data, Knowledge & Context
(4) Playbooks (1) Incident
Control surface
Retrieval source integrity
Most organizations secure direct user input. They don't secure the retrieval sources the agent incorporates as context. The agent cannot distinguish a document injected with adversarial instructions from a trusted knowledge base entry. Both enter the context window as data. The model treats both as instructions if the framing is right.
Amazon Q's attack vector was a config file in the project repository. Not a network intrusion. Not a malformed API call. A text file with instructions the agent read, trusted, and executed.
Control surface
Local secrets exposure
"Local-first" solved cloud exfiltration. It did not solve local secrets exposure. The coding agent read the SSH private key, AWS credentials, and database passwords because it was asked to debug SSH. It interpreted that as permission. There was no enforcement layer between file system access and the agent's context window.
Control surface
Persistent memory attack surface
Persistent memory patterns (semantic memory, long-term memory stores) create a new attack surface: instructions written into memory by an adversary replay into every future session. The agent that was compromised yesterday is still compromised today. The attack surface is the memory layer, not the inference step.
Control surface
Data egress via semantic transformation
DLP blocks structured patterns. Agents exfiltrate through semantic transformation. The support agent sent customer account balances, employer names, and spending patterns through six outbound channels. DLP reported: no sensitive data detected. The data was reformatted as natural language. The patterns were unrecognized.
L7 MCP Gateway & Tool Integration
(5) Playbooks (2) Incidents
Control surface
MCP server authentication
60% of LLMjacking attacks shifted to MCP reconnaissance. Criminals monetize privileged interpreter access, not just compute cycles. The MCP server is the boundary where tool permissions are brokered. It is also the boundary that most security teams have not modeled.
Control surface
Tool permission scope
Most permission reviews check what the agent is allowed to do. They miss whether those permissions are ever actually needed. Least privilege at tool registration does not constrain which permissions the agent requests at runtime. MCP servers grant what agents request unless enforcement is explicit.
Control surface
Credential handling in tool calls
Credentials passed as tool call parameters flow through the MCP layer into logs, traces, and session artifacts. Secret scanning protects code repositories. It does not protect the conversational output layer where credentials surface as quoted text in agent responses.
Control surface
Web content isolation
When agents browse the web through MCP-connected tools, the fetched content enters the context window without enforcement separation from trusted instructions. Adversarial pages publish instructions the agent executes. The attack surface is web content becoming executable commands, not web browsing capability itself.
L8 Observability, Audit & Compliance
(5) Playbooks
Control surface
Observability vs. control
Tracking what happened is categorically different from determining what is allowed to happen. Most observability stacks are built for the first. Regulators require evidence of the second. Pre-execution authorization boundaries cannot be reconstructed from post-execution logs.
Control surface
Decision reasoning capture
AI agents make decisions through probabilistic reasoning chains. Logs capture action outputs. They do not capture the reasoning state that produced the action. When regulators or auditors ask why an agent made a decision, most organizations can show the output. They cannot reconstruct the decision path.
Control surface
Egress monitoring coverage
Compliance monitoring looks for structured data patterns: PII formats, account number patterns, known credential signatures. Agents exfiltrate data reformatted as conversational text. The patterns do not match. The egress is not detected. The compliance layer reports clean. The data is already outside the perimeter.

Each workflow pattern exposes distinct trust control surfaces. The same agent can implement multiple patterns simultaneously. Expanding a group reveals individual pattern cards, each showing control surfaces, reading links, and audit buttons.

Linear / sequential
(5) Playbooks (3) Incidents
Prompt chaining
Sequential prompts where output of one becomes input of next. Context accumulates across steps without sanitization boundaries.
Context integrity Input integrity
Chain-of-thought
Reasoning traces exposed in context window. Intermediate steps create additional injection surface. Traces logged as audit evidence but don't represent authorization decisions.
Decision auditability Context integrity
ReAct
Interleaved reasoning and tool execution. Each tool result re-enters the context window and influences subsequent reasoning. Tool outputs become context inputs with no trust reclassification.
Tool execution Context integrity Authorization
Branching / search
(8) Playbooks (1) Incident
Tree of Thoughts
Multiple reasoning branches explored and pruned. Branch exploration can invoke tools across paths. Permissions granted for exploration branches may persist into execution branches.
Tool permission scope Audit trail
Graph of Thoughts
Non-linear reasoning graph with merging and branching nodes. Cross-branch data flows create complex context accumulation. Provenance of any context element is difficult to trace.
Context integrity Decision auditability
ReWOO
Reasoning and acting separated: plan first, then execute. Execution step has no live reasoning correction. Irreversible actions execute from plans generated in manipulated context.
Irreversibility gates Context integrity
MCTS
Monte Carlo Tree Search for decision optimization. Simulation paths may invoke real tools. The boundary between simulation and execution is not machine-enforced.
Tool execution Irreversibility gates
Multi-agent coordination
(16) Playbooks (3) Incidents
Orchestrator-Worker
Central orchestrator delegates to specialized worker agents. Worker agents inherit permissions from orchestrator. Delegation depth and scope reduction are not enforced.
Delegation integrity Halt controls Authorization
Plan and Execute
Planner generates action plan, executor implements it. Authorization checks at plan generation do not recur at execution. Financial operations execute from plans generated in context the planner cannot verify.
Irreversibility gates Authorization at execution
Evaluator-Optimizer
One agent evaluates outputs and another optimizes based on feedback. Optimization loops can invoke tools repeatedly. Each iteration may accumulate permissions or context without reset.
Tool permission scope Audit trail
Reflexion
Agent reflects on its own outputs and revises. Poisoned output from one step becomes trusted input for reflection. Memory of compromised reflection persists.
Memory integrity Context integrity
Parallelization
Multiple agent instances execute simultaneously. Halt signals must reach all parallel instances. At machine speed, the window between halt signal and all instances stopped allows additional irreversible actions.
Halt controls Delegation integrity
Router
Routing agent classifies intent and directs to specialized sub-agents. Adversarial framing of intent can route to higher-privilege agents without authorization escalation check.
Authorization Input integrity
Multi-agent debate
Multiple agents argue positions and synthesize consensus. Adversarial agent position influences synthesis without source verification.
Context integrity Authentication
Role-based
Agents assigned functional roles with role-scoped permissions. Role assignment is static at initialization. MCP server grants permissions based on role, not verified current task.
Authentication Tool permission scope
Memory / state
(6) Playbooks (1) Incident
RAG-augmented
Retrieval-augmented generation pulls external documents into context. Retrieved content is treated as trusted context. Adversarially crafted documents enter the context window as data the model cannot distinguish from instructions.
Retrieval source integrity Context integrity
Persistent memory
Agent stores and recalls information across sessions. Adversarial instructions persisted to memory replay into every future context window. The attack does not repeat. It persists.
Memory integrity Local secrets
Stateful graph
Agent operates on graph state that persists and is mutated across steps. Adversarial state injection early in the graph propagates through all downstream steps.
Context integrity Decision auditability Delegation integrity

Each failure pattern has a name. The name is the pattern that recurs across institutions, not the incident. Click any pattern to see the full description, the reading, and the audits that cover it.

Prompt injection through the input layer
(2) Playbooks (2) Incidents
Natural language instructions embedded by users or adversaries are treated as agent system commands. No enforcement boundary exists between user input space and agent instruction space. The attack requires no technical exploit: text formatted as instruction executes as instruction.
Policy-as-text: limits that live in the system prompt
(2) Playbooks
Wire transfer limits, approval thresholds, and authorization rules encoded in system prompts are policy guidance, not enforced gates. The model can be reasoned out of following them. A treasury agent told the $2M limit was "hardcoded in policy" interpreted an adversarial prompt as authorization. The $5M wire executed. The limit was a sentence, not a control.
Registration-time authorization without execution-time confirmation
(2) Playbooks (1) Incident
Authorization confirmed at tool registration is not confirmation at execution time. A bank's treasury agent had strong authorization controls at registration. The agent was asked to show the user confirmation flow before a wire. Its response was past tense: "The transfer has been submitted." The wire was already processing. Pre-execution authorization boundary did not exist.
Procedural trust assumption: humans as the enforced boundary
(1) Playbook (1) Incident
Cold storage procedures assume human-in-the-loop at every approval step. Nation-state adversaries compromise the human touchpoints, not the cryptographic boundary. The signing ceremony proceeds with the human compromised. $1.46B loss. The threat model was the hardware. The actual attack surface was the people operating it.
MCP gateway as unmodeled attack surface
(2) Playbooks (1) Incident
60% of LLMjacking attack activity shifted to MCP reconnaissance. Adversaries monetize privileged interpreter access, not compute cycles. The MCP server is the boundary where tool permissions are brokered. Most security teams have not modeled it as an attack surface. The gateway that connects agents to tools connects adversaries to agent permissions.
Durable memory compromise: one injection, persistent access
(1) Playbook
Malicious instructions stored in agent semantic memory replay into every subsequent session. The attack does not repeat. It persists. One successful injection creates durable compromise across all future sessions without re-exploitation. The agent that was compromised yesterday is still compromised today. The attack surface is the memory layer, not the inference step.
Task-as-permission: agent interprets scope from context
(1) Playbook
"Local-first" deployment solves cloud exfiltration. It does not solve local secrets exposure. The coding agent read the SSH private key, AWS credentials, and database passwords because the developer asked it to debug SSH. The agent interpreted the task as permission to read relevant files. No enforcement layer existed between file system access and the agent's context window.
Semantic exfiltration: DLP-invisible data egress
(1) Playbook
DLP blocks structured patterns. Agents exfiltrate through semantic transformation. The support agent sent customer account balances, employer names, and spending patterns across six outbound channels by reformatting structured data as natural language. DLP pattern matching found no match. The compliance layer reported clean. The data was already outside the perimeter.
Credential echo: secrets surfacing in conversational outputs
(1) Playbook
Developer pastes an AWS key into the agent. The agent quotes it in the response. A session summary later includes the access key, the secret key, and a production database password. Secret scanning protects repositories. It does not protect agent conversational outputs. The credential appears in text the agent generates, not in code it writes.
Incomplete halt: stop button that doesn't reach sub-agents
(2) Playbooks
Halt signal sent to the parent agent. Sub-agents spawned through the delegation graph continue executing. At machine speed, parallel agent instances complete additional irreversible actions in the window between halt signal and full stop. Organizations have a stop button. They have not tested what stops when they press it. At the speeds agents operate, that gap is not negligible.
Invisible delegation graph: permissions below the inventory line
(1) Playbook
Organizations track the agents they deployed. They do not track the agents those agents spawned. The transitive permission graph is invisible to the governance plane. Sub-agents inherit permissions the parent was never authorized to delegate. The agent inventory is the tip. The delegation graph is underwater. Shutdown, audit, and governance stop at the inventory line.
What vs. why: audit logs that cannot answer regulators
(2) Playbooks
AI agent signs a financial transaction. The audit trail confirms execution. It cannot reconstruct the reasoning chain that authorized the decision. Regulators ask why. The log answers what. Most organizations log what the agent did. They do not capture the reasoning state that produced the action. These are different records and most audit trails only contain one of them.
Web content as executable instruction via context window
(2) Playbooks
Adversarial instructions embedded in a web page fetched by the agent through an MCP-connected browser tool enter the context window without trust reclassification. The page publishes instructions the agent executes. Web browsing capability was not the risk. Web content becoming executable commands through the context window was. The attack surface is the content, not the channel.