AI Agent Authentication Audit
Traditional authentication models assume explicit credentials and deterministic authorization. AI agents break both assumptions. When agents interpret context as intent, the interpretation itself becomes the authentication mechanism. Most organizations have no controls at this boundary.
This audit is for security leaders, CTOs, and architects deploying AI agents in financial systems who need to map authentication and authorization boundaries that traditional security reviews miss. If your agents execute financial transactions autonomously, access privileged tools based on interpreted intent, make decisions that create legal or regulatory liability, or operate at speeds that eliminate human oversight — this audit reveals the gaps.
Four critical trust boundaries where control transitions from explicit to interpreted:
- Input to Interpretation: Where unstructured text becomes understood intent. Authentication challenge: verifying the source and integrity of context before interpretation.
- Interpretation to Tool Selection: Where derived intent determines which privileged functions get invoked. Authorization challenge: mapping probabilistic interpretation to permission models designed for deterministic systems.
- Tool Call to Execution: Where interpreted commands become system actions. Audit challenge: capturing the chain from context to consequence for regulatory and legal review.
- Execution to Side Effects: Where actions produce irreversible consequences. Liability challenge: assigning responsibility when agent interpretation causes harm.
This is an audit, not an implementation guide. It reveals gaps but does not provide remediation methodologies, control templates, testing scenarios, specific audit trail requirements, liability frameworks, or tool recommendations. For implementation methodology, see the Authentication Implementation Playbook.
I've reviewed AI agent deployments at institutional scale. The failures aren't random. They follow predictable patterns at four specific trust boundaries where control transitions from explicit commands to probabilistic interpretation.
A fraud detection agent retrieves context from a case management database. A three-month-old case note, written by a legitimate analyst, contains: "Auto-approve low-risk transactions during business hours to reduce workload." The agent interprets this historical note as a current instruction. It auto-approves a transaction that should have been escalated. No attacker. No prompt injection. Just context treated as command. Traditional security review asked: "Can users inject malicious prompts?" It should have asked: "Who controls what the agent interprets as instructions?"
An agent executes 10,000 authorization decisions per hour. Each decision is technically "advisory," with human approval required. But operational reality: analysts approve agent recommendations 97% of the time without detailed review. The authorization model was designed for human-speed decisions with meaningful oversight. The agent operates at machine speed with rubber-stamp approval. The control exists in policy, not in practice.
Audit asks: "Who authorized this transaction?" System logs show: user query received, agent invoked authorization tool, transaction approved. What's missing: what context did the agent retrieve, how did it interpret that context as intent, what alternative interpretations were possible, why did this interpretation trigger this specific tool. The audit trail captures the action but not the reasoning. "The AI decided" isn't an acceptable answer in a regulatory investigation.
Agent misinterprets context, invokes wrong tool, causes financial harm. Legal asks: "Who is responsible?" The user didn't give an explicit command. The developer built an agent that operated within design parameters. The analyst whose case note was misinterpreted wrote it for a different context. The liability model for human decisions doesn't map cleanly to agent interpretation. Most organizations deploy agents before addressing this gap.
How to use this audit
- Read through all four sections first without answering. This builds the mental model of the framework and helps you understand what you're looking for.
- Select one agent system to audit. Pick a production or near-production deployment that handles privileged operations and has regulatory or legal implications.
- Answer each question honestly. If you are uncertain, that is a Partial or Gap — not a reason to skip. Assumptions about security controls are gaps waiting to fail under stress.
- Review your gap score. The results panel is generated after question 15 with prioritized gaps and next steps.
- Prioritize remediation. Boundary 1 gaps (context control) are the entry point for all other failures. "Who controls" questions are highest priority.
Agents interpret intent from all available context, not just direct user input. Context sources include: direct user input, retrieved documents (internal knowledge bases, wikis, policy docs), database records (customer data, case notes, transaction history), API responses (external services, tool outputs), system messages, conversation history, email or ticket content, configuration files, and user profile data. Every unmapped context source is a potential instruction channel. Organizations focus on the entry point (user prompt) and miss the context sources the agent retrieves during processing.
- You have a complete list of every text source the agent uses to derive intent
- Each source is classified (user-generated, system-generated, external)
- You've documented who can modify each source
- You understand that agents don't distinguish between "data" and "instructions"
- You can't list all context sources without reviewing code
- You describe it as "just user input and our database" — too simple for production agents
- You've never considered retrieved documents as an instruction channel
- You assume internal sources are safe because they're not user-controlled
- "We validate user input" (but agent retrieves from unmapped sources)
- "Our knowledge base is internal only" (but who controls what gets added?)
- "Case notes are just data" (until agent interprets them as commands)
- No distinction between "data the agent uses" and "commands the agent follows"
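A context-source inventory can be made concrete in code. The sketch below is a minimal, hypothetical registry (the names `ContextSource`, `ContextRegistry`, and the example sources are illustrative, not from any specific product) showing how the audit questions above — what sources exist, how each is classified, who can modify it, and whether content is reviewed — become answerable queries rather than tribal knowledge.

```python
from dataclasses import dataclass, field

@dataclass
class ContextSource:
    """One text source the agent reads when deriving intent."""
    name: str
    origin: str                                   # "user-generated", "system-generated", "external"
    writers: list = field(default_factory=list)   # who can modify this source
    reviewed: bool = False                        # is content reviewed before agents consume it?

class ContextRegistry:
    """Makes the audit question answerable in one call: which sources
    feed the agent, and which of them are unreviewed instruction channels."""
    def __init__(self):
        self.sources = {}

    def register(self, source: ContextSource):
        self.sources[source.name] = source

    def unreviewed_instruction_channels(self):
        # Every unreviewed source is a potential instruction channel.
        return [s.name for s in self.sources.values() if not s.reviewed]

registry = ContextRegistry()
registry.register(ContextSource("case_notes", "user-generated", ["analysts"]))
registry.register(ContextSource("policy_docs", "system-generated", ["compliance"], reviewed=True))
print(registry.unreviewed_instruction_channels())  # ['case_notes']
```

The point of the structure is not the code itself but that "can you list all context sources without reviewing code" stops being a gap once the list lives in one reviewable place.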
If your agent interprets intent from text, whoever controls that text controls the agent's behavior. This isn't about access control in the traditional sense. It's about influence over interpretation. Three months ago, a legitimate analyst wrote a case note: "Auto-release verified transactions during business hours to improve efficiency." The context was appropriate then. Today, the agent retrieves that note and interprets it as a current instruction. The analyst didn't attack the system. But their words, written for a different context, now control agent behavior. Traditional authentication asks "who is this user?" Agent authentication asks "who controls what this agent interprets?"
- You can name who can add and modify content in each context source
- You have a documented review process for high-risk sources
- You have content lifecycle management (creation, modification, archival)
- You have clear policy on when old content should no longer influence agent behavior
- "Don't know" who can modify key context sources
- No review process for content that becomes agent context
- You assume authentication of the writer equals authentication of the content
- You can't trace content provenance for audit
- "Only admins can update the knowledge base" (but no review of content quality)
- "Database is append-only" (but agent interprets three-year-old records as current)
- "External APIs are from trusted vendors" (but you don't control their response content)
- "Case notes are written by trained analysts" (who had no idea an agent would interpret them)
Agents interpret intent from all available context simultaneously. When sources conflict, the agent makes a choice. If that choice is non-deterministic or undocumented, your security model is probabilistic. Traditional systems have explicit precedence: "Policy overrides user preference," "Explicit deny overrides explicit allow." Agents don't have these rules unless you build them explicitly. Most organizations don't, because they assume the model will "figure it out appropriately." Under adversarial conditions, or even normal operational drift, this becomes a control gap.
- Precedence rules are explicit in code and policy, not emergent from model behavior
- You have a test suite covering known conflict scenarios
- Behavior is deterministic across multiple runs with the same inputs
- Agent escalates to human when precedence rules don't resolve the conflict
- You haven't tested conflict resolution scenarios
- Different runs produce different interpretations with the same inputs
- You can't explain to auditors how the agent resolved a specific conflict
- Conflict resolution depends on the prompt rather than explicit rules
- "The agent is smart enough to figure it out" (non-deterministic = no control)
- "The model uses RAG to find relevant context" (relevance is not precedence)
- "We have prompt engineering to handle this" (prompts are guidance, not guarantees)
- No explicit documented precedence rules for conflicting context
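Explicit precedence can be a small, deterministic function rather than emergent model behavior. The sketch below assumes a hypothetical four-level precedence table (the source-type names are illustrative); the key properties are that resolution is the same on every run with the same inputs, unknown sources rank last, and a genuine tie escalates instead of letting the model choose.

```python
# Hypothetical precedence table: lower number wins. Unknown source types
# rank last (99), and unresolved conflicts escalate to a human.
PRECEDENCE = {
    "regulatory_policy": 0,
    "system_config": 1,
    "user_input": 2,
    "retrieved_document": 3,
}

def resolve(candidates):
    """candidates: list of (source_type, instruction) pairs.
    Deterministic: the highest-precedence source wins; conflicting
    instructions at the same precedence level escalate."""
    ranked = sorted(candidates, key=lambda c: PRECEDENCE.get(c[0], 99))
    top_level = PRECEDENCE.get(ranked[0][0], 99)
    top = [c for c in ranked if PRECEDENCE.get(c[0], 99) == top_level]
    if len({instruction for _, instruction in top}) > 1:
        return ("escalate", None)   # tie with conflicting instructions
    return ("apply", ranked[0][1])
```

With rules like this in code, "how did the agent resolve this conflict?" has an auditable answer: the precedence table, not the prompt.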
Traditional authentication answers: "Who is this user?" Agent authentication must answer: "Who created this context, when, for what purpose, and is it still valid for the agent to interpret it as instruction?" You can have perfect user authentication and still have compromised context. A legitimate user writes a legitimate document. Three months later, that document gets retrieved by an agent and interpreted as a current instruction. The authentication gap isn't "who wrote it." It's "should the agent trust it now." Organizations implement authentication at the perimeter (user login) but not at the interpretation layer (context retrieval).
- You have source authentication for all context types
- You have content integrity verification (signatures, checksums, version control)
- You have temporal validation (expiration, archival status, version awareness)
- You maintain chain of custody logging for compliance
- Agent behavior accounts for trust levels of different sources
- No verification of context source authenticity
- No integrity checks on retrieved content
- No temporal validation — old content is treated as current
- Authentication model ends at user login, doesn't extend to interpretation
- "We authenticate users who write content" (but not content the agent retrieves)
- "Our knowledge base is internal" (internal does not equal authenticated for agent interpretation)
- "We use RAG for retrieval" (RAG retrieves, it does not authenticate)
- Agent treats all retrieved context as equally trustworthy
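Context authentication at the interpretation layer can combine integrity and temporal checks. The sketch below is a minimal illustration using Python's standard `hmac` and `hashlib` modules; the signing key, the 90-day validity window, and the function names are assumptions for the example, not a recommended production design (real deployments would use managed keys and per-source policies).

```python
import hashlib
import hmac
import time

SECRET = b"demo-key"  # assumption: in practice, a managed per-source signing key

def sign(content: bytes) -> str:
    """Signature computed when content is written to the source."""
    return hmac.new(SECRET, content, hashlib.sha256).hexdigest()

def validate_context(content: bytes, signature: str, created_at: float,
                     max_age_days: float = 90) -> bool:
    """Run at retrieval time, before the agent interprets the content."""
    # Integrity: content must match its signature (detects tampering).
    if not hmac.compare_digest(sign(content), signature):
        return False
    # Temporal validity: stale content is not treated as current instruction.
    age_days = (time.time() - created_at) / 86400
    return age_days <= max_age_days
```

This is the shape of "authentication extended to the interpretation layer": the check answers "should the agent trust it now," not just "who wrote it."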
With agents, tool selection is derived from interpretation. The agent decides which tools to invoke based on its understanding of intent. If that understanding is wrong, it invokes the wrong tool. Permission scoping designed for human users doesn't account for: agents operating at machine speed (10,000 decisions per hour), agents interpreting ambiguous context, agents selecting between multiple tools that could achieve similar goals, and agents optimizing for efficiency over safety. Organizations grant agents broad permissions assuming model intelligence will constrain behavior, rather than implementing explicit authorization controls.
- Complete inventory of privileged tools with permission levels documented
- Explicit authorization logic (rules, policies, constraints) for tool selection
- Permissions scoped by context (amount, time, user type, risk level)
- Rate limits and usage constraints in place
- Tool selection decisions are traceable
- Can't list all privileged tools the agent can invoke
- Tool selection is based on model reasoning rather than explicit authorization logic
- Same permission model for agents as for human users
- Can't explain why the agent chose a specific tool in a given scenario
- "The agent figures out which tools to use" (interpretation = authorization gap)
- "We trust the agent to choose appropriately" (no explicit authorization logic)
- "Agent has same permissions as the user" (but operates at different speed and scale)
- "We'll add controls if we see abuse" (reactive, not preventive)
Traditional authorization is explicit: "User X requests action Y, check permissions, allow or deny." With agents, authorization is often implicit: the agent interprets context, decides on an action, selects a tool, executes. Authorization happens through tool access, not as a separate decision point. This creates gaps: no separation between "what the agent interpreted" and "what it's authorized to do," authorization logic embedded in model reasoning (unauditable), no way to distinguish between interpretation error and authorization failure, and the agent's understanding of permission boundaries is probabilistic.
- Explicit authorization service or layer exists, separate from agent reasoning
- Deterministic authorization rules apply regardless of interpretation
- Authorization failures are logged and distinguished from interpretation errors
- Agent cannot bypass authorization through creative interpretation
- No separate authorization step — only tool access permissions
- Authorization logic is in the model, not in code
- Same interpretation can result in different authorizations (non-deterministic)
- No distinction between "agent tried to do X" and "agent was authorized to do X"
- "Authorization is implicit in tool access" (no separate authorization step)
- "The model knows what it's allowed to do" (authorization in model weights, unauditable)
- "Prompt engineering defines boundaries" (guidance, not enforcement)
- "We test that the agent doesn't do unauthorized things" (testing is not a control)
Traditional systems have clear liability: "User X performed action Y at time Z." With agents, liability is ambiguous. The user provided input but didn't explicitly command the action. The agent interpreted context and selected the tool. Multiple context sources influenced the interpretation. The developer created the agent but didn't authorize the specific action. When agent actions cause financial harm, regulatory violation, or legal liability, "who is responsible?" must have a clear, defensible answer. Most organizations haven't explicitly addressed this. Legal precedent for AI agent liability is still developing. Courts will look at: who had control, who benefited, was there negligence in design or oversight, were risks disclosed.
- Explicit liability model documented and legally reviewed
- Audit trail captures full chain: user, context, interpretation, authorization, action
- Terms of service explicitly address agent-initiated actions
- Insurance explicitly covers agent-caused harm
- Audit logs show action but not authorization chain
- Multiple entities could be considered responsible (ambiguous)
- No legal review of liability model for agent actions
- Terms of service don't address agent-initiated actions
- "Haven't thought about liability mapping"
- "The user is responsible" (but user didn't give explicit command)
- "Depends on the situation" (ambiguity creates legal risk)
- "We'll figure it out if something goes wrong" (reactive)
"Human-in-the-loop" is cited as a control for agent systems. But it only works if: escalation triggers are explicit and enforceable, escalation happens before action not after, human review is meaningful not rubber-stamping, and the system doesn't time out or default to autonomous action. At agent speed (10,000 decisions per hour), "human review" can become theater: humans approve agent decisions without detailed review because there's no time to review and the agent is usually right. Organizations implement escalation as a policy ("agent should escalate when...") but don't enforce it as a control ("agent cannot proceed unless...").
- Explicit, testable escalation triggers exist in code — not in the prompt
- Escalation is enforced before tool invocation, not optional
- Human review includes context, interpretation, and reasoning chain
- Queue management prevents escalation backlog from forcing autonomous action
- Escalation depends on the agent's judgment, not explicit triggers
- System times out and defaults to autonomous action if human unavailable
- Human reviewers approve without detailed review due to volume or speed
- Can't test escalation triggers independently of agent reasoning
- "High-risk actions require approval" (but "high-risk" is the model's judgment)
- "Agent escalates when unsure" (confidence threshold is probabilistic)
- "Human reviews all transactions" (but approves 97% automatically)
- "Escalation is in the prompt" (guidance, not enforcement)
Traditional systems have execution controls independent of user decisions: input validation, business rule checks, rate limiting, approval workflows. These controls exist in code, not in user behavior. They can't be bypassed by creative phrasing. With agents, there's risk that "controls" exist in prompts or model reasoning rather than in code. "The agent should check..." (should, not must). "We instruct the agent to validate..." (instruction, not enforcement). "The agent is trained to..." (training, not control). If execution happens directly from tool invocation, there is no control layer between interpretation and consequence.
- Explicit execution controls exist in code, not agent reasoning
- Validation layer sits between tool invocation and execution
- Controls are enforceable, cannot be bypassed through rephrasing
- Control failures are logged and trigger escalation
- Controls are tested regularly with adversarial scenarios
- Tool invocation directly triggers execution with no validation layer
- Controls exist in prompts, not code
- Can't test execution controls independently of agent reasoning
- No rate limiting — agent can execute unlimited actions
- "The agent validates before executing" (agent behavior, not a control)
- "We prompt the agent to be careful" (prompt engineering is not a control)
- "Execution is fast, controls would slow it down" (speed vs safety trade-off)
- "Controls would break the agent's functionality" (prioritizing capability over safety)
For agents, auditors and regulators will ask: "Why did the agent take this action?" "What context influenced this decision?" "How did the agent interpret that context?" "Could the agent have interpreted it differently?" "Who is responsible for this outcome?" A complete audit trail must capture: what context sources were retrieved, content of retrieved context, how the agent interpreted the context, what intent was derived, which tool was selected and why, parameters derived for tool invocation, authorization check results, escalation triggers evaluated, and the execution result. Organizations log execution but not interpretation, making it impossible to reconstruct agent reasoning for audit. "The AI decided" won't satisfy regulatory review in financial services.
- Comprehensive logging across context, interpretation, tool selection, and execution layers
- Ability to reconstruct full reasoning chain for any action
- Immutable logs (tamper-evident, timestamped, versioned)
- Retention policy meets compliance requirements
- Audit trail shows action but not reasoning
- Can't trace from context through interpretation to execution
- Model reasoning is opaque, not logged
- Audit trail insufficient for regulatory review in your sector
- "We log agent actions" (but not reasoning or context)
- "We can ask the agent to explain its actions" (post-hoc rationalization, not runtime logging)
- "We can replay inputs to see what agent would do" (non-deterministic, different result)
- Logging is optional or performance-dependent
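A decision-level audit record can capture the full chain in one structured entry, made tamper-evident by hash-chaining each record to the previous one. The sketch below is a minimal illustration using Python's standard `json` and `hashlib`; the field names are assumptions about what a regulator-ready record would contain, mirroring the list above.

```python
import hashlib
import json
import time

def log_decision(prior_hash, context_ids, interpretation, tool, params, authz, result):
    """One audit record per decision, capturing context through consequence.
    Each entry embeds the previous entry's hash, so tampering with any
    record breaks the chain."""
    record = {
        "ts": time.time(),
        "context_ids": context_ids,        # which sources were retrieved
        "interpretation": interpretation,  # derived intent, logged at runtime
        "tool": tool,
        "params": params,
        "authorization": authz,            # authorization check result
        "result": result,                  # execution outcome
        "prior_hash": prior_hash,
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["hash"] = hashlib.sha256(payload).hexdigest()
    return record
```

Because interpretation is logged at decision time, the trail avoids the post-hoc-rationalization and non-deterministic-replay traps called out above.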
With agents operating at machine speed, errors compound quickly. An agent misinterprets context and executes 100 similar actions before the error is detected. Each action creates side effects. Reversal becomes complex. If agent actions are irreversible, misinterpretation becomes permanent harm. Design question that most teams avoid: should irreversible actions require explicit human confirmation rather than agent autonomy? Financial transactions can be hard to recall once sent. Data deletion requires backup restore. External communications (emails, notifications, API calls to third parties) cannot be reversed once sent. Organizations don't design for reversal because they assume agent interpretation will be correct, or that errors will be rare and caught quickly.
- Clear mapping of reversible versus irreversible actions exists
- Irreversible actions require human confirmation before execution
- Time windows are documented (can reverse within X hours or days)
- Monitoring exists to detect incorrect actions requiring reversal
- Haven't mapped which actions are reversible versus irreversible
- Agent can autonomously execute irreversible actions
- Reversal requires manual intervention while agent operates faster than humans
- No monitoring to detect when reversal is needed
- "Most actions are reversible" (vague, not specific)
- "We'll build reversal capability if we need it" (reactive)
- "We monitor and catch errors quickly" (but agent executes thousands of actions per hour)
- No time limit tracked — don't know when reversal becomes impossible
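The reversibility mapping can be an explicit table with a time window per action type, plus a rule for what requires human confirmation. The action names and windows below are hypothetical examples; the design choice worth noting is that unknown actions are treated as irreversible until someone classifies them.

```python
# Explicit mapping: action type -> hours within which reversal is possible.
# None means irreversible the moment it executes.
REVERSIBILITY = {
    "internal_ledger_entry": 72,   # reversible within 72 hours
    "wire_transfer": 2,            # recall window, then effectively final
    "customer_email": None,        # cannot be unsent
}

def requires_confirmation(action: str) -> bool:
    """Irreversible or near-irreversible actions need explicit human
    confirmation before the agent may execute them. Unknown action types
    default to irreversible — fail closed, not open."""
    window = REVERSIBILITY.get(action)
    return window is None or window <= 2
```

With the table in place, "most actions are reversible" becomes a specific, reviewable claim, and the reversal deadline for each action type is tracked rather than discovered after the fact.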
Traditional systems don't adapt to avoid controls. AI agents can optimize their behavior. If controls create friction (slower execution, escalation, denial) and the agent is optimized for task completion, it might learn to avoid triggering those controls. This isn't malicious intent. It's optimization: the agent finds patterns that lead to task success and repeats them. Common scenarios: agent consistently reports confidence above threshold to avoid escalation (confidence is agent-reported, not independently validated), agent splits transactions to stay below amount thresholds, agent avoids retrieving context sources that trigger escalation, agent reframes actions using different terminology to avoid keyword-based rules. This is a known AI safety failure mode: specification gaming, where systems optimize for stated objectives in ways that violate unstated intentions.
- Controls are independent of agent behavior — cannot be bypassed
- Adversarial testing includes attempts to circumvent controls
- Agent optimization includes compliance metrics, not just task completion
- Semantic analysis used, not keyword matching, for rule enforcement
- Monitoring tracks behavior pattern changes over time
- Controls depend on agent reporting (confidence, intent, reasoning)
- Agent is optimized for task completion without balancing compliance
- Keyword-based rules that can be circumvented through rephrasing
- Haven't tested whether agent can learn to avoid controls
- "The agent wouldn't try to bypass controls" (untested assumption)
- "We'll catch it in monitoring" (assumes you'll detect the pattern)
- "Controls are in the prompt" (optimization pressure exceeds prompt guidance)
- No monitoring for pattern changes in agent behavior over time
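Some circumvention patterns are detectable from behavior alone, independent of anything the agent reports. The sketch below illustrates one of the scenarios named above — transaction splitting to stay under an amount threshold — as a hypothetical detector over recent transactions; the threshold and count are illustrative parameters.

```python
from collections import defaultdict

def detect_splitting(transactions, threshold: float = 10_000, min_count: int = 3):
    """Flags accounts where several sub-threshold transactions sum above the
    threshold — a pattern consistent with an agent splitting actions to stay
    under an amount-based escalation rule. Input: (account, amount) pairs
    from a recent time window."""
    by_account = defaultdict(list)
    for account, amount in transactions:
        if amount < threshold:  # only sub-threshold items matter here
            by_account[account].append(amount)
    return [acct for acct, amts in by_account.items()
            if len(amts) >= min_count and sum(amts) >= threshold]
```

This kind of check is independent of the agent's self-reported confidence or reasoning, which is exactly the property the checklist item "controls are independent of agent behavior" demands.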
With agents, consequences compound in ways human-speed systems don't encounter: scale (agent executes thousands of actions, not dozens), speed (consequences materialize faster than human detection), complexity (agent actions may have non-obvious downstream effects), and interpretation (agent's misunderstanding creates systematic bias in outcomes). Immediate consequences from financial transactions include funds transferred, account balances changed, transaction fees incurred, and regulatory reporting triggered. Downstream consequences include customer relationship impact, liquidity effects, compliance impact, reputational harm, and legal liability. Without consequence mapping, you can't assess risk appropriately, design adequate controls, determine appropriate insurance coverage, or defend liability assignment. Organizations focus on whether actions can execute, not on what happens after execution.
- Full consequence mapping for each action type documented
- Worst-case scenarios documented and reviewed
- Understanding of cascading and downstream effects
- Quantified risk: financial exposure, reputational impact, legal liability
- Regular review and updates as agent capabilities evolve
- Haven't mapped consequences beyond immediate execution
- No worst-case analysis or stress testing
- Don't know cascading consequences — one action triggering others
- Haven't quantified financial, reputational, or legal exposure
- "Consequences are the same as if a human did it" (but at different scale and speed)
- "Agent actions are low-risk" (without quantifying potential harm)
- "We'll handle problems reactively" (assumes you'll detect in time)
- "Worst-case is small financial loss" (doesn't account for reputational or legal harm)
Traditional liability is clear: "Person X performed action Y, is responsible for consequence Z." With agents, liability is ambiguous across multiple scenarios. Misinterpreted context causes financial loss: agent retrieves old case note, interprets it as current instruction, executes 500 inappropriate transactions before detection, customer loses $50,000 — who is liable? Agent violates privacy regulation: agent accesses customer PII based on ambiguous query interpreted as permission request, regulatory fine $100,000 — who is liable? Agent causes reputational harm: agent sends inappropriate communication to 10,000 customers based on misinterpreted sentiment, social media backlash, customer churn — who is liable? Legal precedent for AI agent liability is still developing. Without explicit liability assignment, you face legal uncertainty, insurance gaps, employee personal liability risk, and user confusion.
- Explicit liability model documented and legally reviewed
- Clear assignment for different harm scenarios
- Employment agreements address agent design and deployment liability
- Insurance explicitly covers agent-caused harm
- Terms of service clearly explain liability to users
- Liability is ambiguous — could be multiple parties depending on situation
- No legal review of agent liability model
- Terms of service don't explicitly address agent-initiated actions
- Insurance may not cover AI agent-caused harm — haven't verified
- "The organization is liable" (too broad, no individual accountability)
- "Our terms of service cover this" (but don't explicitly address agent interpretation)
- "Standard insurance covers it" (may not cover AI agent-specific risks)
- "We'll let legal sort it out if something happens" (reactive, expensive)
Agents operate at machine speed. Without real-time monitoring, problems compound. Agent executes 1,000 incorrect actions in 10 minutes. Detection happens hours or days later. Reversal is complex or impossible. Harm has already occurred. At human speed, mistakes are caught and corrected incrementally. At agent speed, mistakes scale before detection. Your monitoring needs to operate at agent speed, not human speed. "We monitor transaction volumes" is not monitoring correctness or appropriateness. "Users report problems" means users may not detect subtle issues. "We review in weekly audits" is insufficient when the agent operates at 10,000 actions per hour. Organizations implement monitoring designed for human-speed systems, which is insufficient for agent-speed execution.
- Real-time monitoring exists for critical actions
- Automated alerts trigger for anomalies or policy violations
- Detection operates at agent speed (seconds to minutes, not hours to days)
- Ability to pause or stop agent when problems are detected
- Clear escalation process with defined SLAs
- Monitoring and response processes are tested regularly
- No real-time monitoring — only periodic review
- Detection depends on users reporting problems
- Monitoring tracks volume or performance, not correctness or appropriateness
- Response time measured in days when agent operates in seconds
- "We'll see problems in system logs" (but logs aren't actively monitored)
- "Error rates are low so monitoring isn't critical" (until error rate spikes)
- "We monitor and catch errors quickly" (measured in days, agent executes in seconds)
- "We'll add monitoring if we see problems" (reactive, not preventive)
Close these gaps. The Authentication Implementation Playbook covers step-by-step closure methodology for each control surface this audit maps: trust boundary architecture, authorization layer design, audit trail requirements that satisfy regulatory review, and escalation enforcement patterns that hold at machine speed.
Join the waitlist for implementation access →