[TECHNICAL DEEP-DIVES]
[2/8/26]
The 18 Ways Someone Can Bypass Your Agent Governance Layer
[Author]: Amjad Fatmi
Most teams building AI agents believe they have a governance layer. A guardrail here, an allow list there, maybe an observability platform watching for anomalies. What they actually have is a collection of partial controls with well-documented gaps between them.
This post covers the 18 attack classes formally analyzed in Faramesh's published threat model. Each one is real. Each one has a concrete mechanism. And together they explain why most agent governance approaches fail under adversarial conditions, and what a system needs to actually resist them.
How to read this
Each attack class follows the same structure: the attacker's goal, how the attack works in practice, and what properties a system needs to mitigate it. The attacks map directly to the OWASP Top 10 for Agentic Applications (2026), the OWASP LLM Top 10 (2025), and MITRE ATLAS.
Execution Bypass
A1: Authorization Bypass: Direct Tool Execution
Goal. Execute an effectful tool call without passing the authorization boundary at all.
How it works. Most agent frameworks expose multiple code paths to tool execution. The production path goes through the governance middleware. Debug endpoints, legacy helpers, background workers, and alternate SDK calls often do not. An attacker who understands the framework finds the ungated path and uses it directly.
What mitigation requires. Non-bypassability is an enforcement placement property, not a policy property. Every code path that can produce side effects must require permit verification before execution. Missing one path creates a complete bypass. The enforcement surface must be exhaustive — not comprehensive, exhaustive.
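As a minimal sketch of what "every code path requires permit verification" can look like, the check can live on the effectful function itself rather than in middleware that callers may skip. The names here (`requires_permit`, `VALID_PERMITS`, `send_email`) are hypothetical and the permit store is a plain dictionary for illustration only.

```python
import functools


class PermitError(Exception):
    """Raised when an effectful call arrives without a matching permit."""


# Hypothetical permit store: permit id -> the tool name it authorizes.
VALID_PERMITS = {"permit-123": "send_email"}


def requires_permit(tool_name):
    """Wrap an effectful function so it cannot run without a matching permit."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, permit_id=None, **kwargs):
            # The check sits on the function itself, so debug endpoints,
            # workers, and helpers that call it are all gated the same way.
            if VALID_PERMITS.get(permit_id) != tool_name:
                raise PermitError(f"no valid permit for {tool_name}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


@requires_permit("send_email")
def send_email(to, body):
    # Stand-in for a real side effect.
    return f"sent to {to}"
```

Calling `send_email("a@example.com", "hi")` without a `permit_id` raises `PermitError` regardless of which code path made the call.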
OWASP Agentic: ASI02, ASI05 | OWASP LLM: LLM06 | MITRE ATLAS: Execution
A12: Confused Deputy: Indirect Escalation Through an Allowed Tool
Goal. Use an explicitly permitted tool (Tool A) to trigger a denied effect (Tool B) indirectly.
How it works. Policy permits Tool A for a "safe" operation. The agent crafts inputs to Tool A such that Tool A triggers Tool B with Tool A's ambient permissions. The governance layer saw Tool A, evaluated it, and approved it. Tool B never hit the boundary.
What mitigation requires. Permits must not implicitly compose across tools. Any tool that can itself invoke effectful operations must have its own authorization boundary. The enforcement requirement is complete mediation of effectful outcomes — not just the first hop.
OWASP Agentic: ASI02, ASI03 | OWASP LLM: LLM06
A18: Executor Partial Coverage: One Ungated Path
Goal. Find a single execution path that performs side effects without permit verification.
How it works. Non-bypassability requires that every member of the effectful interface set enforces verification. In practice, systems grow over time. A new developer adds a feature. An async worker is added for performance. A plugin integrates a new tool. Each addition is a potential ungated path. One is enough.
What mitigation requires. Coverage discipline. The effectful interface set must be explicitly defined and maintained. Every addition to it must be gated. This is an operational requirement, not just a design one — it requires ongoing enforcement at the development process level.
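One way to turn coverage discipline into a process-level check is to keep the effectful interface set as an explicit registry and fail the build when any member is not gated. The sketch below assumes hypothetical `effectful` and `gated` decorators; the real gate logic is elided, and only the marker is checked.

```python
# The explicit effectful interface set: name -> callable.
EFFECTFUL_INTERFACES = {}


def effectful(func):
    """Register a function as a member of the effectful interface set."""
    EFFECTFUL_INTERFACES[func.__name__] = func
    return func


def gated(func):
    """Mark a function as permit-gated (the actual verification is elided)."""
    func.is_gated = True
    return func


@effectful
@gated
def transfer_funds(amount):
    """Gated from day one."""


@effectful
def async_refund_worker(amount):
    """Added later for performance; the gate was forgotten."""


def ungated_interfaces():
    """Return the names of registered effectful functions that lack a gate."""
    return sorted(
        name for name, func in EFFECTFUL_INTERFACES.items()
        if not getattr(func, "is_gated", False)
    )
```

A CI step could assert that `ungated_interfaces()` is empty, which would flag `async_refund_worker` here before it ships.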
OWASP Agentic: ASI05, ASI02 | OWASP LLM: LLM06
Policy and Artifact Attacks
A2: Policy Downgrade: Policy Version Confusion
Goal. Force evaluation under a weaker, older policy version.
How it works. Policy updates happen continuously in production systems. Between the moment a policy is updated and the moment all agents are evaluating against the new version, there is a window. An attacker races updates to land on an older snapshot, or replays a permit minted under a permissive older policy.
What mitigation requires. Decisions and permits must bind the exact policy hash: not a version label, not a "latest" tag, but the hash of the actual policy bytes. Executors for high-risk action classes must verify that the permit's bound policy hash matches the current policy. Without this, a policy update is not an enforcement event; it is a suggestion.
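A minimal sketch of policy-hash binding, assuming SHA-256 over the raw policy bytes and a permit represented as a plain dictionary. The function names are hypothetical, and permit signing is omitted to keep the focus on the hash comparison.

```python
import hashlib


def policy_hash(policy_bytes: bytes) -> str:
    """Hash the actual policy bytes, not a label or tag."""
    return hashlib.sha256(policy_bytes).hexdigest()


def mint_permit(action: str, policy_bytes: bytes) -> dict:
    """Bind the permit to the exact policy it was evaluated under."""
    return {"action": action, "policy_hash": policy_hash(policy_bytes)}


def executor_accepts(permit: dict, current_policy_bytes: bytes) -> bool:
    """Reject permits minted under any policy other than the current one."""
    return permit.get("policy_hash") == policy_hash(current_policy_bytes)
```

A permit minted under an older, more permissive policy fails `executor_accepts` as soon as the current policy bytes change, even if both policies carry the same version label.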
OWASP Agentic: ASI03, ASI04 | OWASP LLM: LLM03
A4: Permit Forgery: Reuse for a Different Action
Goal. Reuse a valid permit to authorize a different action than it was issued for.
How it works. The attacker captures a legitimate permit issued for action A. They then attempt to use that permit to authorize action A′, a different action with different parameters or a different target. If permits are not cryptographically bound to the exact canonical action hash, this works.
What mitigation requires. Permits must bind to the canonical hash of the exact action they authorize. Verification must check that the presented permit's action hash matches the hash of the action being executed. The question is not "does a permit exist?" but "does this permit match this exact action?"
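The binding can be sketched with an HMAC over the action hash. This assumes a shared signing key and sorted-key JSON as a stand-in for canonicalization; the key, field names, and helper functions are all hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical signing key; a real deployment would use managed key material.
SIGNING_KEY = b"demo-key-not-for-production"


def action_hash(action: dict) -> str:
    """Hash a sorted-key JSON encoding of the action (simplified canonical form)."""
    canonical = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def _sign(digest: str) -> str:
    return hmac.new(SIGNING_KEY, digest.encode("utf-8"), hashlib.sha256).hexdigest()


def mint_permit(action: dict) -> dict:
    """Issue a permit bound to the hash of one exact action."""
    digest = action_hash(action)
    return {"action_hash": digest, "signature": _sign(digest)}


def verify_permit(permit: dict, action: dict) -> bool:
    """Check the signature, then check the permit matches THIS action."""
    if not hmac.compare_digest(_sign(permit["action_hash"]), permit["signature"]):
        return False
    return hmac.compare_digest(permit["action_hash"], action_hash(action))
```

Presenting a permit for a 10-unit refund alongside a 10,000-unit refund fails the second comparison, and swapping the `action_hash` field fails the signature check.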
OWASP Agentic: ASI03 | OWASP LLM: LLM06
A5: Permit Replay: Duplicate Execution Within Validity Window
Goal. Execute the same authorized action multiple times by replaying a valid permit before it expires.
How it works. A permit is issued for a legitimate action. The attacker captures it. Before it expires, they replay it, submitting the same execution request multiple times. For non-idempotent operations (payments, emails, database writes), each replay produces a real effect.
What mitigation requires. For non-idempotent actions, permits must be single-use with a consumed predicate. Once a permit has been used to start execution, it must be marked consumed and rejected on any subsequent presentation. The consumed state must be checked at execution time, not just at permit issuance.
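A consumed predicate can be sketched as an atomic check-and-mark at execution time. The `PermitLedger` class below is hypothetical and keeps state in memory; a real system would presumably need a durable store shared by all executors.

```python
import threading


class PermitLedger:
    """Tracks which single-use permits have already been consumed."""

    def __init__(self):
        self._consumed = set()
        self._lock = threading.Lock()

    def consume(self, permit_id: str) -> bool:
        """Atomically mark a permit consumed; return False if already used."""
        with self._lock:
            if permit_id in self._consumed:
                return False
            self._consumed.add(permit_id)
            return True


def execute_payment(ledger: PermitLedger, permit_id: str, amount: int) -> str:
    # The consumed check happens at execution time, not at issuance.
    if not ledger.consume(permit_id):
        raise PermissionError("permit already consumed")
    return f"paid {amount}"
```

The first call with a given permit succeeds; any replay of the same permit raises `PermissionError` before the side effect runs.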
OWASP Agentic: ASI02, ASI03 | OWASP LLM: LLM06
A14: Decision-Cache Poisoning
Goal. Return an incorrect cached decision, PERMIT instead of DENY, for a dangerous action.
How it works. Authorization systems sometimes cache decisions for performance. If the cache key is incomplete (missing a parameter, a policy version, or an environment dimension), an attacker can first induce a PERMIT for a benign action and then craft a dangerous action that hits the same cache entry.
What mitigation requires. Decision caches must be keyed by the full evaluation identity including the action hash (which encodes all parameters), the policy hash, and the environment. A cache hit on an incomplete key is a security vulnerability, not a performance optimization.
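A sketch of a decision cache keyed by the full evaluation identity. It assumes actions and policies are JSON-serializable dictionaries and uses sorted-key JSON as a simplified canonical form; `DecisionCache` and `digest` are hypothetical names.

```python
import hashlib
import json


def digest(obj) -> str:
    """SHA-256 over a sorted-key JSON encoding (simplified canonical form)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


class DecisionCache:
    """Caches decisions under (action hash, policy hash, environment)."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def key(action: dict, policy: dict, environment: str) -> tuple:
        # Every parameter of the action is folded into the action hash,
        # so changing any one of them changes the cache key.
        return (digest(action), digest(policy), environment)

    def put(self, action, policy, environment, decision):
        self._entries[self.key(action, policy, environment)] = decision

    def get(self, action, policy, environment):
        return self._entries.get(self.key(action, policy, environment))
```

With this key, a cached PERMIT for a small refund is a miss for a large refund, for a different policy, and for a different environment.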
OWASP Agentic: ASI04, ASI08 | OWASP LLM: LLM05
A15: Policy Rollback: Stale Cache Exploitation
Goal. Force evaluation under a permissive old policy by exploiting stale policy distribution.
How it works. Policies use mutable tags ("latest", "production", "v2") instead of immutable digests. An attacker exploits the window between policy update and cache invalidation, or directly manipulates the policy distribution system to serve an older version.
What mitigation requires. Policies must be referenced by their cryptographic hash, not by mutable tags or version labels. The policy hash must be bound into every decision and permit. Stale cache attacks become detectable because the hash in the permit won't match the current policy hash.
OWASP Agentic: ASI04, ASI03 | OWASP LLM: LLM03
A17: Permit Substitution Across Environments
Goal. Use a permit minted under dev or staging constraints to execute in production.
How it works. Dev environments have permissive policies. Production environments have strict ones. If permits are not environment-scoped, a permit obtained in dev is valid in production.
What mitigation requires. Permits must carry an environment binding. Production executors must reject permits not explicitly minted for the production environment. This is a mandatory field, not an optional one.
OWASP Agentic: ASI03 | OWASP LLM: LLM06
Evidence and Audit Attacks
A3: Audit Tampering: Deleting or Reordering Decisions
Goal. Remove or alter the decision trail to hide unauthorized actions or corrupt forensic investigation.
How it works. If the audit log is a mutable database table, an attacker who compromises the system can delete records, flip DENY to PERMIT, or reorder decisions. A forensic investigation then produces wrong conclusions, or no conclusions at all.
What mitigation requires. Hash-chained records. Each Decision Provenance Record contains the hash of its own content and the hash of the previous record. Modifying any record breaks the chain. Deleting a record breaks the chain. Reordering records breaks the chain. The break is detectable, and detecting it is itself forensic evidence that tampering occurred.
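The chaining can be sketched in a few lines. This is a simplified variant, not the actual Decision Provenance Record format: each record stores one hash computed over its body together with the previous record's hash, and all field names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def record_hash(body: dict, prev_hash: str) -> str:
    """Hash the record body together with the previous record's hash."""
    payload = json.dumps({"body": body, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list, body: dict) -> None:
    """Append a record linked to the current chain head."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"body": body, "prev": prev, "hash": record_hash(body, prev)})


def verify(chain: list) -> bool:
    """Walk the chain; any edit, deletion, or reordering breaks a link."""
    prev = GENESIS
    for rec in chain:
        if rec["prev"] != prev or rec["hash"] != record_hash(rec["body"], prev):
            return False
        prev = rec["hash"]
    return True
```

Flipping a DENY to PERMIT in any record, deleting a middle record, or swapping two records makes `verify` return `False`.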
OWASP Agentic: ASI03, ASI04 | OWASP LLM: LLM03 | MITRE ATLAS: Defense Evasion
A16: Log Truncation: Chain Head Reset
Goal. Truncate the decision log wholesale and reset the chain, erasing history without leaving detectable breaks inside the retained chain.
How it works. Hash chaining detects tampering within a retained chain. It does not, by itself, detect wholesale truncation. An attacker deletes all records from a certain point forward, resets the chain head, and the remaining records form a valid chain. There is nothing inside the retained records to indicate that anything was deleted.
What mitigation requires. External anchoring. Periodic chain checkpoints must be committed out-of-band, to a separate storage domain, a transparency log, or a quorum. The anchor proves the chain existed to that point. If the chain is later shorter than the anchor, truncation is detectable.
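A sketch of how an out-of-band anchor exposes truncation. Here the chain is reduced to a list of record hashes, and the anchor is a hypothetical `{length, head}` pair that would be stored in a separate storage domain; it assumes checkpoints are only taken on a non-empty chain.

```python
import hashlib


def extend(chain: list, entry: str) -> None:
    """Append the hash of (previous hash + entry) to a list of record hashes."""
    prev = chain[-1] if chain else "genesis"
    chain.append(hashlib.sha256((prev + entry).encode("utf-8")).hexdigest())


def checkpoint(chain: list) -> dict:
    """Produce an anchor to commit out-of-band (chain must be non-empty)."""
    return {"length": len(chain), "head": chain[-1]}


def consistent_with_anchor(chain: list, anchor: dict) -> bool:
    """The chain must be at least as long as the anchor and agree at that point."""
    n = anchor["length"]
    return len(chain) >= n and chain[n - 1] == anchor["head"]
```

A chain that is shorter than the anchored length, or that was rebuilt with different records, fails `consistent_with_anchor` even though it may verify internally.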
OWASP Agentic: ASI03, ASI04 | OWASP LLM: LLM03
State and Context Attacks
A7: State Confusion: Evaluating Under Wrong Context
Goal. Cause authorization evaluation to run under incomplete or incorrect state, producing a decision that would not have been made under true conditions.
How it works. Policy evaluation often depends on state beyond the action itself: account status, user tier, risk context, active limits. If that state can be withheld, raced, or ambiguously represented, the evaluation produces a decision based on incorrect assumptions.
What mitigation requires. The evaluation state snapshot must be captured deterministically at the moment of evaluation and bound into the decision record via state hash. Fail-closed semantics apply: if required state cannot be resolved, the decision is DENY. The bound state hash makes the assumed context explicit and replayable.
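A minimal sketch of fail-closed evaluation with a bound state hash. The required state fields and the spending-limit policy are invented for illustration; only the shape (snapshot, hash, DENY on unresolved state) reflects the requirement above.

```python
import hashlib
import json

# Hypothetical state fields this policy depends on.
REQUIRED_STATE = ("account_status", "daily_limit", "spent_today")


def evaluate(action: dict, state: dict) -> dict:
    """Evaluate under a captured snapshot; DENY if any required state is missing."""
    missing = [k for k in REQUIRED_STATE if state.get(k) is None]
    if missing:
        # Fail closed: unresolved state never produces a PERMIT.
        return {"decision": "DENY", "reason": f"unresolved state: {missing}",
                "state_hash": None}
    snapshot = {k: state[k] for k in REQUIRED_STATE}
    state_hash = hashlib.sha256(
        json.dumps(snapshot, sort_keys=True).encode("utf-8")
    ).hexdigest()
    ok = (snapshot["account_status"] == "active"
          and snapshot["spent_today"] + action["amount"] <= snapshot["daily_limit"])
    return {"decision": "PERMIT" if ok else "DENY", "reason": "policy evaluated",
            "state_hash": state_hash}
```

The `state_hash` in the result records exactly which context the decision assumed, so the same snapshot can be replayed later and compared.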
OWASP Agentic: ASI06 | OWASP LLM: LLM08, LLM04
A8: TOCTOU: State Changes Between Authorization and Execution
Goal. Obtain a PERMIT under state conditions that satisfy policy constraints, then cause state to change before execution so the executed action operates under conditions that would have been denied.
How it works. Authorization and execution are two separate events in time. State can change between them. The permit was valid when issued. By the time execution runs, the conditions that justified the permit no longer hold.
What mitigation requires. Short-lived permits for high-risk actions, with expiry windows tight enough that state drift is bounded. For the highest-risk actions, freshness checks at execution time against the state referenced in the permit. Eliminating TOCTOU entirely requires transactional semantics across all state, which is outside the scope of any authorization layer. The claim is bounded exposure, not elimination.
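Bounded exposure can be sketched as an expiry check plus a freshness check at execution time. The sketch assumes state carries a monotonically increasing version number, which is an illustrative stand-in for "the state referenced in the permit"; the function names are hypothetical.

```python
import time


def mint_permit(action, state_version, ttl_seconds, now=None):
    """Issue a short-lived permit that records the state version it relied on."""
    now = time.time() if now is None else now
    return {"action": action, "state_version": state_version,
            "expires_at": now + ttl_seconds}


def may_execute(permit, current_state_version, now=None):
    """Allow execution only if the permit is unexpired and state has not moved."""
    now = time.time() if now is None else now
    if now >= permit["expires_at"]:
        return False  # expiry bounds how far state can drift
    # Freshness check for the highest-risk actions.
    return permit["state_version"] == current_state_version
```

This narrows the window rather than closing it: state can still change between the freshness check and the side effect itself.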
OWASP Agentic: ASI02, ASI08 | OWASP LLM: LLM06
A9: Canonicalization Drift: Representation Manipulation
Goal. Express a forbidden action through an alternate representation that produces a different hash and bypasses policy.
How it works. Policy is evaluated over the canonical form of an action. If canonicalization is incomplete, so that two semantically identical actions can produce different canonical representations, an attacker can craft a representation that doesn't match the deny rule.
What mitigation requires. The Canonical Action Representation must collapse all representational variants of a semantically equivalent action into a single canonical form: trimmed strings, lexicographically sorted keys at all levels, normalized aliases, explicit defaults for missing fields. Policy predicates then operate on a stable domain where semantic equivalence implies hash equivalence.
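The four normalization steps can be sketched as follows. The alias table, the defaults, and the `op` field are hypothetical, and lowercasing the operation name is an added assumption about what alias normalization involves; this is not the published Canonical Action Representation.

```python
import hashlib
import json

ALIASES = {"rm": "delete", "remove": "delete"}  # hypothetical alias table
DEFAULTS = {"recursive": False}                 # hypothetical explicit defaults


def canonicalize(value):
    """Recursively trim strings (including dict keys) at every level."""
    if isinstance(value, dict):
        return {k.strip(): canonicalize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [canonicalize(v) for v in value]
    if isinstance(value, str):
        return value.strip()
    return value


def canonical_action(action: dict) -> str:
    """Apply defaults, normalize aliases, and emit sorted-key JSON."""
    merged = {**DEFAULTS, **canonicalize(action)}
    op = merged.get("op", "").lower()
    merged["op"] = ALIASES.get(op, op)
    # sort_keys orders keys lexicographically at all nesting levels.
    return json.dumps(merged, sort_keys=True, separators=(",", ":"))


def action_hash(action: dict) -> str:
    return hashlib.sha256(canonical_action(action).encode("utf-8")).hexdigest()
```

Under these assumptions, `{"op": "delete", "path": "/tmp/x", "recursive": False}` and `{" path ": " /tmp/x ", "op": " RM "}` produce the same hash, so a deny rule written against one also matches the other.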
OWASP Agentic: ASI02, ASI01 | OWASP LLM: LLM01, LLM05
A13: Semantic Smuggling: Hidden Forbidden Semantics
Goal. Hide disallowed intent inside superficially allowed action parameters, bypassing policy predicates that don't inspect the right fields.
How it works. The action name and top-level operation look permitted. The dangerous semantics are encoded inside parameters that policy doesn't examine.
What mitigation requires. Canonicalization must include all execution-relevant fields in the canonical domain. Policy predicates must be able to observe every parameter that affects the action's side effects. Parameters that change what the action actually does must be visible to policy, not buried in opaque blobs.
OWASP Agentic: ASI01, ASI02 | OWASP LLM: LLM01, LLM05
Infrastructure Attacks
A6: Approval Spoofing: DEFER Resolution Corruption
Goal. Convert a deferred action (pending human approval) into an approved execution by forging or manipulating the approval signal.
How it works. A policy requires human approval for a high-risk action. The action enters a pending state. The attacker spoofs the approval signal, forging an approval event, racing the approval endpoint, or manipulating the approval identity check, and triggers execution without legitimate human authorization.
The critical misconception is that approval directly permits execution. It must not: if approval is treated as a direct PERMIT, spoofing approval is sufficient to cause execution.
What mitigation requires. Approval updates state and triggers deterministic re-evaluation. Not: approval → PERMIT. Rather: approval → state update → re-evaluation → if new state satisfies policy → PERMIT. Spoofing an approval signal only corrupts the state input to re-evaluation. It does not directly produce a permit. The evaluation itself must still pass.
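The approval → state update → re-evaluation flow can be sketched like this. The threshold, the state layout, and the function names are hypothetical; the point is only that `record_approval` never returns a permit.

```python
def evaluate(action: dict, state: dict) -> str:
    """Deterministic policy: large transfers need a recorded, authorized approval."""
    if action["amount"] <= 1000:
        return "PERMIT"
    approval = state.get("approvals", {}).get(action["id"])
    if approval is None:
        return "DEFER"
    if approval["approver"] not in state["authorized_approvers"]:
        return "DENY"
    return "PERMIT"


def record_approval(state: dict, action_id: str, approver: str) -> None:
    """An approval only updates state; the decision comes from re-evaluation."""
    state.setdefault("approvals", {})[action_id] = {"approver": approver}
```

An approval event from an identity outside `authorized_approvers` changes the state but still evaluates to DENY. Authenticating the approver's identity is a separate concern that this sketch does not cover.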
OWASP Agentic: ASI09, ASI03 | OWASP LLM: LLM01 | MITRE ATLAS: AML.T0051.000
A10: Cross-Tenant Artifact Reuse
Goal. Reuse a permit or decision artifact from one tenant's environment to authorize actions in another tenant's environment.
How it works. In multi-tenant deployments, a permit minted for tenant A is presented to an executor operating for tenant B. If permits are not tenant-scoped and if executor verification doesn't check tenant identity, the permit is accepted.
What mitigation requires. Permits, DPRs, policies, and all other artifacts must be explicitly tenant-scoped. Executor verification must check tenant identity as a precondition, not an optional field. Policies and audit chains must be isolated per tenant/domain. A DPR from tenant A must not be valid evidence in tenant B's chain.
OWASP Agentic: ASI03 | OWASP LLM: LLM02
A11: Gate DoS: Overload and Availability Attack
Goal. Degrade or eliminate the authorization layer's availability so that fail-open behavior grants unintended execution.
How it works. If the governance system is unavailable and the default behavior is to allow, an attacker only needs to take down the governance layer to bypass all policy. Overwhelm the gate with requests, induce timeouts, or exploit a resource exhaustion vulnerability. If the default is fail-open, the system is now ungoverned.
What mitigation requires. Fail-closed semantics are non-negotiable. If the governance layer cannot evaluate an action for any reason, including timeout, network failure, internal error, or overload, the action must not execute. A DoS against the governance layer must degrade availability, not security. Availability is an operational problem. Fail-open is a security problem.
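Fail-closed behavior can be sketched as a wrapper in which only an explicit PERMIT allows execution. The `evaluate` callable is a hypothetical stand-in for a call to the governance layer.

```python
def may_execute(evaluate, action) -> bool:
    """Return True only on an explicit PERMIT; every failure mode maps to False."""
    try:
        decision = evaluate(action)
    except Exception:
        # Timeout, network failure, internal error, overload: all deny.
        return False
    # Anything other than the exact PERMIT value (None, DEFER, garbage) denies.
    return decision == "PERMIT"
```

Under load, this wrapper makes actions fail rather than run ungoverned, which is the availability-for-security trade described above.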
OWASP Agentic: ASI08 | OWASP LLM: LLM10
The pattern across all 18
Every one of these attacks points at the same set of structural properties that a governance layer must have to resist them.
None of these attacks require exotic capabilities. They require an understanding of how governance layers are typically built and where the gaps are. The attacks that are hardest to defend against are not hard because the defense is technically complex; they are hard because the defense requires an architectural commitment most teams have not made.
The commitment is this: every effectful action, evaluated canonically, at a single mandatory boundary, with fail-closed defaults, tamper-evident records, and permits that bind to the exact thing they authorize. That set of properties is not optional and it is not divisible. Implementing eight of the twelve and leaving four as "we'll get to it" means the four you left are the four an attacker will find.
