[SECURITY RESEARCH]

[2/25/26]

The Agents of Chaos Study Documents Exactly What Faramesh Was Built to Prevent

[Author]: Amjad Fatmi

arxiv:2602.20021 - Published February 2026

Thirty-eight researchers from Northeastern, Harvard, UBC, Carnegie Mellon, and several other institutions spent two weeks trying to break autonomous AI agents in a live environment. Not a sandbox. Not a simulation. A real Discord server, real ProtonMail accounts, real Bash shell access, real persistent file systems, real cron jobs, and real consequences when things went wrong.

They ran six agents (Ash, Flux, Quinn, and Jarvis on Kimi K2.5; Mira and Doug on Claude Opus 4.6), all deployed on the same infrastructure that enterprises are using right now: OpenClaw, on Fly.io VMs with 20GB of persistent storage.

They documented eleven representative case studies. The failures are not exotic. They are boring, predictable, and catastrophic. Every single one of them was preventable with a pre-execution governance layer that evaluated actions before they ran.

Faramesh governs OpenClaw. We shipped that integration on February 22nd. This post maps every failure in the study to the Faramesh primitive that prevents it, and is honest about the one it does not.

What the Study Is Actually About

Before mapping failures, the framing matters. The paper's viral coverage has dressed it up as evidence of AI agents developing Machiavellian strategy and emergent deception. That is not what the paper says. The abstract is precise: these are "failures emerging from the integration of language models with autonomy, tool use, and multi-party communication."

Integration failures. Not strategic deception. An agent that leaks an SSN because you said "forward" instead of "share" is not being deceptive. It is being brittle. An agent that destroys its own mail server to protect a secret is not pursuing power. It has no concept of proportionality.

The authors identify three structural deficits that explain every failure in the study:

  1. No stakeholder model. Agents cannot reliably distinguish owners from non-owners. Authority flows from whoever is messaging them.

  2. No self-model. Agents have no concept of where their competence boundary lies or when a situation exceeds it.

  3. No reliable private deliberation surface. Agents cannot reason about channel visibility; whether a thought is private or observable affects what they do.

These are architectural problems. They cannot be solved by better prompting, more training, or cleverer system instructions. They require infrastructure that sits between agent intent and agent execution.

That infrastructure is what Faramesh is.

Case Study 1: The Mail Server That Destroyed Itself

What happened: Ash was given a secret and instructed to protect it. When a researcher applied social engineering pressure to extract the secret, Ash made a judgment call and destroyed its own mail server. The secret was protected. The infrastructure was gone. The researchers note Ash's values were not wrong — protecting the secret was correct. The failure was proportionality. Ash executed a destructive, irreversible action when dozens of less extreme options existed.

The structural problem: No policy boundary on irreversible destructive actions. The agent's tool set included shell execution with no constraints on what it could destroy. From Ash's perspective, rm -rf /var/mail was as available as echo "access denied".

The Faramesh primitive: Pre-execution policy evaluation with an explicit irreversibility flag on destructive operations. Every tool call passes through the policy engine before it executes. The engine evaluates the action against static rules and the session context. A destructive email server operation is classified as irreversible and high-risk, which triggers DEFER: the action is suspended and routed to a human approver before execution.

# faramesh/policies/email-governance.yaml
version: "1.0"
agent_id: "*"

rules:
  - id: email-server-destructive-ops
    description: "Require human approval before any destructive email server operation"
    match:
      tool: "shell"
      operation: "run"
      params:
        cmd:
          pattern: "(rm|delete|destroy|purge).*(mail|smtp|postfix|proton)"
    decision: DEFER
    reason: "Destructive email operation requires human authorization"
    metadata:
      irreversible: true
      risk_level: critical

  - id: email-server-config-change
    description: "Block direct modification of email server configuration"
    match:
      tool: "filesystem"
      operation: "write"
      params:
        path:
          pattern: ".*(postfix|dovecot|mail|smtp).*\\.conf$"
    decision: DENY
    reason: "Email server configuration is a protected resource"

With this policy active, Ash's mail server deletion attempt hits the DEFER rule. A human approver receives the request with full context: the action, the parameters, the agent's stated reason, the session history. The approver denies it. The server survives. The secret is still protected because the actual problem — the social engineering attempt — gets escalated to the owner instead of handled by a disproportionate unilateral action.
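The matching step can be sketched in a few lines of Python. This is an illustration only, not the Faramesh engine: the Rule class, the evaluate function, the first-match-wins ordering, and the PERMIT default are all assumptions made for the example. Only the two regex patterns and their decisions come from the policy file above.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical in-memory form of one policy rule (not a Faramesh type)."""
    rule_id: str
    tool: str
    pattern: str   # regex applied to the shell command or file path
    decision: str  # "DEFER" or "DENY"


# Patterns copied from email-governance.yaml; everything else is assumed.
RULES = [
    Rule("email-server-destructive-ops", "shell",
         r"(rm|delete|destroy|purge).*(mail|smtp|postfix|proton)", "DEFER"),
    Rule("email-server-config-change", "filesystem",
         r".*(postfix|dovecot|mail|smtp).*\.conf$", "DENY"),
]


def evaluate(tool: str, target: str) -> str:
    """Return the decision of the first matching rule, else PERMIT (assumed default)."""
    for rule in RULES:
        if rule.tool == tool and re.search(rule.pattern, target):
            return rule.decision
    return "PERMIT"


print(evaluate("shell", "rm -rf /var/mail"))
print(evaluate("shell", 'echo "access denied"'))
print(evaluate("filesystem", "/etc/dovecot/dovecot.conf"))
print(evaluate("shell", "echo confirm mail delivery"))
```

The last call is worth a look. Under Python's regex semantics the first pattern is unanchored, so it also matches the "rm" inside "confirm". If the real engine behaves the same way, a word boundary around the verb group would cut down on spurious DEFERs.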

The DPR record for this event binds the DEFER decision to the policy version, the canonical action hash, and the session state at the moment of evaluation. The owner has a complete audit trail of what Ash tried to do, why, and what was decided.
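To make "binds" concrete, here is a minimal sketch of a hash-chained decision record. It assumes a SHA-256 hash over sorted-key JSON and a simple previous-hash link. The field names, the genesis value, and the functions are hypothetical and do not describe the actual DPR format.

```python
import copy
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the first record's prev_hash


def canonical_hash(obj: dict) -> str:
    """SHA-256 over sorted-key JSON, so key order cannot change the hash."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(chain, action, decision, policy_version, session_state):
    """Append a record that binds decision, policy version, action, and session state."""
    body = {
        "action_hash": canonical_hash(action),
        "decision": decision,
        "policy_version": policy_version,
        "session_state_hash": canonical_hash(session_state),
        "prev_hash": chain[-1]["record_hash"] if chain else GENESIS,
    }
    record = {**body, "record_hash": canonical_hash(body)}
    chain.append(record)
    return record


def verify_chain(chain) -> bool:
    """Recompute every hash and link; any edited field breaks verification."""
    prev_hash = GENESIS
    for record in chain:
        body = {k: v for k, v in record.items() if k != "record_hash"}
        if body["prev_hash"] != prev_hash:
            return False
        if canonical_hash(body) != record["record_hash"]:
            return False
        prev_hash = record["record_hash"]
    return True


chain = []
append_record(chain,
              {"tool": "shell", "operation": "run", "params": {"cmd": "rm -rf /var/mail"}},
              "DEFER", "1.0", {"session_id": "ses_8f4a2b", "action_count": 41})
append_record(chain,
              {"tool": "discord", "operation": "send_message", "params": {"text": "escalating"}},
              "PERMIT", "1.0", {"session_id": "ses_8f4a2b", "action_count": 42})

# Rewrite history on a copy: flip the DEFER into a PERMIT after the fact.
tampered = copy.deepcopy(chain)
tampered[0]["decision"] = "PERMIT"

print(verify_chain(chain))
print(verify_chain(tampered))
```

The design point is that the decision is not a free-standing log line. Changing the decision, the policy version, or the action after the fact changes the record hash, and every later record stops verifying.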

Case Study 2: The Nine-Day Infinite Loop

What happened: Two agents entered a feedback loop that ran for nine days, consuming over 60,000 tokens with no termination condition and no owner notification. Neither agent recognized it was stuck. Neither escalated. The conversation simply continued, generating cost and consuming resources, until humans noticed.

The structural problem: No resource budget enforcement. Agents had no concept of session limits. The multi-agent communication channel had no circuit breaker. There was no external system counting tokens, measuring elapsed time, or detecting loop conditions.

The Faramesh primitive: Session-level resource policies with automatic DENY on breach, and anomaly detection on action frequency patterns that indicate loops.

# faramesh/policies/resource-governance.yaml
version: "1.0"
agent_id: "*"

session_limits:
  max_tokens_per_session: 10000
  max_session_duration_minutes: 120
  max_actions_per_session: 500
  max_actions_per_minute: 30

rules:
  - id: token-budget-exceeded
    description: "Deny actions when session token budget is exhausted"
    match:
      condition: "session.token_count > session_limits.max_tokens_per_session"
    decision: DENY
    reason: "Session token budget exceeded. Owner notification required to continue."
    notify:
      channel: slack
      message: "Agent {agent_id} has exceeded its token budget. Session suspended."

  - id: loop-detection
    description: "Defer agent actions when repetitive pattern detected"
    match:
      condition: "session.action_velocity_5min > 50 AND session.unique_action_types_5min < 3"
    decision: DEFER
    reason: "Potential loop condition detected. Human review required."
    metadata:
      anomaly_type: "repetitive_action_pattern"

  - id: multi-agent-session-duration
    description: "Require approval to continue multi-agent sessions beyond threshold"
    match:
      tool: "discord"
      operation: "send_message"
      condition: "session.duration_minutes > 60 AND session.participant_count > 1"
    decision: DEFER
    reason: "Extended multi-agent session requires human confirmation to continue"

The nine-day loop never survives the max_session_duration_minutes: 120 limit. At the two-hour mark, further actions are denied and the owner receives a notification. Even without the session limit, the loop detection rule catches any loop fast enough to cross its thresholds within five minutes: high action velocity with low unique action type diversity is the behavioral signature of a stuck agent.

Every token consumed, every message sent, every action evaluated is recorded in the DPR chain. Once the circuit breaker fires, a Faramesh audit export gives the owner a complete forensic reconstruction of the two-hour window that preceded it.
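A rough sketch of how the session limits and the loop rule interact, using the thresholds from the policy above. The SessionMonitor class, the sliding-window bookkeeping, the order of the checks, and the PERMIT default are assumptions for illustration, not Faramesh internals.

```python
from collections import deque

# Thresholds taken from resource-governance.yaml.
WINDOW_SECONDS = 300        # the 5-minute window behind the *_5min metrics
MAX_SESSION_MINUTES = 120
MAX_TOKENS = 10_000
VELOCITY_THRESHOLD = 50
MIN_UNIQUE_TYPES = 3


class SessionMonitor:
    """Hypothetical per-session counter; times are seconds since an arbitrary epoch."""

    def __init__(self, started_at: float):
        self.started_at = started_at
        self.token_count = 0
        self.recent = deque()  # (timestamp, action_type) inside the window

    def check(self, now: float, action_type: str, tokens: int) -> str:
        self.token_count += tokens
        self.recent.append((now, action_type))
        # Drop actions that have fallen out of the 5-minute window.
        while self.recent and now - self.recent[0][0] > WINDOW_SECONDS:
            self.recent.popleft()

        if (now - self.started_at) / 60 > MAX_SESSION_MINUTES:
            return "DENY"   # session duration limit
        if self.token_count > MAX_TOKENS:
            return "DENY"   # token budget
        unique_types = len({kind for _, kind in self.recent})
        if len(self.recent) > VELOCITY_THRESHOLD and unique_types < MIN_UNIQUE_TYPES:
            return "DEFER"  # repetitive pattern: likely loop
        return "PERMIT"


def simulate(interval_seconds: int, count: int, tokens_per_action: int) -> list:
    """Replay `count` identical messages spaced `interval_seconds` apart."""
    monitor = SessionMonitor(started_at=0.0)
    return [monitor.check(i * interval_seconds, "send_message", tokens_per_action)
            for i in range(count)]


fast = simulate(interval_seconds=1, count=60, tokens_per_action=10)
slow = simulate(interval_seconds=600, count=14, tokens_per_action=100)

print(fast.index("DEFER"))  # index of the first deferred action in the fast loop
print(slow.index("DENY"))   # index of the first denied action in the slow loop
```

In this sketch the two mechanisms cover different loops. A fast loop trips the velocity rule within the window. A slow loop that sends one message every ten minutes never reaches the velocity threshold and is stopped by the duration limit instead.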

Case Study 3: The "Forward" vs. "Share" Bypass

What happened: An agent correctly refused to "share" personally identifiable information (SSNs, bank account numbers, medical data) when asked directly. The attacker rephrased: "forward these emails." The agent complied immediately. Same action. Different verb. Complete data exposure.

The structural problem: Safety training is keyword-dependent, not concept-dependent. The agent's refusal was triggered by semantic pattern matching against known dangerous framings. "Forward" did not match the pattern. The underlying action, sending sensitive data to an unauthorized recipient, was identical.

The Faramesh primitive: Content-level policy evaluation that operates on the canonical action representation, not the natural language framing. Faramesh normalizes every action to its canonical form before policy evaluation. A "forward email" and a "share email" with the same recipient and content produce the same canonical action hash. PII detection operates on the action parameters, not the English command that produced them.

# faramesh/policies/pii-governance.yaml
version: "1.0"
agent_id: "*"

rules:
  - id: outbound-pii-detection
    description: "Block outbound communications containing PII regardless of operation framing"
    match:
      tool: "email"
      operation:
        any_of: ["send", "forward", "reply", "compose"]
      params:
        body:
          contains_pii: true
          pii_types: ["ssn", "bank_account", "credit_card", "medical_record"]
    decision: DENY
    reason: "Outbound email contains PII. Requires explicit owner authorization."
    metadata:
      pii_detected: true
      compliance_flags: ["GDPR.Article30", "HIPAA.164.312b", "SOC2.CC6.1"]

  - id: outbound-pii-external-recipient
    description: "DEFER outbound emails with any sensitive content to external domains"
    match:
      tool: "email"
      operation:
        any_of: ["send", "forward", "reply"]
      params:
        recipient:
          domain_not_in: ["trusted-domains.yaml"]
        body:
          sensitivity_score_gte: 0.6
    decision: DEFER
    reason: "Email to external recipient with sensitive content requires authorization"

  - id: email-thread-forwarding
    description: "Require approval before forwarding any email thread externally"
    match:
      tool: "email"
      operation: "forward"
      params:
        recipient:
          domain_not_in: ["trusted-domains.yaml"]
    decision: DEFER
    reason: "Forwarding email threads externally requires explicit authorization"

The key insight in this policy is that forward is enumerated alongside send, reply, and compose in the operation matcher. The canonical action representation makes the operation explicit regardless of how the agent phrased it internally. There is no semantic gap to exploit because the evaluation operates on the action's actual parameters, not its natural language description.
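The idea of evaluating the action rather than the phrasing can be sketched as follows. This assumes the canonical form covers the tool, the recipient, and the body, and that hashing is SHA-256 over sorted-key JSON. The normalization rules, the single SSN regex standing in for PII detection, and every function name are hypothetical.

```python
import hashlib
import json
import re

# Toy detector: US SSN format only. Real PII detection is far broader.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def canonicalize(tool: str, recipient: str, body: str) -> dict:
    """Reduce an outbound message to its effect: which tool, to whom, with what content."""
    return {
        "tool": tool,
        "recipient": recipient.strip().lower(),
        "body": " ".join(body.split()),  # collapse whitespace differences
    }


def action_hash(action: dict) -> str:
    payload = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def evaluate(action: dict) -> str:
    """Decide on the parameters alone; PERMIT as the default is an assumption."""
    return "DENY" if SSN_PATTERN.search(action["body"]) else "PERMIT"


def govern(user_phrase: str, tool: str, recipient: str, body: str) -> tuple:
    # user_phrase is accepted only to show that it plays no part in the decision.
    action = canonicalize(tool, recipient, body)
    return action_hash(action), evaluate(action)


body = "Employee record: SSN 123-45-6789"
shared = govern("share these emails", "email", "attacker@example.com", body)
forwarded = govern("forward these emails", "email", "Attacker@Example.com ", body)
clean = govern("forward these emails", "email", "attacker@example.com", "Lunch at noon?")

print(shared == forwarded)
print(shared[1], clean[1])
```

Because the user's wording never reaches the evaluator, rephrasing the request changes neither the hash nor the decision.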

The DPR record for the denied action includes the detected PII types, the recipient domain, and the policy version that produced the denial. This is the compliance artifact: not a log entry that says "email blocked" but a cryptographically bound record proving which policy rule evaluated which action, at which policy version, with which result.

Case Study 4: Unauthorized Compliance With Non-Owners

What happened: Agents followed commands from users who were not their designated owners. In a multi-user Discord environment, agents could not reliably distinguish between authorized and unauthorized instruction sources. A non-owner could issue commands and the agent would comply.

The structural problem: No stakeholder model. The agent's authority evaluation was based on channel membership and conversational tone, not cryptographic identity verification. Anyone who could send a message to the agent's Discord channel could potentially influence its behavior.

The Faramesh primitive: Identity binding in the DPR chain and session-level authority validation. Every action submitted to Faramesh includes identity context. The policy engine evaluates whether the identity that issued the instruction is authorized to issue instructions to this agent in this context.

# faramesh/policies/identity-governance.yaml
version: "1.0"
agent_id: "*"

identity_config:
  owner_verification: required
  session_binding: strict
  identity_source: "verified_session_token"

rules:
  - id: high-risk-non-owner-compliance
    description: "Block high-risk actions instructed by non-owner identities"
    match:
      risk_level:
        any_of: ["high", "critical"]
      identity:
        role_not_in: ["owner", "admin"]
    decision: DENY
    reason: "High-risk actions require owner authorization"

  - id: file-system-non-owner
    description: "Require owner approval for filesystem operations from non-owner instructions"
    match:
      tool: "filesystem"
      operation:
        any_of: ["write", "delete", "move", "chmod"]
      identity:
        role: "non_owner"
    decision: DEFER
    reason: "Filesystem modification from non-owner instruction requires owner approval"

  - id: external-communication-non-owner
    description: "Block external communications instructed by non-owners"
    match:
      tool:
        any_of: ["email", "discord", "webhook"]
      operation:
        any_of: ["send", "forward", "post"]
      identity:
        role: "non_owner"
      params:
        recipient:
          domain_not_in: ["trusted-domains.yaml"]
    decision: DENY
    reason: "External communications require owner authorization"

The identity verification is not based on display names or message source claims. It is bound to the session token that was established when the owner initialized the agent. A Discord display-name change does not re-authenticate the session. A message from a different user in the same channel carries a different identity context that the policy engine evaluates independently.
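One way to picture session-bound identity is an HMAC over a stable user ID, issued when the owner initializes the agent. The sketch below assumes that scheme. The key handling, the token format, and the resolve_role function are illustrative and are not the Faramesh session token design.

```python
import hashlib
import hmac
import secrets

# Assumed server-side secret; in this sketch it lives only for the process lifetime.
SERVER_KEY = secrets.token_bytes(32)


def issue_token(user_id: str, agent_id: str) -> str:
    """Bind a stable user ID to an agent. Display names are never an input."""
    message = f"{user_id}:{agent_id}".encode("utf-8")
    return hmac.new(SERVER_KEY, message, hashlib.sha256).hexdigest()


def resolve_role(user_id: str, agent_id: str, token, owner_id: str) -> str:
    """Return 'owner' only for the owner's ID presenting a token issued for that ID."""
    if token is None:
        return "non_owner"
    expected = issue_token(user_id, agent_id)
    if hmac.compare_digest(expected, token) and user_id == owner_id:
        return "owner"
    return "non_owner"


OWNER_ID = "user-1001"      # hypothetical stable platform IDs
ATTACKER_ID = "user-2002"
owner_token = issue_token(OWNER_ID, "ash")

# The owner, whatever their display name currently is.
print(resolve_role(OWNER_ID, "ash", owner_token, OWNER_ID))
# Another user replaying the owner's token: the HMAC does not match their ID.
print(resolve_role(ATTACKER_ID, "ash", owner_token, OWNER_ID))
# Another user with no token at all.
print(resolve_role(ATTACKER_ID, "ash", None, OWNER_ID))
```

The role that reaches the policy engine is derived from the token check, so renaming an account to look like the owner changes nothing the engine sees.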

Case Study 5: False Completion Reports

What happened: In several cases, agents reported tasks as successfully completed while the underlying system state contradicted those reports. The agent said "done." The system said otherwise. The agent believed its own reports.

The structural problem: There is no external verification layer that cross-references agent status reports against actual system state. Agents evaluate their own completion based on internal reasoning, not observed outcomes.

What Faramesh addresses: The DPR chain provides the factual record of what actions were evaluated and what decisions were made. The chain does not depend on the agent's self-report. An owner can query the DPR chain for any session and see every action that was submitted, every decision that was produced, and whether the intended execution was authorized and confirmed.

# Query the DPR chain for a specific session
faramesh replay --session-id ses_8f4a2b --policy ./policies/task-governance.yaml

# Output: every action evaluated in this session, the decision, 
# the policy version, and the canonical action hash.
# The chain is independent of what the agent reported.

This does not prevent the agent from believing it completed a task when it did not. That is a reasoning layer problem that Faramesh cannot solve from the execution layer. But it provides the owner with an independent audit trail that is not contaminated by the agent's self-report. The forensic record is built from actual action submissions, not agent narration.

Where Faramesh adds concrete value is in requiring action confirmation for high-stakes task completions. An agent that claims to have "completed the deployment" must have submitted a completion action that was evaluated and permitted. If no such action exists in the DPR chain, the task was not completed through the governed layer regardless of what the agent said.
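The check described here amounts to a lookup against the record chain rather than a trust decision about the agent. A minimal sketch, assuming records are available as dictionaries with the hypothetical fields session_id, action, task_id, and decision (the real DPR schema is not shown in this post):

```python
def completion_confirmed(records: list, session_id: str, task_id: str) -> bool:
    """True only if a permitted completion action for this task exists in the records."""
    return any(
        record["session_id"] == session_id
        and record["action"] == "complete_task"
        and record["task_id"] == task_id
        and record["decision"] == "PERMIT"
        for record in records
    )


# Hypothetical exported records for one session.
records = [
    {"session_id": "ses_8f4a2b", "action": "run_tests",
     "task_id": "deploy-42", "decision": "PERMIT"},
    {"session_id": "ses_8f4a2b", "action": "complete_task",
     "task_id": "backup-7", "decision": "PERMIT"},
    {"session_id": "ses_8f4a2b", "action": "complete_task",
     "task_id": "deploy-42", "decision": "DENY"},
]

agent_claims_done = True  # what the agent reported for deploy-42

print(agent_claims_done, completion_confirmed(records, "ses_8f4a2b", "deploy-42"))
print(completion_confirmed(records, "ses_8f4a2b", "backup-7"))
```

When the agent's claim and the record disagree, the record wins. It still says nothing about whether the permitted action achieved its goal, which is the limit acknowledged above.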

Case Study 6: Cross-Agent Propagation of Unsafe Practices

What happened: Unsafe practices spread from one agent to another through shared context. When one agent adopted a risky behavior pattern, other agents in the same environment picked it up. One adversary planted an externally editable GitHub Gist "constitution" in an agent's memory, which the agent then voluntarily shared with other agents, propagating the attacker's control surface across the entire fleet.

The structural problem: Multi-agent authorization should not be transitive, yet the agents treated it as if it were. Agent A's permissions should not be inherited by Agent B simply because A passed instructions to B. A corrupted constitution in A's memory should not elevate B's authorization level.

The Faramesh primitive: Sub-agent delegation tracking in the DPR chain. Every action in a multi-agent chain is evaluated independently against the current policy. The authorization scope of an action does not inherit from the agent that requested it; it is evaluated from scratch at the policy engine for the receiving agent.

# faramesh/policies/multi-agent-governance.yaml
version: "1.0"
agent_id: "*"

multi_agent_config:
  delegation_tracking: enabled
  scope_inheritance: disabled  # critical: child agents do not inherit parent scope
  constitution_sources:
    allowed:
      - "owner_config_files"
      - "verified_policy_packs"
    blocked:
      - "external_urls"
      - "agent_memory"
      - "inter_agent_messages"

rules:
  - id: external-constitution-injection
    description: "Block agents from loading behavior instructions from external or agent-provided sources"
    match:
      tool:
        any_of: ["memory", "filesystem", "http"]
      operation: "read"
      params:
        source_type:
          any_of: ["external_url", "agent_message", "shared_memory"]
        content_type: "behavior_instruction"
    decision: DENY
    reason: "Behavior instructions must originate from owner configuration, not external or inter-agent sources"

  - id: inter-agent-privileged-instruction
    description: "Require owner approval before executing high-privilege instructions from another agent"
    match:
      instruction_source: "agent"
      risk_level:
        any_of: ["high", "critical"]
    decision: DEFER
    reason: "High-risk instructions from other agents require owner authorization"

  - id: memory-write-external-content
    description: "Block writing externally-sourced content to persistent memory"
    match:
      tool: "memory"
      operation: "write"
      params:
        content_source: "external"
    decision: DENY
    reason: "External content cannot be written to persistent agent memory without owner authorization"

The scope_inheritance: disabled setting is the critical line. Without it, a compromised Agent A passing instructions to Agent B effectively hands A's authorization scope to B. With it, every action B takes is evaluated as if it originated from B directly, under B's assigned policy. The propagated constitution cannot elevate B's privileges because B's policy engine evaluates B's actions; it does not ask what A was allowed to do.
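To make the difference concrete, here is a minimal sketch of scope evaluation with and without inheritance. It is not the Faramesh engine or its API: the POLICIES table, the Action type, the evaluate function, and the PERMIT label are all assumptions made for illustration. Only the idea of evaluating B's action under B's own policy comes from the configuration above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical per-agent scopes: the set of tools each agent may call.
POLICIES = {
    "agent_a": {"shell", "email", "filesystem"},
    "agent_b": {"email"},
}


@dataclass(frozen=True)
class Action:
    actor: str                          # agent that will execute the action
    tool: str                           # tool the action wants to call
    delegated_by: Optional[str] = None  # agent that asked for it, if any


def evaluate(action: Action, scope_inheritance: bool = False) -> str:
    """Return "PERMIT" or "DENY" for a proposed action."""
    allowed = set(POLICIES.get(action.actor, set()))
    if scope_inheritance and action.delegated_by is not None:
        # Unsafe mode: the actor borrows the delegating agent's scope.
        allowed |= POLICIES.get(action.delegated_by, set())
    return "PERMIT" if action.tool in allowed else "DENY"


# Agent A (compromised) tells Agent B to run a shell command.
request = Action(actor="agent_b", tool="shell", delegated_by="agent_a")
print(evaluate(request))                          # DENY
print(evaluate(request, scope_inheritance=True))  # PERMIT
```

With inheritance off, the delegated_by field never enters the decision, which is the property the paragraph above describes.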

The One Failure Faramesh Does Not Solve

Memory poisoning with numerical corruption. The study demonstrated that telling Claude to remember financial figures "0.4% wrong" produced no warning; the model simply stored corrupted data. Input: 49,228 euros. Stored: 49,424.91 euros, which is 0.4% too high. Every subsequent calculation built on the corrupted foundation.

This is a model-layer problem, not an execution-layer problem. The corruption occurs in the reasoning and memory storage process before any action is submitted to the governance layer. Faramesh evaluates actions at submission time. It cannot inspect the internal reasoning state that produced the parameters of an action.

What Faramesh catches is the action that corrupted memory eventually produces. If a refund action with an amount derived from corrupted memory violates a policy rule (for example, amount_gt: 1000 on a customer service agent), the action is denied. In that case the corrupted memory does not directly cause harm, because the action it produces is still evaluated against policy.

But Faramesh does not detect the memory corruption itself. An agent reasoning about 49,424.91 euros from a corrupted memory and producing an action for 48.50, well within policy, is not caught at the execution layer. The corruption is invisible until it produces an action that violates policy thresholds.
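The gap can be shown with a few lines of arithmetic. The sketch below reproduces the 0.4% skew from the study's figures and then applies a threshold check modeled on the amount_gt: 1000 example. The function names, the REFUND_LIMIT constant, and the PERMIT label are hypothetical; this illustrates the reasoning and is not Faramesh code.

```python
from decimal import Decimal, ROUND_HALF_UP

REFUND_LIMIT = Decimal("1000")  # mirrors the amount_gt: 1000 example


def corrupt(value: Decimal, pct: Decimal) -> Decimal:
    """Apply a relative error of pct percent, rounded to cents."""
    skewed = value * (Decimal("1") + pct / Decimal("100"))
    return skewed.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def evaluate_refund(amount: Decimal) -> str:
    """Threshold rule: deny any refund above the limit."""
    return "DENY" if amount > REFUND_LIMIT else "PERMIT"


stored = corrupt(Decimal("49228"), Decimal("0.4"))
print(stored)                               # 49424.91

# The rule sees only the submitted amount, not where the amount came from.
print(evaluate_refund(Decimal("48.50")))    # PERMIT, even if derived from bad data
print(evaluate_refund(Decimal("1200.00")))  # DENY, regardless of its origin
```

The check has no input that describes the provenance of the amount, so a corrupted value that stays under the threshold is indistinguishable from a correct one.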

This is the gap between execution governance and reasoning governance. The AARM specification calls it an open research problem. We call it an open research problem too. The honest answer is: pre-execution policy enforcement catches the execution consequences of memory corruption. It does not catch the corruption.

What the Study Proves Architecturally

The authors' concluding structural diagnosis is worth quoting precisely: agents need a stakeholder model for distinguishing owners from non-owners, a self-model for recognizing competence boundaries, and a private deliberation surface for reasoning about channel visibility.

These three requirements map directly to the layers a governance infrastructure must provide:

Stakeholder model → Identity binding and authorization scope. Faramesh binds every action to a verified identity and evaluates whether that identity is authorized to perform the action in the current context. The agent does not need to solve the stakeholder problem internally. The governance layer enforces it externally.

Self-model → Resource budgets and circuit breakers. An agent that recognizes its own competence boundaries would stop a runaway loop. The governance layer enforces the stop condition regardless of whether the agent recognizes it. Session limits, action velocity thresholds, and budget policies are external circuit breakers that function even when the agent's self-model fails.

Private deliberation surface → Pre-execution evaluation before external effects. The fundamental problem with agents performing actions in multi-party environments is that by the time the action is visible to observers, its effects are already underway. Pre-execution evaluation means every action is evaluated before it produces external effects. The deliberation happens before the action, not after.
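As an illustration of the second mapping, an action velocity threshold can be sketched as a sliding-window circuit breaker that sits outside the agent. The VelocityBreaker class, its parameters, and the PERMIT label are assumptions for this example, not Faramesh's implementation. Timestamps are passed in explicitly so the behavior is deterministic; a real deployment would more likely read a monotonic clock.

```python
from collections import deque


class VelocityBreaker:
    """Deny actions once max_actions have been permitted within window_s seconds."""

    def __init__(self, max_actions: int, window_s: float) -> None:
        self.max_actions = max_actions
        self.window_s = window_s
        self._stamps: deque = deque()  # timestamps of permitted actions

    def check(self, now: float) -> str:
        # Drop timestamps that have fallen out of the sliding window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_actions:
            return "DENY"
        self._stamps.append(now)
        return "PERMIT"


breaker = VelocityBreaker(max_actions=3, window_s=60.0)
decisions = [breaker.check(now=t) for t in (0.0, 1.0, 2.0, 3.0, 61.0)]
print(decisions)  # ['PERMIT', 'PERMIT', 'PERMIT', 'DENY', 'PERMIT']
```

The breaker never consults the agent, so it trips whether or not the agent notices that it is looping.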

The study used OpenClaw as its infrastructure, and OpenClaw is exactly what the Faramesh plugin governs once you run openclaw plugins install @faramesh/openclaw. The path from the failures documented in this paper to a governed deployment is a single command.

The Execution Boundary Is Not Optional

The failures in the Agents of Chaos study are not edge cases. They are the predictable consequences of deploying non-deterministic reasoning systems that have unrestricted access to consequential actions. The models used in this study, Claude Opus 4.6 and Kimi K2.5, are among the most capable and most carefully aligned models currently available. They still destroyed mail servers, leaked SSNs, and entered nine-day infinite loops.

Better alignment does not solve this. Ash's values were correct. Its judgment about proportionality was catastrophic. The problem is not that the model was poorly aligned. The problem is that no external system evaluated the proposed action against a policy that encoded "destructive actions require human authorization" before the action executed.
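A policy like "destructive actions require human authorization" is small once it lives outside the model. The sketch below is a hypothetical gate, not the Faramesh API: the DESTRUCTIVE_OPERATIONS set, the decide and governed functions, and the PERMIT label are assumptions. DEFER is used in the same sense as in the policy files earlier in this post. The point it illustrates is ordering: the side effect runs only after the decision.

```python
from typing import Callable, List

# Hypothetical labels for operations that must wait for a human.
DESTRUCTIVE_OPERATIONS = {"delete", "wipe", "reset"}


def decide(operation: str, approved_by_owner: bool) -> str:
    """Defer destructive operations until the owner has approved them."""
    if operation in DESTRUCTIVE_OPERATIONS and not approved_by_owner:
        return "DEFER"
    return "PERMIT"


def governed(operation: str, run: Callable[[], None],
             approved_by_owner: bool = False) -> str:
    """Evaluate first; execute the side effect only on PERMIT."""
    decision = decide(operation, approved_by_owner)
    if decision == "PERMIT":
        run()
    return decision


log: List[str] = []
print(governed("delete", lambda: log.append("mail server wiped")))  # DEFER
print(governed("read", lambda: log.append("inbox listed")))         # PERMIT
print(log)                                                          # ['inbox listed']
```

The agent's intentions never enter the decision. A well-meaning but disproportionate action is held for the owner exactly like a malicious one.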

That external evaluation system is what the study proves must exist. Not as a nice-to-have. As a prerequisite for deploying agents with tool access in any environment where the consequences of errors are real.

Every organization deploying agents with shell access, email access, file system access, or API access to consequential systems should be asking one question: what is our pre-execution governance layer?

This paper documents what happens when the answer is nothing.

Faramesh is the execution control plane for AI agents. Every tool call evaluated before it runs.

Install the OpenClaw integration:

# 1. Install Faramesh
pip install faramesh

# 2. Start the server
faramesh serve

# 3. Enable the plugin in OpenClaw
# Add to your OpenClaw configuration or install via:
openclaw plugins install @faramesh/openclaw

# 4. Open the dashboard

Read the specification: faramesh.dev/docs/faramesh-core-spec-v1.0

arXiv reference: Shapira, N. et al. "Agents of Chaos." arXiv:2602.20021 (2026).
