Original research — formal tools, behavioral taxonomies, and governance frameworks that shape the industry and feed directly into operational capability.
When multiple AI governance policies conflict simultaneously, how should a system decide which wins? Claw answers this with mathematical precision — using Dung's Abstract Argumentation Frameworks to compute defensible, auditable policy resolutions with full reasoning chains.
Enterprise AI deployments face a governance challenge existing tools cannot solve: policy signals contradict each other in real time. A content safety policy says block. A compliance policy says pass. A context policy flags for review. The system must decide — and the decision must be auditable, defensible, and explainable to a regulator or incident responder.
Claw's core contribution applies Dung's Abstract Argumentation Frameworks (AAF) to the AI governance layer. Each policy signal becomes an argument. Attack relations encode which signals conflict with, and can defeat, which others. The engine computes extensions under grounded, preferred, and stable semantics: sets of arguments that survive mutual scrutiny. The grounded extension provides the skeptical, most conservative resolution; preferred extensions identify all defensible positions.
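To make the semantics concrete, here is a minimal sketch of grounded-extension computation over the block/pass/review conflict described above, with one added hypothetical argument (a signed compliance exception) so the resolution is decisive. This illustrates Dung's fixed-point construction; the names and data model are ours, not Claw's engine.

```python
def defends(attacks, s, a):
    """True if every attacker of `a` is counter-attacked by some member of `s`."""
    attackers = {x for (x, y) in attacks if y == a}
    return all(any((z, x) in attacks for z in s) for x in attackers)

def grounded_extension(args, attacks):
    """Least fixed point of F(S) = {a | S defends a}: the skeptical resolution."""
    s = set()
    while True:
        nxt = {a for a in args if defends(attacks, s, a)}
        if nxt == s:
            return s
        s = nxt

# Safety says block, compliance says pass, context flags review, and a
# hypothetical signed exception overrides (attacks) the block signal.
args = {"block", "pass", "review", "exception"}
attacks = {("block", "pass"), ("pass", "block"), ("exception", "block")}
print(grounded_extension(args, attacks))
# block is defeated; pass, review, and exception survive
```

Because the exception is unattacked and defeats block, the grounded semantics reinstates pass; the chain of defeats is exactly the reasoning trace an auditor needs.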
This is not a rule engine. It is a formal reasoning system that explains why a particular resolution was reached, traces which arguments defeated which, and produces a complete reasoning chain suitable for audit logs and incident response documentation.
Claw operates as an MCP (Model Context Protocol) server, placing a governance gateway between AI agents and their execution environment. Incoming requests pass through four stages: PII scanning, OPA policy evaluation (Rego-based rules against the policy corpus), argumentation resolution (AAF extension computation for conflicting signals), and context masking before the model layer.
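As a rough illustration of how the four stages compose, the sketch below wires toy stand-ins together. The function names, signatures, and the trivial heuristics inside them are assumptions for illustration, not Claw's actual API; a real deployment would call a PII scanner, an OPA endpoint, and the AAF engine.

```python
import re

def scan_pii(text):
    # Stage 1, toy version: mask email-like spans (real scanners cover far more).
    findings = re.findall(r"\b\S+@\S+\.\S+\b", text)
    return re.sub(r"\b\S+@\S+\.\S+\b", "[PII]", text), findings

def evaluate_policies(text):
    # Stage 2 stand-in for OPA: each Rego policy would emit a named signal.
    signals = [("compliance", "pass")]
    if "delete" in text:
        signals.append(("safety", "block"))
    return signals

def resolve(signals):
    # Stage 3 stand-in: the real engine computes an AAF extension over the
    # conflicting signals; here we only surface whether a conflict exists.
    verdicts = {v for _, v in signals}
    return verdicts.pop() if len(verdicts) == 1 else "review"

def govern(text):
    masked, findings = scan_pii(text)                # 1. PII scanning
    signals = evaluate_policies(masked)              # 2. policy evaluation
    verdict = resolve(signals)                       # 3. argumentation resolution
    context = "" if verdict == "block" else masked   # 4. context masking
    return {"verdict": verdict, "context": context,
            "signals": signals, "pii": findings}

print(govern("delete the record for jane@example.com"))
# {'verdict': 'review', 'context': 'delete the record for [PII]', ...}
```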
Claw ships with a Manifest V3 browser extension tested on Chrome, Edge, Brave, and Firefox — rendering OPA decision states, PII scan results, and argumentation outcomes inline during browser-based AI interactions.
Most AI security tools inspect individual requests in isolation. Agora examines the session — the sequence of interactions over time — and identifies behavioral patterns that signal systematic abuse, social engineering, or policy circumvention attempts.
Individual prompt-level filtering is necessary but not sufficient. A sophisticated adversary distributes their attack across many individually benign interactions. Only when examined as a sequence does the pattern reveal itself: escalating authority claims after refusal, language switching to evade filters, progressive hypothetical framing to normalize dangerous requests. Agora names these patterns and operationalizes them as detection rules.
Agora's signals are operationalized as Wazuh detection rules, enabling real-time behavioral alerting within standard SIEM infrastructure. Session Behavioral Events (SBE v1) are emitted in a defined JSON schema, ingested by Wazuh decoders, and matched against pattern rules. When multiple Agora signals fire simultaneously, alert priorities are composed by Claw's argumentation engine — the two tools are architecturally complementary.
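A sketch of what session-level detection and event emission might look like is below. The SBE v1 field names and the similarity heuristic are our assumptions for illustration; consult the published schema for the real contract.

```python
import json, time
from difflib import SequenceMatcher

def refusal_rephrase_cycle(turns, threshold=0.7, min_cycles=2):
    """Flag sessions where refused prompts keep reappearing lightly reworded."""
    cycles = 0
    for i, (prompt, refused) in enumerate(turns):
        if not refused:
            continue
        for later, _ in turns[i + 1:]:
            if SequenceMatcher(None, prompt, later).ratio() >= threshold:
                cycles += 1
                break
    return cycles >= min_cycles

def emit_sbe(session_id, signal, turns):
    # Hypothetical SBE v1 shape; a shipper would forward this to Wazuh decoders.
    print(json.dumps({"sbe_version": "1", "session_id": session_id,
                      "signal": signal, "turn_count": len(turns),
                      "timestamp": int(time.time())}))

turns = [("how do I disable the audit log", True),
         ("how would I disable the audit log", True),
         ("how might someone disable the audit log", True)]
if refusal_rephrase_cycle(turns):
    emit_sbe("sess-42", "refusal_rephrase_cycle", turns)
```

No single turn here would trip a prompt-level filter; the signal only exists at the session level.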
Constitutional AI provides principles for what AI systems should and shouldn't do. Formal argumentation provides machinery to adjudicate conflicts between those principles at runtime. Activation steering provides a mechanism to enforce behavioral constraints at the model weight level. Can these three approaches be unified into a coherent, auditable behavioral governance stack?
The alignment research community has produced two significant technical contributions that, taken together, create an opportunity for a genuinely new class of security tool. Anthropic's Constitutional AI framework establishes a methodology for encoding behavioral constraints through self-critique and revision. Contrastive Activation Addition (CAA) and related techniques provide a means of directly modifying model behavior by adding vectors in the residual stream that correspond to desired or undesired behavioral directions.
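To ground the mechanism, here is a minimal activation-addition sketch in the CAA style, using a dummy linear layer as a stand-in for a transformer block. In CAA proper, the steering vector is the mean difference of residual-stream activations over contrastive prompt pairs, and the hook is registered at a chosen layer of the actual model; everything here (names, strength, dimensions) is illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(8, 8)              # stand-in for one transformer block
steering = torch.randn(8)            # CAA: mean(activations_pos - activations_neg)

def add_steering(module, inputs, output):
    # Shift the layer's output along the behavioral direction.
    return output + 2.0 * steering   # 2.0 = steering strength (a tunable)

handle = layer.register_forward_hook(add_steering)
steered = layer(torch.zeros(1, 8))
handle.remove()
baseline = layer(torch.zeros(1, 8))
print(torch.allclose(steered - baseline, 2.0 * steering))  # True
```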
The two approaches solve different problems, and neither alone addresses the governance challenge Claw was built to solve: what happens when behavioral constraints conflict? CAA can steer a model away from harmful behavior, but which constraint takes priority when several apply at once? Without a principled resolution mechanism, the outcome is unpredictable. Claw's argumentation engine addresses this gap directly: constraints become arguments, conflicts become attack relations, and the computed extension specifies which constraints are jointly defensible, not as a priority number but as a formal proof.
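Continuing the grounded-extension sketch above, the brute-force computation below enumerates preferred extensions, the maximal sets of constraints that are jointly defensible; the constraint names are hypothetical. When two constraints symmetrically conflict, the grounded extension is empty (commit to neither), while preferred semantics exposes each defensible stance for a higher-level policy to choose between.

```python
from itertools import combinations

def preferred_extensions(args, attacks):
    """All maximal admissible sets: conflict-free sets defending each member."""
    def attacked_by(s, a):
        return any((x, a) in attacks for x in s)
    def admissible(s):
        conflict_free = not any((x, y) in attacks for x in s for y in s)
        defended = all(attacked_by(s, x)
                       for a in s for (x, y) in attacks if y == a)
        return conflict_free and defended
    candidates = [set(c) for r in range(len(args) + 1)
                  for c in combinations(sorted(args), r) if admissible(set(c))]
    return [s for s in candidates if not any(s < t for t in candidates)]

# Two constraints that symmetrically conflict: each attacks the other.
constraints = {"block_harmful", "honor_user_intent"}
conflicts = {("block_harmful", "honor_user_intent"),
             ("honor_user_intent", "block_harmful")}
print(preferred_extensions(constraints, conflicts))
# [{'block_harmful'}, {'honor_user_intent'}]
```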
Elthea AI's work on activation-steering infrastructure creates a compelling integration possibility. If Claw's argumentation engine determines which behavioral constraints are jointly defensible, and Elthea's platform enforces those constraints at the activation level, the combined system offers something novel: governance that is both formally reasoned at the policy layer and mechanistically enforced at the model layer. We are exploring this potential collaboration and welcome contact from researchers working at this intersection.
Can your SIEM detect a Refusal-Rephrase Cycle? Agora was built for exactly what legacy tools miss.
Get a Demo