35 - CrewAI ROA Wrapper
Goal: Demonstrate that Task-Oriented Agents (CrewAI) and Mission-Oriented Agents (ROA) can coexist. Wrap a real CrewAI Crew in an ROA interface so that it produces structured JSON output (output_json) that is converted into a DIR PolicyProposal (a Claim) instead of being executed directly (a Fact). Prove the pattern with a Customer Claims Agent use case where the DIR Kernel accepts, escalates, or rejects refund proposals based on return window, amount limit, and category boundaries.
Topology: classic (wrapper). Mechanisms: AgentRegistry.handshake, ContextStore sessions, validate_proposal from dir_core plus claims-specific rules in dim.py, idempotency_key on ACCEPT execution, StorageBundle.decision_audit via telemetry.py, scenario batch from scenarios.yaml (schemas.load_scenarios).
DIR alignment: ROA Manifesto §4–5 (Explain → Policy → Self-Check → Proposal; User Space vs. Kernel Space), §10 (Boxed Intelligence), DIR Architectural Pattern §6 (Decision Integrity Module).
Use cases
flowchart TB
subgraph Actors[" "]
C[Customer / intake channel]
O[Operator / batch runner]
end
subgraph System["DIR sample 35"]
R[run.py scenario loop]
A[agent: CrewAI or mock ROA]
D[DIM + StorageBundle telemetry]
end
C -->|claim text or structured claim| R
O -->|runs python samples/35.../run.py| R
R --> A
A -->|PolicyProposal| D
The Core Concept: Taming the Task-Oriented Crew
CrewAI Crews are collaborative, task-driven agents (e.g., Analyst + Decision Maker). They receive inputs, reason, call tools, and execute. They have no mission, no boundaries, no persistent responsibility. They are optimized for "What can the crew do next?", not for "What is this crew responsible for?" (ROA Manifesto §3).
This creates a fundamental mismatch. In production, Crews:
- Execute side effects directly (API calls, database writes)
- Lack authority boundaries (they may act outside their intended scope)
- Provide no deterministic safety guarantees
The solution: Wrap the Crew in an ROA shell. The Crew retains its reasoning power but is forced to output via structured JSON (output_json / RefundProposalOutput). That output does not execute. It passes intent over "The Wall" to the DIR Kernel Space.
The result: a mission-oriented Crew whose outputs are Claims, not Facts. A Claim becomes a Fact only after the Decision Integrity Module validates it and the Execution Engine runs it (DIR §6-7).
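To make the boundary concrete, here is a minimal runnable sketch of the Claim → Wall → Fact flow. The function names and rule values are illustrative stand-ins, not the sample's actual API; in the sample the pieces are CrewAIROAWrapper, dim.py and dir_core.

# Sketch only - illustrative names and rules, not the sample's real signatures.
def propose(claim: dict) -> dict:
    # User Space stand-in: in the sample, the CrewAI Crew emits this JSON.
    return {"action": "REFUND", "order_id": claim["order_id"],
            "amount_eur": claim["amount_eur"], "category": claim["category"]}

def validate(proposal: dict, orders: dict, max_refund: float = 500.0) -> str:
    # Kernel Space stand-in: deterministic checks, no LLM involved.
    if proposal["order_id"] not in orders:
        return "REJECT"
    if proposal["amount_eur"] > max_refund:
        return "ESCALATE"
    return "ACCEPT"

orders = {"ord_001": {"category": "electronics"}}
claim = {"order_id": "ord_001", "amount_eur": 299.99, "category": "electronics"}
proposal = propose(claim)             # a Claim, not a Fact
verdict = validate(proposal, orders)  # only the kernel can turn it into a Fact
print(verdict)                        # -> ACCEPT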
Architecture
Diagram 1 - System Overview: CrewAI wrapped by ROA, processed by DIR
---
config:
layout: elk
---
flowchart TB
subgraph CFG["config.yaml"]
LLMCFG["`llm_defaults<br/>gemma3:4b @ localhost:11434`"]
CONTRACT["`agent.contract - ClaimsContract<br/>allowed_categories - max_refund - return_window`"]
CTXSTORE["`context_store - Orders<br/>purchase_date - category - amount`"]
end
subgraph US["USER SPACE - Probabilistic - Ollama / Gemma3"]
subgraph ROA["ROA Wrapper - CrewAIROAWrapper"]
subgraph CREW["CrewAI Crew - sequential"]
ANA["`Claims Analyst<br/>LLM text reasoning`"]
DM["`Decision Maker<br/>output_json = RefundProposalOutput`"]
ANA -->|eligibility summary| DM
end
end
DM -->|JSON Claim| WALL
end
WALL{{"`THE WALL<br/>Claim to PolicyProposal`"}}
subgraph KS["KERNEL SPACE - Deterministic - DIR"]
DIM["`dir_core.validate_proposal + dim.py<br/>L1: Schema + RBAC + contract<br/>L2: Order existence<br/>L3: Category boundary<br/>L4: Return window<br/>L5: Amount limit`"]
ACCEPT["ACCEPT"]
ESC["`ESCALATE<br/>human review`"]
REJ["REJECT"]
DIM --> ACCEPT & ESC & REJ
end
WALL --> DIM
CTXSTORE -.->|order data| DIM
CONTRACT -.->|boundaries| DIM
LLMCFG -.->|model / endpoint| CREW
style US fill:#fffde7,stroke:#f9a825,color:#333
style KS fill:#e8f5e9,stroke:#388e3c,color:#333
style ROA fill:#fff9c4,stroke:#f57f17,color:#333
style CREW fill:#fff3e0,stroke:#e65100,color:#333
style WALL fill:#37474f,color:#fff
style ACCEPT fill:#c8e6c9,stroke:#2e7d32,color:#1b5e20
style ESC fill:#fff9c4,stroke:#f57f17,color:#e65100
style REJ fill:#ffcdd2,stroke:#c62828,color:#b71c1c
Execution flow
Diagram 2 - End-to-End Sequence for a Single Claim
sequenceDiagram
actor Caller as run.py
participant CFG as config.yaml
participant Agent as agent.run_claims_roa_cycle
participant Analyst as Claims Analyst (CrewAI / Gemma3)
participant DM as Decision Maker (CrewAI / Gemma3)
participant DIM as DIM Validator (Kernel Space)
participant CS as Context Store (config.yaml)
Caller ->> CFG: load_yaml_config + load_scenarios()
CFG -->> Caller: agents, context_store, scenarios.yaml rows
loop for each scenario (6 rows in scenarios.yaml)
Note over Caller, Agent: E,F: NL intake via extract_claim (Crew or mock regex)
Caller ->> Agent: run_claims_roa_cycle(dfid, claim, ...)
rect rgb(255, 253, 231)
Note over Agent, DM: USER SPACE - probabilistic
Agent ->> Analyst: Task: analyze claim eligibility
Analyst ->> Analyst: Gemma3 reasoning
Analyst -->> DM: eligibility summary (text)
DM ->> DM: Gemma3 reasoning (output_json)
DM -->> Agent: RefundProposalOutput JSON
end
Note over Agent, DIM: THE WALL - Claim to PolicyProposal
rect rgb(232, 245, 233)
Note over DIM, CS: KERNEL SPACE - deterministic
Caller ->> DIM: validate_claims_proposal(proposal, dim_ctx, contract, dim_contract)
DIM ->> CS: lookup order (purchase_date, category)
CS -->> DIM: order record
alt L1-L4 pass AND amount <= 500 EUR
DIM -->> Caller: ACCEPT
else L1-L4 pass AND amount > 500 EUR
DIM -->> Caller: ESCALATE
else L2 fail - order not found
DIM -->> Caller: REJECT (order unknown)
else L3 fail - prohibited category
DIM -->> Caller: REJECT (category boundary)
else L4 fail - outside return window
DIM -->> Caller: REJECT (return window)
end
end
end
Diagram 3 - Test Scenarios: 6 claims through the DIM validation pipeline
---
config:
layout: elk
---
flowchart TD
subgraph SCENARIOS["scenarios.yaml"]
SA["`A - claim dict<br/>ord_001 - 299.99 EUR`"]
SB["`B - claim dict<br/>ord_002 - 1200 EUR`"]
SC["`C - claim dict<br/>ord_005 - 500 EUR`"]
SD["`D - claim dict<br/>ord_004 - 50 EUR`"]
SE["`E - claim_text NL<br/>ord_001 - 299.99 EUR`"]
SF["`F - claim_text NL<br/>ord_002 - 1200 EUR`"]
end
subgraph EXTRACT["NL intake (E, F only)"]
EX["`extract_claim_from_text()<br/>LLM → structured claim`"]
end
subgraph CREW_US["CrewAI Crew - User Space - Gemma3"]
PROP["`Analyst to Decision Maker<br/>output: REFUND proposal JSON`"]
end
SA & SB & SC & SD --> PROP
SE & SF --> EX
EX --> PROP
WALL{{"THE WALL"}}
PROP --> WALL
subgraph DIM_KS["DIM - Kernel Space - validate_claims_proposal + dir_core"]
L1["L1 Schema + RBAC - pass"]
L2["L2 Order exists - pass"]
L3{"`L3 Category<br/>in allowed list?`"}
L4{"`L4 Purchase date<br/>within 14 days?`"}
L5{"`L5 Amount<br/>max 500 EUR?`"}
end
WALL --> L1 --> L2 --> L3
L3 -->|electronics / clothing / home| L4
L3 -->|prohibited_category - D| RD
L4 -->|within window - A, B, E, F| L5
L4 -->|2026-01-01 expired - C| RC
L5 -->|299.99 - A, E| RA
L5 -->|1200.00 - B, F| RB
RA["`**ACCEPT**<br/>A, E`"]
RB["`**ESCALATE**<br/>B, F - human review`"]
RC["`**REJECT**<br/>C - return window expired`"]
RD["`**REJECT**<br/>D - category not allowed`"]
style SCENARIOS fill:#e3f2fd,stroke:#1565c0,color:#0d47a1
style EXTRACT fill:#e1f5fe,stroke:#0288d1,color:#01579b
style CREW_US fill:#fffde7,stroke:#f9a825,color:#333
style DIM_KS fill:#e8f5e9,stroke:#388e3c,color:#333
style WALL fill:#37474f,color:#fff
style RA fill:#c8e6c9,stroke:#2e7d32,color:#1b5e20
style RB fill:#fff9c4,stroke:#f57f17,color:#e65100
style RC fill:#ffcdd2,stroke:#c62828,color:#b71c1c
style RD fill:#ffcdd2,stroke:#c62828,color:#b71c1c
Key Differences from a Naked Crew
| | Naked CrewAI Crew | ROA-Wrapped Crew |
|---|---|---|
| Actions | Any tool, any API, any DB | Only structured JSON (output_json) |
| Enforcement | None (trust the LLM) | Deterministic DIM in Kernel Space |
| Output | Side effects (Facts) | Proposals (Claims) |
| Authority | Unbounded | ClaimsContract boundaries |
The Claims Agent Scenario
Analyst vs DIM - Who Validates What?
The Claims Analyst (LLM) analyzes only the claim data provided in the scenario; it has no access to the Context Store. It can reason about boundaries (categories, limits) from the prompt text, but it may err (e.g., a wrong return-window assessment). The DIM (Kernel Space) is the source of truth: it reads purchase_date and category from the Context Store and enforces the rules deterministically. If the Analyst says "OK" but the data contradicts it, the DIM rejects. This separation is intentional: User Space proposes, Kernel Space decides.
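A minimal sketch of what "deterministic" means here: the real rules live in dim.validate_claims_proposal and dir_core.validate_proposal, and the names and return shape below are illustrative only.

from datetime import date, timedelta

# Illustrative kernel-side check - the sample's real implementation is dim.validate_claims_proposal.
def check_claim(proposal: dict, orders: dict, allowed: set,
                window_days: int = 14, max_refund: float = 500.0) -> tuple[str, str]:
    order = orders.get(proposal["order_id"])
    if order is None:
        return "REJECT", "order unknown"
    if order["category"] not in allowed:
        return "REJECT", "category boundary"
    purchased = date.fromisoformat(order["purchase_date"])
    if date.today() - purchased > timedelta(days=window_days):
        return "REJECT", "return window"
    if proposal["amount_eur"] > max_refund:
        return "ESCALATE", "human approval required"
    return "ACCEPT", "validation passed"

The Analyst never receives the orders dict; only the kernel reads order data from the Context Store.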
Mission
Process customer claims fairly, within policy boundaries, without exceeding authority or escalating unnecessarily.
Responsibility Contract (ClaimsContract)
| Field | Description |
|---|---|
| `allowed_refund_categories` | Product categories the agent may propose refunds for (e.g., electronics, clothing) |
| `max_refund_without_escalation` | EUR threshold; amounts above it require human approval (ESCALATE) |
| `return_window_days` | Maximum days from purchase for automatic eligibility |
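A sketch of how these fields might look once loaded; the real contract is built in contracts.py from agents[].claims_bounds and the contract's authorized_instruments, so the class and helper below are assumptions for illustration only.

from dataclasses import dataclass

# Illustrative only - the sample builds ClaimsContract in contracts.py from config.yaml.
@dataclass(frozen=True)
class ClaimsContractSketch:
    allowed_refund_categories: tuple = ("electronics", "clothing", "home")
    max_refund_without_escalation: float = 500.0   # EUR
    return_window_days: int = 14

    def within_authority(self, category: str, amount_eur: float) -> bool:
        return (category in self.allowed_refund_categories
                and amount_eur <= self.max_refund_without_escalation)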
Natural Language Intake (Scenarios E, F)
In production, customers write free-form text in English (e.g. "I bought ord_001 for 299 EUR, defective product"), not JSON. The LLM extracts structured claim data (order_id, amount_eur, category, reason) in a single call before the Crew processes it. This justifies the LLM: deterministic rules alone cannot parse natural language.
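As a sketch of that single-call shape: the sample's extract_claim_from_text returns a ClaimExtractionOutput, while the model and helper below are simplified stand-ins that only validate an already-received LLM reply.

import json
from pydantic import BaseModel

class ClaimSketch(BaseModel):
    # Mirrors the fields the sample extracts (ClaimExtractionOutput).
    order_id: str
    amount_eur: float
    category: str
    reason: str

EXTRACTION_PROMPT = (
    "Extract order_id, amount_eur, category and reason from the customer text "
    "and answer with a single JSON object, nothing else.\n\nText: {text}"
)

def parse_extraction_reply(raw_reply: str) -> ClaimSketch:
    # One LLM call upstream produces raw_reply; here we only validate its JSON.
    return ClaimSketch.model_validate(json.loads(raw_reply))

claim = parse_extraction_reply(
    '{"order_id": "ord_001", "amount_eur": 299.99, '
    '"category": "electronics", "reason": "defective product"}'
)
print(claim.order_id, claim.amount_eur)   # -> ord_001 299.99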
Scenarios
| Scenario | Input | DIM Verdict | Reason |
|---|---|---|---|
| A | Valid claim: within return window, amount ≤ 500 EUR | ACCEPT | All criteria met |
| B | Valid claim: amount > 500 EUR | ESCALATE | Human approval required |
| C | Claim outside return window (purchased 2026-01-01) | REJECT | Outside return_window_days |
| D | Claim for prohibited category | REJECT | Category not in allowed_refund_categories |
| E | NL text: valid claim (ord_001, 299.99 EUR) | ACCEPT | LLM extracts → DIM validates |
| F | NL text: amount > 500 EUR (ord_002, 1200 EUR) | ESCALATE | LLM extracts → DIM escalates |
Layout (Sample Guide §3)
| File | Role |
|---|---|
| `run.py` | Bootstrap, handshake, ContextStore, scenario loop, DIM, telemetry, idempotent execution |
| `config.yaml` | database, llm_defaults, simulation, agents[], context_store |
| `scenarios.yaml` | Scenario rows with context.claim or context.claim_text and expected |
| `schemas.py` | load_scenarios, parse_llm_json, CrewConfig, contract payload helper |
| `contracts.py` | ClaimsContract built from YAML + ResponsibilityContract |
| `agent.py` | CrewAI Explain→Policy path, mock deterministic path, Self-Check, PolicyProposal |
| `dim.py` | validate_claims_proposal (wraps dir_core.validate_proposal + claims rules) |
| `telemetry.py` | SIMULATION_*, AGENT_DECISION, CLAIM_REFUND_EXECUTED, self-check failures |
| `mocks/llm_mock_strategy.py` | make_mock_strategy for setup_environment when mock is selected |
Configuration
Runtime settings are split between config.yaml (persistence, LLM defaults, agents, authoritative context_store) and scenarios.yaml (batch inputs and expected DIM verdicts). context_store.orders[*].purchase_date must fall within claims_bounds.return_window_days of the wall-clock date you run against, or ACCEPT scenarios will see REJECT from the return-window rule.
database:
provider: sqlite
db_path: "data/crewai_roa.db"
llm_defaults:
model: "gemma3:4b"
base_url: "http://localhost:11434"
temperature: 0.2
simulation:
run_id: "crewai_claims_batch_001"
agents:
- agent_id: "claims_agent_v1"
contract:
role: EXECUTOR
authorized_instruments: [electronics, clothing, home]
allowed_policy_types: [REFUND, REPLACE, ESCALATE]
escalate_on_uncertainty: 0.7
max_drawdown_limit: 0.05
wake_up_threshold_pct: 0.5
parent_agent_id: null
claims_bounds:
max_refund_without_escalation: 500.0
return_window_days: 14
context_store:
orders:
ord_001: { purchase_date: "...", category: electronics, amount: 299.99 }
| Section | Purpose |
|---|---|
| `database` | SQLite path anchored next to config.yaml via setup_environment |
| `llm_defaults` | Default Ollama endpoint; overridden by OLLAMA_* env vars when set |
| `simulation.run_id` | simulation_id in every telemetry details payload |
| `agents[].contract` | Canonical ResponsibilityContract fields for YamlContractProvider |
| `agents[].claims_bounds` | Claims-only limits read into ClaimsContract |
| `context_store` | Authoritative orders for DIM (not injected into the Crew prompt) |
How to run
Mock (no network, no Ollama, no CrewAI LLM calls)
Deterministic claim→proposal path and regex NL extraction for scenarios E–F.
pip install -e .
$env:PYTHONPATH="src;samples"; $env:USE_MOCK_LLM="1" # PowerShell
python samples/35_crewai_roa_wrapper/run.py
Ollama + CrewAI (full User Space)
pip install -e ".[crewai]"
ollama serve
ollama pull gemma3:4b
$env:PYTHONPATH="src;samples"
python samples/35_crewai_roa_wrapper/run.py
Unset USE_MOCK_LLM (in PowerShell: Remove-Item Env:USE_MOCK_LLM) or set it to 0 so configured_live_llm_is_reachable can succeed; otherwise the sample stays on the mock path.
Gemini
This sample wires CrewAI to the OpenAI-compatible Ollama endpoint. Gemini is not configured here; use mock or Ollama as above.
Env var overrides (Ollama):
$env:OLLAMA_BASE_URL = "http://localhost:11434"
$env:OLLAMA_MODEL = "gemma3:4b"
Database storage
Events are written only through bundle.decision_audit (see telemetry.py). Typical event values for this sample: SIMULATION_START, SIMULATION_END, AGENT_DECISION, CLAIM_REFUND_EXECUTED, CLAIMS_SELF_CHECK_FAILED.
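For orientation, this is roughly what one details payload looks like from the query side; only simulation_id is guaranteed by the sample, the other fields are illustrative.

import json

# Shape of a details payload as queried below - simulation_id travels inside the JSON,
# not as a dedicated column (field names other than simulation_id are illustrative).
details = {
    "simulation_id": "crewai_claims_batch_001",
    "scenario": "A",
    "verdict": "ACCEPT",
}
detail_json = json.dumps(details)   # what ends up in decision_audit_events.detail_json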
Group a run by simulation_id stored inside JSON details (Sample Guide §9.4):
-- SQLite
SELECT dfid, event, detail_json
FROM decision_audit_events
WHERE json_extract(detail_json, '$.simulation_id') = 'crewai_claims_batch_001'
ORDER BY id;
-- PostgreSQL
SELECT dfid, event, detail_json
FROM decision_audit_events
WHERE detail_json->>'simulation_id' = 'crewai_claims_batch_001'
ORDER BY id;
Why provider="openai"?
CrewAI's LLM class routes provider="openai" to the native OpenAI Python SDK.
Ollama exposes an OpenAI-compatible API at /v1, so no LiteLLM or extra dependencies needed:
LLM(model="gemma3:4b", provider="openai",
base_url="http://localhost:11434/v1", api_key="ollama")
How the Interception Works (output_json instead of tool-calling)
Gemma3 (and most local Ollama models) do not support OpenAI-style function calling.
The original Submit_Policy_Proposal tool approach requires the model to call functions - Ollama returns HTTP 400 for such requests.
Solution: output_json on the Decision Maker's Task.
CrewAI's output_json parameter instructs the LLM to format its entire response as JSON matching a Pydantic schema, then validates and parses it automatically:
from pydantic import BaseModel
from crewai import Task

class RefundProposalOutput(BaseModel):
    action: str        # always "REFUND"
    order_id: str
    amount_eur: float
    category: str
    reason: str

decide_task = Task(
    description="Based on the analyst's findings, produce a refund proposal...",
    output_json=RefundProposalOutput,  # ← no tool-calling needed
    agent=decision_maker,              # ← no tools=[] on the agent
)

result = crew.kickoff()
data = result.json_dict                # parsed + validated dict
proposal = PolicyProposal(..., params=data)
Architecturally equivalent: the LLM still produces a Claim (JSON proposal), which crosses "The Wall" to the DIM for deterministic validation before any Fact (execution) occurs. The boundary holds.
Key Components
| Component | Purpose |
|---|---|
| `ClaimsContract` | Defines authority boundaries (categories, amount limit, return window) |
| `extract_claim_from_text()` | NL intake: extracts a structured claim from customer text (single LLM call) |
| `ClaimExtractionOutput` | Pydantic schema for NL extraction output |
| `RefundProposalOutput` | Pydantic schema for output_json - structured proposal from the Decision Maker |
| `_extract_proposal_from_text()` | Fallback: parses JSON from raw LLM output when output_json parsing fails |
| `agent.CrewAIROAWrapper` | Builds the Crew per call, runs kickoff(), returns a policy dict for Self-Check |
| `agent.run_claims_roa_cycle` | Mock or Crew path, Self-Check, emits PolicyProposal |
| `dim.validate_claims_proposal` | dir_core.validate_proposal, then order, category, window and amount checks |
| `report_generator.py` | Dark-theme HTML audit report (Sample Guide §17) from decision_audit only |
| Context Store | Authoritative order data (source of truth for DIM, not for the agent) |
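The idea behind the _extract_proposal_from_text() fallback can be sketched as scanning the raw reply for the first balanced JSON object; the sample's actual implementation may differ, and this version deliberately ignores braces inside strings.

import json
from typing import Optional

def extract_first_json_object(raw: str) -> Optional[dict]:
    # Scan for the first balanced {...} block and try to parse it.
    start = raw.find("{")
    while start != -1:
        depth = 0
        for end in range(start, len(raw)):
            if raw[end] == "{":
                depth += 1
            elif raw[end] == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(raw[start:end + 1])
                    except json.JSONDecodeError:
                        break   # malformed candidate - keep scanning
        start = raw.find("{", start + 1)
    return None

print(extract_first_json_object('Sure! {"action": "REFUND", "amount_eur": 299.99} Done.'))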
HTML report
After a successful batch run, run.py writes
results/report_<UTC>_<N>scenarios.html and opens it in the default browser. The report
is self-contained (embedded CSS), uses only bundle.decision_audit.all_events_chronological()
plus registry and context snapshots, and follows §17 section order (summary, authored prose,
empty-state Section 3 charts, trace table, ROA blocks, authoritative orders table, kernel artefacts).
Regenerate without re-running the simulation:
$env:PYTHONPATH="src;samples"
python samples/35_crewai_roa_wrapper/report_generator.py
python samples/35_crewai_roa_wrapper/report_generator.py --simulation-id crewai_claims_batch_001
python samples/35_crewai_roa_wrapper/report_generator.py --output-path samples/35_crewai_roa_wrapper/results/custom.html
Expected Output
======================================================================
35_crewai_roa_wrapper - CrewAI + Ollama + DIR Kernel
======================================================================
Config : config.yaml
LLM : gemma3:4b @ http://localhost:11434
Agent : claims_agent_v1
Crew : Claims Analyst → Decision Maker (sequential, output_json)
DIM : 5-layer validation (RBAC, order, window, category, amount)
Scenarios: 6
[SCENARIO A - Valid: within window & limit]
----------------------------------------------------------------------
Claim: order=ord_001 amount=299.99 EUR cat=electronics
Crew: thinking... done.
Proposal: REFUND ord_001 299.99 EUR
DIM Verdict: ACCEPT
Reason: Validation passed
[SCENARIO B - Amount > 500 EUR (human approval required)]
----------------------------------------------------------------------
Claim: order=ord_002 amount=1200.0 EUR cat=electronics
Crew: thinking... done.
Proposal: REFUND ord_002 1200.0 EUR
DIM Verdict: ESCALATE
Reason: Amount 1200.0 EUR exceeds max_refund_without_escalation (500.0 EUR)...
[SCENARIO C - Outside return window (purchased 2026-01-01)]
----------------------------------------------------------------------
Claim: order=ord_005 amount=500.0 EUR cat=electronics
Crew: thinking... done.
Proposal: REFUND ord_005 500.0 EUR
DIM Verdict: REJECT
Reason: Order ord_005 outside return window...
[SCENARIO D - Prohibited category]
----------------------------------------------------------------------
Claim: order=ord_004 amount=50.0 EUR cat=prohibited_category
Crew: thinking... done.
Proposal: REFUND ord_004 50.0 EUR
DIM Verdict: REJECT
Reason: Category 'prohibited_category' not in allowed_refund_categories...
[SCENARIO E - NL intake: valid claim (ord_001)]
----------------------------------------------------------------------
Input (NL): I bought electronics order ord_001 for 299.99 EUR on 20 February 2026...
Extracted: order=ord_001 amount=299.99 EUR cat=electronics
Crew: thinking... done.
Proposal: REFUND ord_001 299.99 EUR
DIM Verdict: ACCEPT
Reason: Validation passed
[SCENARIO F - NL intake: amount > 500 EUR (ord_002)]
----------------------------------------------------------------------
Input (NL): Order ord_002 - laptop for 1200 EUR, 25 February 2026...
Extracted: order=ord_002 amount=1200.0 EUR cat=electronics
Crew: thinking... done.
Proposal: REFUND ord_002 1200.0 EUR
DIM Verdict: ESCALATE
Reason: Amount 1200.0 EUR exceeds max_refund_without_escalation (500.0 EUR)...
======================================================================
[SUMMARY]
======================================================================
✓ ACCEPT SCENARIO A - Valid: within window & limit
✓ ESCALATE SCENARIO B - Amount > 500 EUR (human approval required)
✓ REJECT SCENARIO C - Outside return window (purchased 2026-01-01)
✓ REJECT SCENARIO D - Prohibited category
✓ ACCEPT SCENARIO E - NL intake: valid claim (ord_001)
✓ ESCALATE SCENARIO F - NL intake: amount > 500 EUR (ord_002)
References
- ROA Manifesto §3 (Responsibility Contract), §4-5 (Explain → Policy → Proposal), §10 (Boxed Intelligence)
- DIR Architectural Pattern §6 (Decision Integrity Module), §5 (Policies as Contracts)
- [Sample 34 - LangChain ROA Wrapper](https://github.com/huka81/decision-intelligence-runtime/blob/main/samples/34_langchain_roa_wrapper/README.md) (same pattern, different framework)