35 - CrewAI ROA Wrapper

Goal: Demonstrate that Task-Oriented Agents (CrewAI) and Mission-Oriented Agents (ROA) can coexist. Wrap a real CrewAI Crew in an ROA interface, have it produce structured JSON output (output_json), and convert that output into a DIR PolicyProposal (a Claim) instead of executing directly (a Fact). Prove the pattern with a Customer Claims Agent use case in which the DIR Kernel rejects or escalates refund proposals based on return window, amount limits, and category boundaries.

Topology: classic (wrapper). Mechanisms: AgentRegistry.handshake, ContextStore sessions, validate_proposal from dir_core plus claims-specific rules in dim.py, idempotency_key on ACCEPT execution, StorageBundle.decision_audit via telemetry.py, scenario batch from scenarios.yaml (schemas.load_scenarios).

DIR alignment: ROA Manifesto §4–5 (Explain → Policy → Self-Check → Proposal; User Space vs. Kernel Space), §10 (Boxed Intelligence), DIR Architectural Pattern §6 (Decision Integrity Module).

Use cases

flowchart TB
    subgraph Actors[" "]
        C[Customer / intake channel]
        O[Operator / batch runner]
    end
    subgraph System["DIR sample 35"]
        R[run.py scenario loop]
        A[agent: CrewAI or mock ROA]
        D[DIM + StorageBundle telemetry]
    end
    C -->|claim text or structured claim| R
    O -->|runs python samples/35.../run.py| R
    R --> A
    A -->|PolicyProposal| D

The Core Concept: Taming the Task-Oriented Crew

CrewAI Crews are collaborative, task-driven agents (e.g., Analyst + Decision Maker). They receive inputs, reason, call tools, and execute. They have no mission, no boundaries, and no persistent responsibility. They are optimized for "What can the crew do next?", not for "What is this crew responsible for?" (ROA Manifesto §3).

This creates a fundamental mismatch. In production, Crews:

  • Execute side effects directly (API calls, database writes)
  • Lack authority boundaries (they may act outside their intended scope)
  • Provide no deterministic safety guarantees

The solution: Wrap the Crew in an ROA shell. The Crew retains its reasoning power but is forced to output via structured JSON (output_json / RefundProposalOutput). That output does not execute. It passes intent over "The Wall" to the DIR Kernel Space.

The result: a mission-oriented Crew whose outputs are Claims, not Facts. A Claim becomes a Fact only after the Decision Integrity Module validates it and the Execution Engine runs it (DIR §6-7).
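The Claim-to-Fact gate can be sketched as follows. This is a hypothetical simplification, not the sample's real API (the actual wiring lives in run.py and dir_core); it only illustrates that execution happens strictly after deterministic validation, keyed for idempotency:

```python
# Sketch of the Claim -> Fact gate (hypothetical names): a Claim executes
# only after the deterministic kernel accepts it, and the execution is
# keyed so a retried batch never refunds the same order twice.
import hashlib

def process_claim(proposal: dict, validate, execute) -> str:
    verdict, reason = validate(proposal)          # Kernel Space: deterministic
    if verdict == "ACCEPT":
        # Same proposal -> same key -> idempotent execution.
        key = hashlib.sha256(
            f"{proposal['order_id']}:{proposal['amount_eur']}".encode()
        ).hexdigest()
        execute(proposal, idempotency_key=key)    # Claim becomes Fact
    return verdict

verdict = process_claim(
    {"order_id": "ord_001", "amount_eur": 299.99},
    validate=lambda p: ("ACCEPT", "ok"),
    execute=lambda p, idempotency_key: None,
)
print(verdict)  # ACCEPT
```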


Architecture

Diagram 1 - System Overview: CrewAI wrapped by ROA, processed by DIR

---
config:
  layout: elk
---
flowchart TB
    subgraph CFG["config.yaml"]
        LLMCFG["`llm_defaults<br/>gemma3:4b @ localhost:11434`"]
        CONTRACT["`agent.contract - ClaimsContract<br/>allowed_categories - max_refund - return_window`"]
        CTXSTORE["`context_store - Orders<br/>purchase_date - category - amount`"]
    end

    subgraph US["USER SPACE - Probabilistic - Ollama / Gemma3"]
        subgraph ROA["ROA Wrapper - CrewAIROAWrapper"]
            subgraph CREW["CrewAI Crew - sequential"]
                ANA["`Claims Analyst<br/>LLM text reasoning`"]
                DM["`Decision Maker<br/>output_json = RefundProposalOutput`"]
                ANA -->|eligibility summary| DM
            end
        end
        DM -->|JSON Claim| WALL
    end

    WALL{{"`THE WALL<br/>Claim to PolicyProposal`"}}

    subgraph KS["KERNEL SPACE - Deterministic - DIR"]
        DIM["`dir_core.validate_proposal + dim.py<br/>L1: Schema + RBAC + contract<br/>L2: Order existence<br/>L3: Category boundary<br/>L4: Return window<br/>L5: Amount limit`"]
        ACCEPT["ACCEPT"]
        ESC["`ESCALATE<br/>human review`"]
        REJ["REJECT"]
        DIM --> ACCEPT & ESC & REJ
    end

    WALL --> DIM
    CTXSTORE -.->|order data| DIM
    CONTRACT -.->|boundaries| DIM
    LLMCFG -.->|model / endpoint| CREW

    style US fill:#fffde7,stroke:#f9a825,color:#333
    style KS fill:#e8f5e9,stroke:#388e3c,color:#333
    style ROA fill:#fff9c4,stroke:#f57f17,color:#333
    style CREW fill:#fff3e0,stroke:#e65100,color:#333
    style WALL fill:#37474f,color:#fff
    style ACCEPT fill:#c8e6c9,stroke:#2e7d32,color:#1b5e20
    style ESC fill:#fff9c4,stroke:#f57f17,color:#e65100
    style REJ fill:#ffcdd2,stroke:#c62828,color:#b71c1c

Execution flow

Diagram 2 - End-to-end sequence for a single claim

sequenceDiagram
    actor Caller as run.py
    participant CFG as config.yaml
    participant Agent as agent.run_claims_roa_cycle
    participant Analyst as Claims Analyst (CrewAI / Gemma3)
    participant DM as Decision Maker (CrewAI / Gemma3)
    participant DIM as DIM Validator (Kernel Space)
    participant CS as Context Store (config.yaml)

    Caller ->> CFG: load_yaml_config + load_scenarios()
    CFG -->> Caller: agents, context_store, scenarios.yaml rows

    loop for each scenario (6 rows in scenarios.yaml)
        Note over Caller, Agent: E,F: NL intake via extract_claim (Crew or mock regex)
        Caller ->> Agent: run_claims_roa_cycle(dfid, claim, ...)

        rect rgb(255, 253, 231)
            Note over Agent, DM: USER SPACE - probabilistic

            Agent ->> Analyst: Task: analyze claim eligibility
            Analyst ->> Analyst: Gemma3 reasoning
            Analyst -->> DM: eligibility summary (text)

            DM ->> DM: Gemma3 reasoning (output_json)
            DM -->> Agent: RefundProposalOutput JSON
        end

        Note over Agent, DIM: THE WALL - Claim to PolicyProposal

        rect rgb(232, 245, 233)
            Note over DIM, CS: KERNEL SPACE - deterministic

            Caller ->> DIM: validate_claims_proposal(proposal, dim_ctx, contract, dim_contract)
            DIM ->> CS: lookup order (purchase_date, category)
            CS -->> DIM: order record

            alt L1-L4 pass AND amount <= 500 EUR
                DIM -->> Caller: ACCEPT
            else L1-L4 pass AND amount > 500 EUR
                DIM -->> Caller: ESCALATE
            else L2 fail - order not found
                DIM -->> Caller: REJECT (order unknown)
            else L3 fail - prohibited category
                DIM -->> Caller: REJECT (category boundary)
            else L4 fail - outside return window
                DIM -->> Caller: REJECT (return window)
            end
        end
    end

Diagram 3 - Test Scenarios: 6 claims through the DIM validation pipeline

---
config:
  layout: elk
---
flowchart TD
    subgraph SCENARIOS["scenarios.yaml"]
        SA["`A - claim dict<br/>ord_001 - 299.99 EUR`"]
        SB["`B - claim dict<br/>ord_002 - 1200 EUR`"]
        SC["`C - claim dict<br/>ord_005 - 500 EUR`"]
        SD["`D - claim dict<br/>ord_004 - 50 EUR`"]
        SE["`E - claim_text NL<br/>ord_001 - 299.99 EUR`"]
        SF["`F - claim_text NL<br/>ord_002 - 1200 EUR`"]
    end

    subgraph EXTRACT["NL intake (E, F only)"]
        EX["`extract_claim_from_text()<br/>LLM → structured claim`"]
    end

    subgraph CREW_US["CrewAI Crew - User Space - Gemma3"]
        PROP["`Analyst to Decision Maker<br/>output: REFUND proposal JSON`"]
    end

    SA & SB & SC & SD --> PROP
    SE & SF --> EX
    EX --> PROP

    WALL{{"THE WALL"}}
    PROP --> WALL

    subgraph DIM_KS["DIM - Kernel Space - validate_claims_proposal + dir_core"]
        L1["L1 Schema + RBAC - pass"]
        L2["L2 Order exists - pass"]
        L3{"`L3 Category<br/>in allowed list?`"}
        L4{"`L4 Purchase date<br/>within 14 days?`"}
        L5{"`L5 Amount<br/>max 500 EUR?`"}
    end

    WALL --> L1 --> L2 --> L3

    L3 -->|electronics / clothing / home| L4
    L3 -->|prohibited_category - D| RD

    L4 -->|within window - A, B, E, F| L5
    L4 -->|2026-01-01 expired - C| RC

    L5 -->|299.99 - A, E| RA
    L5 -->|1200.00 - B, F| RB

    RA["`**ACCEPT**<br/>A, E`"]
    RB["`**ESCALATE**<br/>B, F - human review`"]
    RC["`**REJECT**<br/>C - return window expired`"]
    RD["`**REJECT**<br/>D - category not allowed`"]

    style SCENARIOS fill:#e3f2fd,stroke:#1565c0,color:#0d47a1
    style EXTRACT fill:#e1f5fe,stroke:#0288d1,color:#01579b
    style CREW_US fill:#fffde7,stroke:#f9a825,color:#333
    style DIM_KS fill:#e8f5e9,stroke:#388e3c,color:#333
    style WALL fill:#37474f,color:#fff
    style RA fill:#c8e6c9,stroke:#2e7d32,color:#1b5e20
    style RB fill:#fff9c4,stroke:#f57f17,color:#e65100
    style RC fill:#ffcdd2,stroke:#c62828,color:#b71c1c
    style RD fill:#ffcdd2,stroke:#c62828,color:#b71c1c

Key Difference from a Naive Crew

|  | Naked CrewAI Crew | ROA-Wrapped Crew |
|---|---|---|
| Actions | Any tool, any API, any DB | Only structured JSON (output_json) |
| Enforcement | None (trust the LLM) | Deterministic DIM in Kernel Space |
| Output | Side effects (Facts) | Proposals (Claims) |
| Authority | Unbounded | ClaimsContract boundaries |

The Claims Agent Scenario

Analyst vs DIM - Who Validates What?

The Claims Analyst (LLM) analyzes only the claim data provided in the scenario; it has no access to the Context Store. It can reason about boundaries (categories, limits) from the prompt text, but it may err (e.g., a wrong return-window assessment). The DIM (Kernel Space) is the source of truth: it reads purchase_date and category from the Context Store and enforces the rules deterministically. If the Analyst says "OK" but the data contradicts it, the DIM rejects. This separation is intentional: User Space proposes, Kernel Space decides.
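A minimal sketch of those deterministic kernel-side rules, assuming a plain orders dict (the sample's real validate_claims_proposal in dim.py also runs dir_core.validate_proposal for schema and RBAC first):

```python
# Simplified sketch of the claims rules L2-L5 (order existence, category
# boundary, return window, amount limit). Names are illustrative only.
from datetime import date

def check_claim(claim: dict, orders: dict, allowed: set,
                window_days: int, max_refund: float, today: date):
    order = orders.get(claim["order_id"])
    if order is None:                                        # L2
        return "REJECT", "order unknown"
    if order["category"] not in allowed:                     # L3
        return "REJECT", "category boundary"
    age = (today - date.fromisoformat(order["purchase_date"])).days
    if age < 0 or age > window_days:                         # L4
        return "REJECT", "return window"
    if claim["amount_eur"] > max_refund:                     # L5
        return "ESCALATE", "human approval required"
    return "ACCEPT", "validation passed"

orders = {"ord_001": {"category": "electronics", "purchase_date": "2026-02-20"}}
print(check_claim({"order_id": "ord_001", "amount_eur": 299.99},
                  orders, {"electronics"}, 14, 500.0, date(2026, 2, 25)))
# ('ACCEPT', 'validation passed')
```

Note that the LLM's opinion never appears as an input here: the kernel decides from Context Store data alone.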

Mission

Process customer claims fairly, within policy boundaries, without exceeding authority or escalating unnecessarily.

Responsibility Contract (ClaimsContract)

| Field | Description |
|---|---|
| allowed_refund_categories | Product categories the agent may propose refunds for (e.g., electronics, clothing) |
| max_refund_without_escalation | EUR threshold; amounts above it require human approval (ESCALATE) |
| return_window_days | Maximum days from purchase for automatic eligibility |

Natural Language Intake (Scenarios E, F)

In production, customers write free-form text in English (e.g. "I bought ord_001 for 299 EUR, defective product"), not JSON. The LLM extracts structured claim data (order_id, amount_eur, category, reason) in a single call before the Crew processes it. This step is what justifies using an LLM at all: deterministic rules alone cannot parse natural language.
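The mock path's deterministic extraction could look roughly like this (hypothetical regex sketch; the sample's live path uses an LLM call in extract_claim_from_text instead):

```python
# Hypothetical sketch of a deterministic (mock) NL-intake fallback:
# pull order_id and amount out of free-form text with regexes.
import re

def mock_extract_claim(text: str):
    order = re.search(r"\bord_\d+\b", text)
    amount = re.search(r"(\d+(?:\.\d+)?)\s*EUR", text)
    if not (order and amount):
        return None   # cannot parse -> no claim, nothing crosses The Wall
    return {"order_id": order.group(0),
            "amount_eur": float(amount.group(1))}

claim = mock_extract_claim("I bought electronics order ord_001 for 299.99 EUR")
print(claim)  # {'order_id': 'ord_001', 'amount_eur': 299.99}
```

Regexes cover only the sample's fixed phrasing, which is exactly why the live path needs an LLM.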

Scenarios

| Scenario | Input | DIM Verdict | Reason |
|---|---|---|---|
| A | Valid claim: within return window, amount ≤ 500 EUR | ACCEPT | All criteria met |
| B | Valid claim: amount > 500 EUR | ESCALATE | Human approval required |
| C | Claim outside return window (purchased 2026-01-01) | REJECT | Outside return_window_days |
| D | Claim for prohibited category | REJECT | Category not in allowed_refund_categories |
| E | NL text: valid claim (ord_001, 299.99 EUR) | ACCEPT | LLM extracts → DIM validates |
| F | NL text: amount > 500 EUR (ord_002, 1200 EUR) | ESCALATE | LLM extracts → DIM escalates |

Layout (Sample Guide §3)

| File | Role |
|---|---|
| run.py | Bootstrap, handshake, ContextStore, scenario loop, DIM, telemetry, idempotent execution |
| config.yaml | database, llm_defaults, simulation, agents[], context_store |
| scenarios.yaml | Scenario rows with context.claim or context.claim_text and expected |
| schemas.py | load_scenarios, parse_llm_json, CrewConfig, contract payload helper |
| contracts.py | ClaimsContract built from YAML + ResponsibilityContract |
| agent.py | CrewAI Explain→Policy path, mock deterministic path, Self-Check, PolicyProposal |
| dim.py | validate_claims_proposal (wraps dir_core.validate_proposal + claims rules) |
| telemetry.py | SIMULATION_*, AGENT_DECISION, CLAIM_REFUND_EXECUTED, self-check failures |
| mocks/llm_mock_strategy.py | make_mock_strategy for setup_environment when mock is selected |

Configuration

Runtime settings are split between config.yaml (persistence, LLM defaults, agents, authoritative context_store) and scenarios.yaml (batch inputs and expected DIM verdicts). context_store.orders[*].purchase_date must fall within claims_bounds.return_window_days of the wall-clock date you run against, or ACCEPT scenarios will see REJECT from the return-window rule.
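One way to keep purchase_date fresh relative to the wall clock before a run (hypothetical helper, not part of the sample):

```python
# Regenerate a purchase_date that is guaranteed to pass the L4
# return-window rule at run time (hypothetical helper).
from datetime import date, timedelta

def fresh_purchase_date(days_ago: int, window_days: int = 14) -> str:
    # Guard: a date older than the window would trip the return-window rule.
    if not 0 <= days_ago <= window_days:
        raise ValueError("purchase date would fall outside the return window")
    return (date.today() - timedelta(days=days_ago)).isoformat()

print(fresh_purchase_date(3))  # an ISO date three days before today
```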

database:
  provider: sqlite
  db_path: "data/crewai_roa.db"

llm_defaults:
  model: "gemma3:4b"
  base_url: "http://localhost:11434"
  temperature: 0.2

simulation:
  run_id: "crewai_claims_batch_001"

agents:
  - agent_id: "claims_agent_v1"
    contract:
      role: EXECUTOR
      authorized_instruments: [electronics, clothing, home]
      allowed_policy_types: [REFUND, REPLACE, ESCALATE]
      escalate_on_uncertainty: 0.7
      max_drawdown_limit: 0.05
      wake_up_threshold_pct: 0.5
      parent_agent_id: null
    claims_bounds:
      max_refund_without_escalation: 500.0
      return_window_days: 14

context_store:
  orders:
    ord_001: { purchase_date: "...", category: electronics, amount: 299.99 }

| Section | Purpose |
|---|---|
| database | SQLite path anchored next to config.yaml via setup_environment |
| llm_defaults | Default Ollama endpoint; overridden by OLLAMA_* env vars when set |
| simulation.run_id | Becomes simulation_id in every telemetry details payload |
| agents[].contract | Canonical ResponsibilityContract fields for YamlContractProvider |
| agents[].claims_bounds | Claims-only limits read into ClaimsContract |
| context_store | Authoritative orders for the DIM (not injected into the Crew prompt) |
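The claims_bounds section might map onto a contract object roughly like this (illustrative sketch; the sample's real ClaimsContract lives in contracts.py and also merges YAML contract fields):

```python
# Illustrative mapping of agents[].claims_bounds onto a frozen contract
# object; field names follow the YAML above, the class name is hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class ClaimsBoundsSketch:
    max_refund_without_escalation: float
    return_window_days: int
    allowed_refund_categories: tuple = ("electronics", "clothing", "home")

raw = {"max_refund_without_escalation": 500.0, "return_window_days": 14}
bounds = ClaimsBoundsSketch(**raw)
print(bounds.return_window_days)  # 14
```

Freezing the dataclass keeps the kernel's boundaries immutable once loaded.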

How to run

Mock (no network, no Ollama, no CrewAI LLM calls)

Deterministic claim→proposal path and regex NL extraction for scenarios E–F.

pip install -e .
$env:PYTHONPATH="src;samples"; $env:USE_MOCK_LLM="1"   # PowerShell
python samples/35_crewai_roa_wrapper/run.py

Ollama + CrewAI (full User Space)

pip install -e ".[crewai]"
ollama serve
ollama pull gemma3:4b
$env:PYTHONPATH="src;samples"
python samples/35_crewai_roa_wrapper/run.py

Unset USE_MOCK_LLM (or set it to 0) so configured_live_llm_is_reachable can succeed; otherwise the sample stays on the mock path.

Gemini

This sample wires CrewAI to the OpenAI-compatible Ollama endpoint. Gemini is not configured here; use mock or Ollama as above.

Env var overrides (Ollama):

$env:OLLAMA_BASE_URL = "http://localhost:11434"
$env:OLLAMA_MODEL    = "gemma3:4b"

Database storage

Events are written only through bundle.decision_audit (see telemetry.py). Typical event values for this sample: SIMULATION_START, SIMULATION_END, AGENT_DECISION, CLAIM_REFUND_EXECUTED, CLAIMS_SELF_CHECK_FAILED.

Group a run by simulation_id stored inside JSON details (Sample Guide §9.4):

-- SQLite
SELECT dfid, event, detail_json
FROM decision_audit_events
WHERE json_extract(detail_json, '$.simulation_id') = 'crewai_claims_batch_001'
ORDER BY id;
-- PostgreSQL
SELECT dfid, event, detail_json
FROM decision_audit_events
WHERE detail_json->>'simulation_id' = 'crewai_claims_batch_001'
ORDER BY id;
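The same per-run grouping from Python, assuming the SQLite table and columns shown in the SQL above:

```python
# Fetch one run's audit trail from SQLite, oldest event first
# (assumes the decision_audit_events table shown above).
import sqlite3

def events_for_run(db_path: str, simulation_id: str):
    con = sqlite3.connect(db_path)
    try:
        return con.execute(
            "SELECT dfid, event, detail_json FROM decision_audit_events "
            "WHERE json_extract(detail_json, '$.simulation_id') = ? "
            "ORDER BY id",
            (simulation_id,),
        ).fetchall()
    finally:
        con.close()
```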

Why provider="openai"? CrewAI's LLM class routes provider="openai" to the native OpenAI Python SDK. Ollama exposes an OpenAI-compatible API at /v1, so no LiteLLM or extra dependencies are needed:

LLM(model="gemma3:4b", provider="openai",
    base_url="http://localhost:11434/v1", api_key="ollama")


How the Interception Works (output_json instead of tool-calling)

Gemma3 (and most local Ollama models) do not support OpenAI-style function calling. The original Submit_Policy_Proposal tool approach requires the model to call functions; Ollama returns HTTP 400 for such requests.

Solution: output_json on the Decision Maker's Task.

CrewAI's output_json parameter instructs the LLM to format its entire response as JSON matching a Pydantic schema, then validates and parses it automatically:

from crewai import Task
from pydantic import BaseModel

class RefundProposalOutput(BaseModel):
    action: str       # always "REFUND"
    order_id: str
    amount_eur: float
    category: str
    reason: str

decide_task = Task(
    description="Based on the analyst's findings, produce a refund proposal...",
    output_json=RefundProposalOutput,  # ← no tool-calling needed
    agent=decision_maker,              # ← no tools=[] on the agent
)

result = crew.kickoff()
data = result.json_dict  # parsed + validated dict
proposal = PolicyProposal(..., params=data)

Architecturally equivalent: the LLM still produces a Claim (JSON proposal), which crosses "The Wall" to the DIM for deterministic validation before any Fact (execution) occurs. The boundary holds.


Key Components

| Component | Purpose |
|---|---|
| ClaimsContract | Defines authority boundaries (categories, amount limit, return window) |
| extract_claim_from_text() | NL intake: extracts a structured claim from customer text (single LLM call) |
| ClaimExtractionOutput | Pydantic schema for NL extraction output |
| RefundProposalOutput | Pydantic schema for output_json; the structured proposal from the Decision Maker |
| _extract_proposal_from_text() | Fallback: parses JSON from raw LLM output when output_json parsing fails |
| agent.CrewAIROAWrapper | Builds a Crew per call, runs kickoff(), returns a policy dict for Self-Check |
| agent.run_claims_roa_cycle | Mock or Crew path, Self-Check, emits PolicyProposal |
| dim.validate_claims_proposal | dir_core.validate_proposal, then order, category, window, and amount checks |
| report_generator.py | Dark-theme HTML audit report (Sample Guide §17) from decision_audit only |
| Context Store | Authoritative order data (source of truth for the DIM, not for the agent) |

HTML report

After a successful batch run, run.py writes results/report_<UTC>_<N>scenarios.html and opens it in the default browser. The report is self-contained (embedded CSS), uses only bundle.decision_audit.all_events_chronological() plus registry and context snapshots, and follows §17 section order (summary, authored prose, empty-state Section 3 charts, trace table, ROA blocks, authoritative orders table, kernel artefacts).

Regenerate without re-running the simulation:

$env:PYTHONPATH="src;samples"
python samples/35_crewai_roa_wrapper/report_generator.py
python samples/35_crewai_roa_wrapper/report_generator.py --simulation-id crewai_claims_batch_001
python samples/35_crewai_roa_wrapper/report_generator.py --output-path samples/35_crewai_roa_wrapper/results/custom.html

Expected Output

======================================================================
35_crewai_roa_wrapper  -  CrewAI + Ollama + DIR Kernel
======================================================================
  Config : config.yaml
  LLM    : gemma3:4b @ http://localhost:11434
  Agent  : claims_agent_v1
  Crew   : Claims Analyst → Decision Maker (sequential, output_json)
  DIM    : 5-layer validation (RBAC, order, window, category, amount)
  Scenarios: 6

[SCENARIO A - Valid: within window & limit]
----------------------------------------------------------------------
  Claim:  order=ord_001  amount=299.99 EUR  cat=electronics
  Crew:   thinking... done.
  Proposal:   REFUND ord_001 299.99 EUR
  DIM Verdict: ACCEPT
  Reason:      Validation passed

[SCENARIO B - Amount > 500 EUR (human approval required)]
----------------------------------------------------------------------
  Claim:  order=ord_002  amount=1200.0 EUR  cat=electronics
  Crew:   thinking... done.
  Proposal:   REFUND ord_002 1200.0 EUR
  DIM Verdict: ESCALATE
  Reason:      Amount 1200.0 EUR exceeds max_refund_without_escalation (500.0 EUR)...

[SCENARIO C - Outside return window (purchased 2026-01-01)]
----------------------------------------------------------------------
  Claim:  order=ord_005  amount=500.0 EUR  cat=electronics
  Crew:   thinking... done.
  Proposal:   REFUND ord_005 500.0 EUR
  DIM Verdict: REJECT
  Reason:      Order ord_005 outside return window...

[SCENARIO D - Prohibited category]
----------------------------------------------------------------------
  Claim:  order=ord_004  amount=50.0 EUR  cat=prohibited_category
  Crew:   thinking... done.
  Proposal:   REFUND ord_004 50.0 EUR
  DIM Verdict: REJECT
  Reason:      Category 'prohibited_category' not in allowed_refund_categories...

[SCENARIO E - NL intake: valid claim (ord_001)]
----------------------------------------------------------------------
  Input (NL): I bought electronics order ord_001 for 299.99 EUR on 20 February 2026...
  Extracted: order=ord_001  amount=299.99 EUR  cat=electronics
  Crew:   thinking... done.
  Proposal:   REFUND ord_001 299.99 EUR
  DIM Verdict: ACCEPT
  Reason:      Validation passed

[SCENARIO F - NL intake: amount > 500 EUR (ord_002)]
----------------------------------------------------------------------
  Input (NL): Order ord_002 - laptop for 1200 EUR, 25 February 2026...
  Extracted: order=ord_002  amount=1200.0 EUR  cat=electronics
  Crew:   thinking... done.
  Proposal:   REFUND ord_002 1200.0 EUR
  DIM Verdict: ESCALATE
  Reason:      Amount 1200.0 EUR exceeds max_refund_without_escalation (500.0 EUR)...

======================================================================
[SUMMARY]
======================================================================
  ✓ ACCEPT      SCENARIO A - Valid: within window & limit
  ✓ ESCALATE    SCENARIO B - Amount > 500 EUR (human approval required)
  ✓ REJECT      SCENARIO C - Outside return window (purchased 2026-01-01)
  ✓ REJECT      SCENARIO D - Prohibited category
  ✓ ACCEPT      SCENARIO E - NL intake: valid claim (ord_001)
  ✓ ESCALATE    SCENARIO F - NL intake: amount > 500 EUR (ord_002)

References

  • ROA Manifesto §3 (Responsibility Contract), §4-5 (Explain → Policy → Proposal), §10 (Boxed Intelligence)
  • DIR Architectural Pattern §6 (Decision Integrity Module), §5 (Policies as Contracts)
  • [Sample 34 - LangChain ROA Wrapper](https://github.com/huka81/decision-intelligence-runtime/blob/main/samples/34_langchain_roa_wrapper/README.md) (same pattern, different framework)