Contract Testing

What is contract testing?

In a multi-agent system, agents pass structured data to each other. A contract is the declared shape of that data: the fields, types, and values the downstream agent expects. Without contracts, schema drift is silent: an upstream team renames a field, the LLM adapts to the new name, the existing tests still pass, and the bug surfaces in production as a wrong decision. reagent-flow makes these contracts explicit and testable:

# The security review agent expects this shape from intake
security.assert_handoff_matches(schema={
    "vendor_name": str,
    "data_access": {"contains_customer_pii": bool},
})

If intake renames contains_customer_pii to handles_personal_data, this assertion fails at PR time with the exact field path:

handoff field 'data_access.contains_customer_pii': missing from data
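
To make the field-path reporting concrete, here is a minimal sketch of a nested schema check in plain Python. It is not reagent-flow's implementation: `schema_violations` is a hypothetical helper, and only the "missing from data" message mirrors the output shown above; the "expected ..., got ..." wording is an assumption made for illustration.

```python
from typing import Any


def schema_violations(data: Any, schema: Any, path: str = "") -> list[str]:
    """Return one message per mismatch; an empty list means the data conforms."""
    if isinstance(schema, dict):
        if not isinstance(data, dict):
            return [f"handoff field '{path}': expected dict, got {type(data).__name__}"]
        problems: list[str] = []
        for key, sub_schema in schema.items():
            # Build the dotted path as we descend, e.g. "data_access.contains_customer_pii".
            child = f"{path}.{key}" if path else key
            if key not in data:
                problems.append(f"handoff field '{child}': missing from data")
            else:
                problems.extend(schema_violations(data[key], sub_schema, child))
        return problems
    # Leaf: the schema is a Python type such as str or bool.
    # Note: isinstance(True, int) is True in Python, so an int leaf also accepts bools.
    if not isinstance(data, schema):
        return [f"handoff field '{path}': expected {schema.__name__}, got {type(data).__name__}"]
    return []


schema = {"vendor_name": str, "data_access": {"contains_customer_pii": bool}}
renamed = {"vendor_name": "Acme", "data_access": {"handles_personal_data": True}}
print(schema_violations(renamed, schema))
# ["handoff field 'data_access.contains_customer_pii': missing from data"]
```

The recursion carries the path down with it, which is why the failure can name the exact nested field rather than only the top-level key.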

Two types of contracts

Handoff contracts

Validate the data passed between agents via handoff_context:

with reagent_flow.session(
    "security-review",
    parent_trace_id=intake.trace.trace_id,
    handoff_context=vendor_packet,  # the contract target
) as security:
    # ...

security.assert_handoff_matches(schema={...})

Tool output contracts

Validate the data returned by a tool within a single agent:

intake.assert_tool_output_matches("extract_vendor_packet", schema={
    "vendor_name": str,
    "compliance": {"subprocessors": [str]},
})
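
The `[str]` entry above reads naturally as "a list whose items are all strings". The sketch below checks that interpretation in plain Python. Both the interpretation and the names are assumptions: `list_item_violations` and the `[i]` path suffix are hypothetical and not part of reagent-flow.

```python
from typing import Any


def list_item_violations(values: Any, item_type: type, path: str) -> list[str]:
    """Report each list item whose type differs from item_type."""
    if not isinstance(values, list):
        return [f"'{path}': expected list, got {type(values).__name__}"]
    return [
        f"'{path}[{i}]': expected {item_type.__name__}, got {type(v).__name__}"
        for i, v in enumerate(values)
        if not isinstance(v, item_type)
    ]


tool_output = {
    "vendor_name": "Acme",
    "compliance": {"subprocessors": ["aws", 42]},  # 42 violates [str]
}
problems = list_item_violations(
    tool_output["compliance"]["subprocessors"], str, "compliance.subprocessors"
)
print(problems)
# ["'compliance.subprocessors[1]': expected str, got int"]
```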

What contracts catch

Failure mode	Example	Contract that catches it
Renamed field	`contains_customer_pii` becomes `handles_personal_data`	`assert_handoff_matches`
Missing field	`subprocessors` dropped entirely	`assert_handoff_matches`
Wrong type	`retention_days` returns `"30"` instead of `30`	`assert_handoff_matches` or `assert_tool_output_matches`
Extra fields leaking	Internal notes added to the handoff	`assert_no_extra_fields`
Value changed	`vendor_name` mutated between hops	`assert_context_preserved`
Parent link broken	Child session not linked to parent	`assert_handoff_received`
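
Two rows in the table, extra fields leaking and values changing between hops, concern more than field types. A rough sketch of what such checks amount to, using plain dictionaries; `extra_fields` and `changed_fields` are hypothetical helpers for illustration, not the library's `assert_no_extra_fields` or `assert_context_preserved`:

```python
_MISSING = object()  # sentinel so a missing key is distinguishable from a stored None


def extra_fields(data: dict, allowed: set) -> list:
    """Top-level keys present in data but not declared in the contract."""
    return sorted(set(data) - set(allowed))


def changed_fields(sent: dict, received: dict, fields: list) -> list:
    """Fields whose value differs between two hops, or exists on one side only."""
    return [
        f for f in fields
        if sent.get(f, _MISSING) != received.get(f, _MISSING)
    ]


sent = {"vendor_name": "Acme Corp", "retention_days": 30}
received = {"vendor_name": "ACME", "retention_days": 30, "internal_notes": "escalate"}

print(extra_fields(received, {"vendor_name", "retention_days"}))    # ['internal_notes']
print(changed_fields(sent, received, ["vendor_name", "retention_days"]))  # ['vendor_name']
```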

The multi-agent pattern

A typical contract-tested pipeline:

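import reagent_flow

# trace_dir is assumed to be defined by the surrounding test
# (the directory where traces are written).

# Agent A: produces data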
with reagent_flow.session("agent-a", trace_dir=trace_dir) as a:
    a.log_llm_call(tool_calls=[{"name": "fetch", "arguments": {}}])
    a.log_tool_result("fetch", result={"id": "abc", "value": 42})

# Validate agent A's output shape
a.assert_tool_output_matches("fetch", schema={"id": str, "value": int})

# Agent B — consumes agent A's output
output = a.trace.turns[0].tool_results[0].result
with reagent_flow.session(
    "agent-b",
    trace_dir=trace_dir,
    parent_trace_id=a.trace.trace_id,
    handoff_context=output,
) as b:
    b.log_llm_call(tool_calls=[{"name": "process", "arguments": {}}])
    b.log_tool_result("process", result={"status": "done"})

# Validate the handoff contract
b.assert_handoff_received(a)
b.assert_handoff_matches(schema={"id": str, "value": int})
b.assert_context_preserved({"id": "abc"}, fields=["id"])

Each assertion is a contract. If any upstream change breaks the shape, the test fails with the exact field path — before the change ships.

Contracts are checked at test time using pytest, not at runtime. This keeps production behavior unchanged while catching drift in CI.
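
As an illustration of the test-time placement, the sketch below shows a contract check written as an ordinary pytest-style test function. It uses plain Python rather than the reagent-flow API so that it stays self-contained; `extract_vendor_packet`, `type_mismatches`, and `EXPECTED_TYPES` are hypothetical stand-ins, and a real test would assert against a recorded session instead.

```python
EXPECTED_TYPES = {"vendor_name": str, "retention_days": int}


def type_mismatches(data: dict, expected: dict) -> dict:
    """Map each offending field to the name of the type actually found."""
    return {
        field: type(data.get(field)).__name__
        for field, wanted in expected.items()
        if not isinstance(data.get(field), wanted)
    }


def extract_vendor_packet() -> dict:
    """Stand-in for the upstream tool whose output is under contract."""
    return {"vendor_name": "Acme", "retention_days": 30}


def test_vendor_packet_contract():
    # pytest collects any function named test_*; a failing assert fails the CI run.
    mismatches = type_mismatches(extract_vendor_packet(), EXPECTED_TYPES)
    assert mismatches == {}, f"contract drift: {mismatches}"
```

If the upstream tool started returning `"30"` instead of `30`, `type_mismatches` would report `{"retention_days": "str"}` and the test would fail in CI, while the production code path itself contains no extra checks.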

Why not guardrails or structured outputs?

See where contract testing fits alongside structured outputs, runtime guardrails, evals, and observability.
