0055: Conformance Adapter Capability¶
- Status: Accepted
- Author: Chris Colinsky
- Created: 2026-06-03
- Accepted: 2026-06-04
- Targets: spec/conformance-adapter/spec.md (creates — new capability spec ratifying the existing fixture YAML schema, directive vocabulary, harness primitives, and adapter responsibility model); docs/conformance.md (creates — readable explainer / tutorial alongside governance.md); spec/graph-engine/conformance/README.md (slim down — its v0 informal schema content moves into the new capability spec; the README becomes a one-line breadcrumb pointing at the capability spec); docs/governance.md §"Conformance tests" (extends to cross-reference the new capability spec as the normative source).
- Related: 0001 (graph-engine — first capability with fixtures; source of the v0 informal YAML schema embedded in
spec/graph-engine/conformance/README.md), 0020 (sessions — introduced the multi-invocationinvocations:list directive for cross-invoke fixtures), 0048 (introducedaugment_metadata/augment_metadata_from_field/capture_invocation_metadata_intofor the read-symmetric invocation metadata read API), 0054 (introducedbehavior: accumulate,invoke_drain_events_for,node_drain_summaries,node_accumulator_snapshots,final_accumulator_statefor the per-invocation drain primitive), 0021 (introducedsuspend_with_descriptor,conditional_suspend,wrap_with_middleware,resume_invocation,signal_payload,descriptor/suspending_node/metadata_includesassertion shapes for the suspension primitive) - Supersedes:
Summary¶
Create the conformance-adapter capability spec. Documents the language-agnostic fixture
system that every OA implementation builds against: the YAML fixture schema, the directive
vocabulary that has accreted across proposals 0001-0054, the harness primitives implementations
MUST provide, the assertion shapes adapters MUST honor, and the responsibility model for
language adapters that ingest fixtures and exercise them against a native implementation. Adds
a docs-side explainer (docs/conformance.md) parallel to docs/governance.md and
docs/openarmature.md.
The capability v1 is ratifying — it documents the directive vocabulary as it currently
exists across the conformance directories. No directive shapes change. Future proposals that
introduce new directives continue the established pattern (each proposal's Conformance test
impact section lists the new directives the proposal adds), but the new directives now have a
named home — implementations look up directive semantics in
spec/conformance-adapter/spec.md rather than reverse-engineering them from each proposal's
fixture additions.
The capability is spec'd as a versioned capability (not a top-level doc). Implementations declare conformance-adapter version alongside spec_version in their package metadata; the SemVer-pinned vocabulary is the contract a language adapter targets.
Motivation¶
The fixture system has outgrown its v0 documentation. The original conformance README at
spec/graph-engine/conformance/README.md ships an "informal v0" YAML schema covering the
basic graph-engine shape (state / entry / nodes / edges / expected). That schema
was sufficient when graph-engine was the only capability with fixtures. Since then, every
proposal that introduced a new behavioral surface has extended the YAML format with new
directives:
- Proposal 0005 (parallel fan-out):
fan_outnode directive,fan_out_indexevent field,subgraphdeclaration form. - Proposal 0011 (parallel branches):
parallel_branchesnode directive,branch_nameevent field,subgraphsmulti-declaration form,branchesmapping. - Proposal 0020 (sessions):
session_storeregistration,session_idper-invocation,invocations:list for multi-invoke fixtures,loaded_session_state/saved_session_assertionsassertion shapes. - Proposal 0008 / 0010 / 0030 (checkpointing + drain):
checkpointerregistration,drain: {timeout_seconds}invoke directive,drain_summaryassertion shape,sleep_ms_per_eventobserver-pacing directive,behavior: recordobserver behavior. - Proposal 0048 (read-symmetric invocation metadata + queryable observer pattern):
augment_metadata/augment_metadata_from_field/capture_invocation_metadata_intonode directives,caller_metadataper-invocation,caller_invocation_id. - Proposal 0054 (per-invocation event drain):
behavior: accumulateobserver behavior,invoke_drain_events_fornode directive,node_drain_summaries/node_accumulator_snapshots/node_accumulator_snapshot_invariants/final_accumulator_stateassertion shapes. - Proposal 0021 (suspension):
suspend_with_descriptor/conditional_suspend/pre_next_calls_suspend_with_descriptornode directives,wrap_with_middlewaredirective,resume_invocation/signal_payloadper-invocation,descriptor/suspending_node/metadata_includes/node_drain_summaries/outcome/error.categoryassertion shapes.
The vocabulary has roughly doubled in size since v0 and is split across the Conformance test impact sections of more than two dozen subsequent proposals. No single document collects it; no version is declared; no normative contract obligates implementations to honor each directive's documented semantics.
The downstream consequence is that the reference Python implementation has become the de
facto source of truth for what each directive means at runtime. A future
openarmature-typescript adapter would have to reverse-engineer the directive semantics from
the Python implementation rather than from spec text — an inversion of the spec-repo-as-truth
model the rest of OA is built on.
Cross-language consistency is the load-bearing payoff. OA's whole rationale for shipping
as a spec repo separate from any language implementation is that future language ports build
against the spec, not against the Python implementation. The conformance fixtures are the
behavioral truth for cross-language consistency (per docs/governance.md §"Multi-language
consistency" — "APIs MAY differ in syntactic shape; behavior MUST match conformance tests").
For that promise to hold, the YAML schema + directive vocabulary + harness primitives + assertion
shapes MUST themselves be specified, not just the behaviors they exercise.
This proposal does the documentation work that should have happened incrementally but was deferred at every step. Each new directive landed via a proposal that focused on the behavior being tested; the directive itself was treated as test-infrastructure detail. The accumulated state is now substantial enough to warrant its own capability spec — and the ratification gives future proposals a clear home for new directives (each new proposal's Conformance test impact section adds to the conformance-adapter capability's directive vocabulary, the same way new pipeline-utilities §6.x middleware land in pipeline-utilities or new observability §5.x attribute sets land in observability).
Detailed design¶
1. Capability spec scope¶
The new spec/conformance-adapter/spec.md defines:
- Fixture file format. Per-capability
conformance/directories underspec/<capability>/. Each fixture is a numberedNNN-name.yaml+NNN-name.mdpair (the YAML carries the declarative test; the Markdown carries the prose description of what is being tested, which spec sections are exercised, and what passes vs fails). - YAML schema. The full document shape, including the single-case form (
state:/entry:/nodes:/edges:/initial_state:/expected:at top level), the multi-case form (cases:list), and the multi-invocation form (invocations:list with per-invocationinitial_state/resume_invocation/signal_payload/session_id/expected). - Directive vocabulary. Every directive currently in use across the seven capabilities' conformance directories, organized by category (node behavior, state / schema, edge, composition, observer, persistence, invocation-shape, expected-outcome, invariant). Each directive entry specifies: name, where it appears in the YAML structure, parameters and their types, the runtime behavior the adapter MUST honor, and the spec section(s) the directive exists to exercise.
- Harness primitives. What language adapters MUST provide to execute the fixtures —
in-memory observer types (recording, accumulating, slow / paced, OTel-emitting, Langfuse-
emitting, raising), in-memory persistence backends (session store, checkpointer), suspend /
resume primitive wiring (so a fixture's
suspend_with_descriptordirective translates to a realsuspend()call from inside a synthetic node body), drain primitive wiring (sodrain: {timeout_seconds}andinvoke_drain_events_fortranslate to real drain calls). - Assertion shapes. The expected-outcome surface — exact-equality assertions on final state and execution order; invariant assertions for nondeterministic-ordering cases (counts, identity-tuple uniqueness, attribute presence, ordering relations between events that are ordered even when individual event positions are not); structured assertions on observer events, OTel spans, Langfuse traces, drain summaries, accumulator snapshots, session records.
- Nondeterminism handling. Which orderings are observable but not uniquely determined
(fan-out instance scheduling, parallel-branches branch scheduling, observer event dispatch
within a phase) and how fixtures assert on those cases (
observer_event_invariantsrather thanobserver_events, count-based or identity-tuple-based assertions rather than enumerated event lists). - Adapter responsibility model. What a language implementation does to ship a
conformance-passing adapter — fixture discovery (walking
spec/*/conformance/for*.yamlfiles), per-case parsing into the implementation's native graph-construction calls, per-case execution against the implementation's runtime, per-case assertion via the language's idiomatic test framework (pytest for Python, vitest for TypeScript, etc.). The adapter is implementation-private; the fixtures are spec-public. - Versioning. The conformance-adapter capability follows whole-spec SemVer like every other capability. Each proposal that adds directives bumps the capability version. The capability ships at v0.X.0 (matching whatever the spec version is at acceptance time).
- Errors at the adapter layer.
fixture_directive_unknown(an adapter that doesn't recognize a directive MUST raise rather than silently skip),fixture_schema_invalid(an adapter that finds a structurally-broken fixture MUST raise rather than infer a default), and a handful of other adapter-layer error categories.
2. Spec section structure (12 sections, matching the established capability-spec template)¶
- §1 Purpose — frames the conformance-adapter as the spec-side authority for the fixture
system; cross-references
docs/governance.md§"Conformance tests" (overview),docs/conformance.md(explainer), and the newspec/conformance-adapter/spec.md(normative). - §2 Concepts — Fixture, Adapter, Directive, Harness primitive, Assertion shape, Invariant, Case, Invocation.
- §3 Fixture file format — directory layout,
NNN-name.{yaml,md}pair convention, numbering rules, README discovery (no required README per directory; the capability spec is the schema reference). Per-directory harness notes via fixture-header comments. The capability spec is the home for general directives that span capabilities; fixture- header comments MAY supplement with per-directory-specific harness notes when a capability's fixtures share a specialized contract that doesn't generalize. Worked example:spec/observability/conformance/001-otel-basic-trace.yamlopens with a comment block documenting the per-capability harness contract (in-memory OTelSpanExporter, privateTracerProviderper spec §6 isolation,caller_correlation_id/detached_subgraphs/mock_llm/ etc. optional config blocks, the<uuid>/<any-string>/<trace_id_X>placeholder syntax). That header is normative for the observability fixture suite even though it isn't part of the capability spec. The conformance-adapter spec describes the general directive surface; per-directory fixture-header notes describe specialized harness wiring that only applies inside that directory. - §4 Fixture YAML schema — top-level shape (single-case form),
cases:multi-case form,invocations:multi-invocation form, version pinning (the conformance-adapter version a fixture targets MAY be declared in aconformance_version:top-level key; defaults to whatever the spec version is at the time the fixture was authored). - §5 Directive vocabulary — enumerated by category (sub-sections §5.1-§5.9):
- §5.1 Node behavior directives —
update,update_pure,update_from_field,raises,suspend_with_descriptor,conditional_suspend,pre_next_calls_suspend_with_descriptor,invoke_drain_events_for,wrap_with_middleware,augment_metadata,augment_metadata_from_field,capture_invocation_metadata_into. - §5.2 State / schema directives —
state.fieldsshape (type, default, reducer),initial_state. - §5.3 Edge directives — static form, conditional form, conditional with
if_field/equals/then/else. - §5.4 Composition directives —
subgraph(single inline),subgraphs(named mapping),fan_out(subgraph, items_field, item_field, collect_field, target_field, error_policy, concurrency),parallel_branches(branches mapping with subgraph / outputs, error_policy). - §5.5 Observer / observability directives —
observers[]withbehaviorenum (record,accumulate,raise),sleep_ms_per_event(constant OR{first_invocation, subsequent_invocations}shape),attach(graph/invocation),target(outer/inner/ specific subgraph or node name),phasesfilter,caller_metadata,caller_invocation_id. OTel and Langfuse emission are NOT observer behaviors — they're harness primitives provided by the capability's host directory; see §6. - §5.6 Persistence directives —
session_store(e.g.,in_memory),checkpointer(e.g.,in_memory),loaded_session_state,saved_session_assertions,checkpointer_assertions. - §5.7 Invocation-shape directives — single-invocation top-level (
invoke:/ directinitial_state:+expected:); multi-invocationinvocations[]with per-entryname,session_id,correlation_id,caller_invocation_id,resume_invocation,signal_payload,caller_metadata,initial_state,drain,expected. - §5.8 Expected-outcome directives —
final_state,execution_order,outcome(completed / errored / suspended),error.category,expected_error.category/expected_error.raised_from,drain_summary,observer_events,observer_event_invariants,otel_spans,langfuse_*,node_drain_summaries,node_accumulator_snapshots,node_accumulator_snapshot_invariants,final_accumulator_state,saved_session_assertions,descriptor,suspending_node,metadata_includes,suspended_state,final_state_at_error. - §5.9 Invariant assertions —
invariants:block (name-keyed boolean predicates the adapter verifies as additional checks beyond exact-equality assertions; used for nondeterministic-ordering cases and for stating load-bearing properties verbatim, e.g.,drain_returned_within_timeout: true). - §6 Harness primitives — what implementations MUST provide to satisfy each directive class. Organized by primitive type:
- In-memory observers (record, accumulate, raise, slow/paced via
sleep_ms_per_event). - In-memory session store + checkpointer (single-process, ephemeral, mirroring the bundled-minimum pattern from sessions §5.5 and pipeline-utilities §10.13).
- OTel collector capture (an in-memory OTel exporter that records emitted spans for structured assertion).
- Langfuse mock (an in-memory Langfuse client wrapper that records emitted traces/observations for structured assertion).
- Suspend / resume wiring (the
suspend_with_descriptordirective on a node compiles to a realsuspend()call from the synthetic node body; the resume invocation calls the realinvoke(resume_invocation=..., signal_payload=...)API). - Drain wiring (the
drainandinvoke_drain_events_fordirectives invoke the realdrain()/drain_events_for()operations on the compiled graph). - Middleware wiring (the
wrap_with_middlewaredirective wraps the node with a real middleware that runs pre / post code recording markers into a state log field). - §7 Nondeterminism handling — enumeration of which orderings are observable but not determined by the spec (fan-out instance scheduling, parallel-branches branch scheduling, observer event dispatch within one phase), and the rule that fixtures assert on invariants (counts, identity-tuple uniqueness, presence of branch_name / fan_out_index values) rather than exact event sequences in those cases. References graph-engine §3 (concurrency exception) and §5 (determinism).
- §8 Adapter responsibility — discovery (walking
spec/*/conformance/for*.yaml), parsing (per-case translation into native graph-construction calls), execution, assertion via the language's idiomatic test framework. Version pinning rule (the adapter declares which conformance-adapter version it targets via the implementation's package metadata). - §9 Errors — adapter-layer error categories:
fixture_directive_unknown(adapter MUST raise on unrecognized directive; MUST NOT silently skip),fixture_schema_invalid(structurally broken fixture),harness_primitive_missing(a fixture references a harness primitive the adapter doesn't provide). - §10 Determinism — the adapter itself is a control-flow layer; it does not perturb the determinism of the implementation it exercises (mirrors the same control-flow-layer-doesn't-perturb-determinism rule the harness contract establishes per proposal 0022 when its capability spec lands).
- §11 Cross-spec touchpoints — references every capability whose fixtures contribute directives. The directive vocabulary §5 is the authoritative directive enumeration; this section is a navigational cross-reference.
- §12 Out of scope — explicit non-goals: specific language test-runner integration (pytest / vitest / etc. are implementation choices); fixture-authoring tooling (linters, generators, scaffolders); schema-validation tooling for the YAML itself; performance benchmarking or comparative-conformance reporting between implementations.
3. Docs side¶
A new docs/conformance.md ships alongside docs/governance.md and docs/openarmature.md
as a readable explainer / tutorial. Distinct from the normative spec/conformance-adapter/spec.md:
- Spec text (normative): exact directive shapes, error categories, MUST/SHOULD/MAY language, version pin.
- Docs page (informative): why the fixture system exists, end-to-end worked example (here's a real fixture, here's what the Python adapter does with it, here's the pytest output), how an implementer ships a conformance-passing adapter for a new language, the "every proposal that introduces new fixtures adds to the conformance-adapter capability" rule, cross-link to the formal capability spec for normative details.
The docs page lives in docs/ (alongside governance.md), NOT in docs/capabilities/
(which is reserved for symlinks to spec capability files). Linked from the mkdocs nav at
top level, between "Governance" and "Releasing" (or equivalent slot).
4. Existing-content migration¶
spec/graph-engine/conformance/README.md— its current v0 informal schema content moves into the new capability spec's §4. The README slims to a one-line breadcrumb pointing atspec/conformance-adapter/spec.md.docs/governance.md§"Conformance tests" — extends to cross-reference the new capability spec as the normative source. The §"Conformance tests" prose stays (overview is useful navigation); a paragraph appended pointing at the capability spec for directive details and adapter requirements.
5. Versioning¶
MINOR bump (pre-1.0). Documentary and additive only:
- New capability spec at
spec/conformance-adapter/spec.mdratifying the existing directive vocabulary. No behavior change — every existing fixture continues to pass unchanged under any conforming adapter. - New docs page at
docs/conformance.md. - One symlink under
docs/capabilities/conformance-adapter.md→../../spec/conformance-adapter/spec.mdper the established docs-side capability-page pattern. spec/graph-engine/conformance/README.mdslimmed (content moves; lookup pointer remains).docs/governance.md§"Conformance tests" extended with a cross-ref paragraph.
The conformance-adapter capability ships at v0.X.0 (matching the spec version at acceptance time). Future proposals that add new directives bump the capability's effective version in the same way other capabilities track versions (the proposal's Conformance test impact section enumerates the new directives; the spec's §5 sub-sections gain entries; the History line records the addition).
Conformance test impact¶
No conformance fixtures for the conformance-adapter capability itself. The capability defines the YAML schema and directive vocabulary; its own correctness is validated by every OTHER capability's fixtures passing. A meta-conformance suite that exercises the directive vocabulary in isolation would be redundant — every per-capability fixture already exercises the directive vocabulary; meta-coverage adds no orthogonal verification beyond what the existing per-capability fixture suites already provide.
Existing fixtures unchanged. The ratification is descriptive — no fixture YAML files are modified by this proposal. Implementations passing the conformance suite today continue to pass it under the new capability spec.
Alternatives considered¶
1. Document the directive vocabulary in docs/governance.md (no separate capability).
Extend the existing §"Conformance tests" section with the directive enumeration; skip
creating a new capability.
Rejected: governance.md is meta-content (process rules, versioning policy, multi-language consistency rules); the directive vocabulary is normative behavioral content (each directive specifies runtime behavior an adapter MUST honor). The two have different stability profiles (governance changes are rare; directive vocabulary changes every proposal). Mixing them makes it harder to version each appropriately and harder for adapter authors to find the authoritative directive list.
2. Ship as a top-level spec doc (spec/CONFORMANCE.md) rather than a capability.
Treat the fixture system as repo-level infrastructure outside the capability framework.
Rejected: the directive vocabulary IS specifying behavior that adapters must implement; it fits the capability model exactly. The other capabilities (graph-engine, sessions, etc.) define behaviors of the framework; the conformance-adapter capability defines behaviors of the language adapters that exercise the framework. Both are behavioral specifications targeting implementations; both deserve the same versioning, history, and conformance-via-Accept discipline.
3. Redesign the directive shapes to a cleaner v1 schema before ratifying. Take this
proposal as a chance to clean up the directives that accreted ad-hoc (e.g., consolidate
overlapping shapes like update vs update_pure vs update_from_field into a single
update directive with parameters; rationalize the assertion-shape naming conventions).
Rejected for v1, deferred to a possible v2: each existing directive landed via a proposal that exercised concrete behavioral coverage; the directive shape was approved alongside the fixture additions. Redesigning would invalidate that work and require re-approving every existing fixture against the new shape — significant churn for marginal cleanliness gain. The ratification approach freezes the current vocabulary as v0.X.0 and lets future proposals either continue the established pattern OR open a cleanup proposal targeting specific directive consolidations once the v0.X.0 surface stabilizes.
4. Defer the proposal until cross-language demand surfaces (when openarmature-typescript starts). Argue that the Python adapter remains the de facto source of truth until a second language adapter actually needs to be built.
Rejected: the documentation gap exists now; deferring it makes the eventual TS adapter work harder (more accreted directives to reverse-engineer) and pushes the cross-impl consistency burden onto the Python implementer (who would have to document the directives ad-hoc for the TS implementer). Documenting now, while the vocabulary is fresh in mind, is cheaper than documenting later.
Open questions¶
- Whether the conformance-adapter capability includes the OTel and Langfuse mock harness primitives in its v1 surface, or leaves them to a follow-on proposal. The current conformance directories include observability fixtures that exercise OTel span emission and Langfuse trace/observation emission; the Python adapter ships in-memory mocks for both. Including them in v1 ratifies the existing pattern; deferring them shrinks the v1 surface but leaves the existing mocks under-specified. Lean: include in v1 (the Python implementation already has them; documenting matches reality).
- Whether the
conformance_versionper-fixture pin is enforced at parse time. A fixture authored against conformance-adapter v0.5.0 declaresconformance_version: "0.5.0". An adapter targeting conformance-adapter v0.7.0 SHOULD accept the fixture (additive directive vocabulary is backwards-compatible). What happens when the adapter targets v0.5.0 and the fixture declares v0.7.0? The adapter MUST raise per the version-mismatch rule, OR the adapter SHOULD warn-and-attempt (lenient). Lean: MUST raise (strict version-pin matches OA's general "no silent fallback" disposition). - Whether the v1 ratification enumerates EVERY current directive or a subset. Some
directives appear in only one fixture (e.g.,
update_from_fieldwithmultiplier— appears in fan-out fixtures specifically). Enumerating every such directive bloats the spec. Lean: enumerate every directive currently in use; treat the comprehensive enumeration as the load-bearing value of the capability (the whole point is to make the directive vocabulary discoverable and pinned). - Naming alternatives for the capability. "Conformance adapter" was selected over "conformance protocol", "fixture protocol", "conformance fixtures", and bare "conformance" for the reasons in the proposal's pre-PR design discussion. Re-opening if reviewers find a meaningfully better fit.
Out of scope¶
- Per-language adapter implementations. This proposal specifies the contract; concrete Python / TypeScript / future-language adapters are sibling-implementation work and ship separately in their respective implementation repositories.
- Fixture-authoring tooling. Linters that check fixture YAML against the schema; scaffolders that generate fixture stubs from spec sections; visualization tools that render the directive vocabulary as documentation — all useful, all out of scope for v1.
- Performance benchmarking or comparative-conformance reporting. Whether implementation A passes fixture N in 50ms and implementation B passes it in 200ms is not a conformance concern (performance is implementation-specific and not specified at the behavioral layer).
- Redesigning the directive vocabulary. v1 ratifies what exists. A follow-on cleanup proposal MAY consolidate overlapping directives if real demand surfaces, but is not bundled here (per alternative 3).
- Cross-capability test orchestration. Whether the adapter runs fixtures in a specific order, parallelizes across capabilities, or applies tagging / filtering — all implementation choices that adapters MAY surface via their host test runner (pytest markers, vitest tags, etc.), not normative in the spec.