Changelog¶

All notable changes to the OpenArmature specification are documented in this file.

The format is adapted from Keep a Changelog — subsection labels render as bold paragraphs (rather than H3) to keep the rendered docs-site right-rail TOC focused on releases, and there is no [Unreleased] section since the spec tags after every acceptance PR. The spec follows Semantic Versioning.

[0.90.0] — 2026-07-07¶

Changed

retrieval-provider §4 / §6 / §8 — raw widened to verbatim JSON of any top-level shape. EmbeddingResponse.raw and RerankResponse.raw go from dict[str, Any] to dict[str, Any] | list[Any] (TS Record<string, unknown> | unknown[]): raw is now the verbatim deserialized JSON of the successful provider response — an object or an array, matching the response's top-level shape — and a mapping MUST NOT wrap or reshape it. This unblocks bare-array wire (TEI /embed returns a list of vector arrays, /rerank a list of result objects), which the dict-only type could not carry without inventing a synthetic wrapper key. §8's Batch chunking rule also gains a raw stitch clause and §8.1 a raw note pinning the multi-request case: a chunk-and-stitch call's raw is the list of the per-request verbatim responses, in order — nothing the provider returned across the chunk requests is lost, while the normalized fields (response_id, usage, vectors / results) stay ergonomic summaries. Scoped to retrieval-provider; llm-provider Response.raw stays object-shaped (chat/completion responses are always objects). (proposal 0096)

Notes

MINOR (pre-1.0). A public-type widening on two response fields plus a new raw stitch rule in §8 (previously-undefined multi-request behavior). Additive for object-shaped single-request mappings (the union still admits dict); it unblocks the array-response mappings and closes the chunked-raw case. Conformance: raw assertions added to the TEI single-request (017 / 014) and chunk-and-stitch (015 / 038) fixtures and the Cohere embed chunk-and-stitch (037) — no new fixtures.

[0.89.0] — 2026-07-02¶

Added

graph-engine §2 — declared same-name subgraph projection boundary + reducer round-trip warning. A subgraph-as-node MAY now declare its parent↔subgraph boundary as two field-name sets (an in-set and an out-set, naming the fields that cross by the same name on both sides), compile-validated against both schemas — the checked middle ground between the implicit field-name-matching default and the explicit inputs/outputs rename maps. It is a complete declaration with no field-name-matching fallback (empty set = nothing, symmetrically for both directions) and is mutually exclusive with the explicit maps (new compile-error category conflicting_projection_forms; the existing mapping_references_undeclared_field extends to the declared sets). Separately, a new compile-time warning projection_reducer_round_trip flags a field projected in and back out through a non-round-trip-idempotent reducer (the append-doubling hazard) — MUST for the canonical non-idempotent reducers (append / concat_flatten / bounded_append / merge_all; the other four — last_write_wins / merge / merge_by_key / dedupe_append — are round-trip-idempotent), SHOULD for custom reducers. conformance-adapter §5.8 gains an expected_compile_warning: <category> directive to assert it (and formally documents the established expected_compile_error alongside it); pipeline-utilities §9.3 (fan-out extra_outputs) and §11.4 (parallel-branches subgraph-branch outputs) gain pointers to the warning, since those projections merge through the parent's reducer too. (proposal 0094)

Notes

MINOR (pre-1.0). Additive across graph-engine + conformance-adapter + pipeline-utilities: a new opt-in projection form, one new compile-error category (reachable only by the new form), an advisory compile-time warning, and a new conformance-adapter expected-outcome directive. The field-name-matching default and the explicit inputs/outputs maps are unchanged, so existing graphs and fixtures are unaffected. New conformance fixtures for the declared boundary (happy path, empty-out-set-projects-nothing, and the drift + conflicting-forms compile errors) and the round-trip warning (a graph-engine set of cases plus a fan-out instance).

[0.88.0] — 2026-07-01¶

Changed

retrieval-provider §4 / §6 — nullable provider usage records. EmbeddingResponse.usage and RerankResponse.usage are now record | null — a usage record when the provider reports one, null when it doesn't — giving OA one uniform "the provider reported no token usage" model across both response types, the typed events (graph-engine §6 EmbeddingEvent / RerankEvent, already record-null), and the §11 metric (already record-null). §4's EmbeddingUsage.input_tokens drops the "always reported" phrasing (it is present exactly when the record is); §6's "a RerankUsage with both fields null is valid" note is reconciled — the no-usage case is now usage = null, and a record is present only when at least one figure is reported (the record's input_tokens / search_units stay individually nullable for the partial case, e.g. Cohere's search_units without a token count). §2 concept lines qualify usage "(when present)"; §8.1 pins TEI /embed + /rerank to usage = null (the mapping MUST NOT fabricate a record); §8's batch-chunking step 4 combines usage record-aware (sum when reported, else null). Observability tracks the response contract: §5.5.8 (OTel embedding) gen_ai.usage.input_tokens and §8.4.5 (Langfuse embedding) usageDetails.input become conditionally emitted — omitted when the provider reports no usage; §5.5.13 / §8.4.7 rerank guards rephrased record-aware (and §5.5.13's stale "the embedding span, where input_tokens is always present" parenthetical corrected — both spans now emit conditionally). No change to graph-engine §6 or observability §11 (already null-usage-aware). (proposal 0093)

Notes

MINOR (pre-1.0). Widens two public response usage fields to nullable and makes the embedding usage observability emission conditional — a public-type + conformance change. Fixes a live contradiction: the prior "always reported" contract was unsatisfiable for TEI /embed (a bare vector array with no usage object) and forced a fabricated empty RerankUsage for TEI /rerank. Additive for the hosted mappings (they report usage and are unchanged). Conformance: TEI /embed + /rerank fixtures assert usage: null (the fabricated empty rerank records removed), plus new OTel + Langfuse no-usage fixtures for both the embedding and rerank spans/observations and an embedding no-usage metric fixture (pinning §11's zero-token-observation branch, now reachable for embedding).

[0.87.0] — 2026-06-30¶

Added

retrieval-provider §8 — embedding-mapping batch chunking (general rule). A new §8 Batch chunking rule: when an embedding mapping's provider enforces a maximum input count per request and a caller's input exceeds it, the mapping MUST split into consecutive ≤cap chunks, issue one request per chunk with identical per-call parameters, concatenate the per-chunk vectors in input order (preserving §4's one-vector-per-input + input-order invariants), and sum EmbeddingUsage.input_tokens (response_id = the first chunk's id); a provider with no cap (server-side batching) does not chunk. This generalizes — across every embedding mapping — the per-item-independence chunk-and-stitch §8.1 already specs for TEI rerank. The per-mapping caps are recorded in docs/compatibility.md (TEI max-client-batch-size default 32, OpenAI 2048, Cohere 96; Jina cap-free). §8.4 Cohere's 0091 chunk-and-stitch paragraph is reduced to defer to the general rule, and §8.1 / §8.2 / §8.3 gain per-mapping cap notes. Resolves the cross-mapping open question logged at 0091. (proposal 0092)

Notes

MINOR (pre-1.0). Additive — defines the previously-undefined over-cap behavior in §8.1 / §8.3 and generalizes 0091's Cohere instance; no protocol surface change. New conformance fixture for the TEI /embed over-cap chunk-and-stitch (the rule exercised on a second mapping).

[0.86.0] — 2026-06-30¶

Added

retrieval-provider §8.4 — Cohere embeddings wire mapping (/v2/embed). Extends §8.4 Cohere to cover both Cohere endpoints (the §8.2 Jina pattern), adding the embedding half: POST /v2/embed maps texts ← input strings and consumes the type-keyed embeddings.float response in input order; meta.billed_units.input_tokens → EmbeddingUsage.input_tokens, top-level id → response_id, EmbeddingResponse.model the bound id. It's the third realization of the cross-vendor input_type knob and the first where the wire field is mandatory: query / document → search_query / search_document, and an absent input_type MUST map to search_document (Cohere v2 requires the field; bulk-indexing default); an unrecognized value is a pre-send provider_invalid_request. EmbeddingRuntimeConfig.dimensions → Cohere's output_dimension (embed-v4+); truncate: "NONE" is fail-loud (unlike the §8.4 rerank half); the 96-input per-call cap is handled by mandatory client-side chunk-and-stitch (the §8.1 argument). The accept reconciles 0090's "rerank-only / separate future mapping" framing in §8.4 and the §11 deferred list (only Voyage AI remains). (proposal 0091)

Notes

MINOR (pre-1.0). Additive — extends the existing §8.4 with a new endpoint; no protocol surface change. New conformance fixtures for the /v2/embed round-trip, the input_type mandatory-default mapping, unrecognized-input_type rejection, output_dimension passthrough, truncate: "NONE" fail-loud, and the >96-input chunk-and-stitch.

[0.85.0] — 2026-06-29¶

Added

retrieval-provider §8.4 — Cohere rerank wire mapping. The first rerank-only §8 wire mapping, and the one that backs 0060's Cohere-shaped reference reranker: POST /v2/rerank (gen_ai.system: "cohere"). documents is the string-array form, top_n ← top_k; the response {id, results: [{index, relevance_score}], meta.billed_units.search_units} maps onto §6 — search_units → RerankUsage.search_units (the inverse of Jina's token metering; input_tokens stays null), top-level id → response_id. return_documents is a silent no-op (Cohere v2 has no such field and never echoes documents, so ScoredDocument.document stays null — the §8.3 input_type-no-op precedent), and there is no fail-loud truncation (Cohere truncates server-side to max_tokens_per_doc, which rides the extras-pass-through bag). The accept also reconciles §11 Out of scope — dropping Cohere rerank from the deferred list and correcting the stale already-shipped Jina / OpenAI entries. (proposal 0090)

Notes

MINOR (pre-1.0). Additive — a new §8 wire mapping appended after §8.3; no protocol surface change. New conformance fixtures for the Cohere /v2/rerank round-trip, the return_documents no-op, top_k → top_n, and the 429 → provider_rate_limit mapping.

[0.84.0] — 2026-06-28¶

Added

Embedding / rerank typed-event output (graph-engine §6 + observability). EmbeddingEvent gains output_vectors and RerankEvent gains output_results — the output-payload counterparts to their input fields, paralleling LlmCompletionEvent.output_content; populated on the success event, privacy-gated at the rendering boundary. The observability output mappings re-source from these fields — Langfuse embedding.output (§8.4.5) / retriever.output (§8.4.7) and the OTel rerank openarmature.rerank.results attribute (§5.5.13) — making the existing fixtures 083 / 108 satisfiable (an observer's only input is the typed event, which previously carried only the output count). §8.4.5 / §8.4.7 also gain Failure observations paragraphs (ERROR-level rendering of EmbeddingFailedEvent / RerankFailedEvent, mirroring §8.4.6's tool failure). (proposal 0089)

Notes

MINOR (pre-1.0). Additive typed-event fields + observability-mapping reconciliations + a failure-rendering specification; the embedding OTel span stays output-less (the rerank OTel attribute is re-sourced, not added). The disable_llm_spans scoping (§5.5.8 / §5.5.13) is unchanged. New conformance fixtures for the embedding / rerank failure observations.

[0.83.0] — 2026-06-28¶

Added

observability §8 — Langfuse parallel-branches mapping parity. The Langfuse mapping reaches parity with the OTel side (§4.3 / §5.7 / §6) and the fan-out Langfuse mapping for parallel branches: new §8.4.8 Parallel-branches dispatch-span mapping (the Langfuse observer synthesizes the per-branch dispatch Span observation — the three-level tree, with observation.name = branch_name, resolving §5.7's dangling forward-reference); §8.3 gains observation-type rows for the parallel-branches node span + per-branch dispatch span; §8.4.2 gains the parallel_branches_branch_count / _error_policy / _parent_node_name attribute rows (the §5.7 attributes, flattened like fan_out_*); §3.4's reserved caller-metadata-key set gains those three keys (26 → 29). (proposal 0088)

Notes

MINOR (pre-1.0). Additive — brings the spec into line with already-conformant Langfuse behavior; the one behavior touch is the §3.4 reservation (a caller passing one of the three new parallel_branches_* keys as invocation metadata is now rejected at the invoke() boundary). New observability conformance fixture 136 (the Langfuse parallel-branches dispatch-span tree).

[0.82.0] — 2026-06-27¶

Added

conformance-adapter §8.3 — within-node directive execution order. A node's sibling directives (the keys under nodes.<node_name>:) execute in fixture-document order (mapping insertion order, not sorted-by-key), so order-sensitive compositions — e.g. augment_metadata then capture_invocation_metadata_into (observability §3.4) — produce a deterministic result. §7 Nondeterminism handling gains a counterpoint note (within-node order is deterministic, unlike the cross-source interleaving cases it lists); §8.2 Parsing notes lossless parsing preserves directive order. Ratifies the rule fixtures 043/045 already depended on but that the spec never stated. (proposal 0087)

Notes

MINOR (pre-1.0). A new normative conformance-adapter rule; no adapter already passing 043/045 changes. New observability conformance fixture 135 pins the rule directly (the same two order-sensitive directives in opposite document order, with diverging captured snapshots).

[0.81.0] — 2026-06-27¶

Added

graph-engine §6 + observability — nested-fan-out span lineage chain. The observer event surface gains fan_out_index_chain / branch_name_chain on NodeEvent and the provider/tool events (LlmCompletionEvent, LlmFailedEvent, LlmTokenEvent, the embedding / rerank / tool events) — the enclosing fan-out instance / parallel-branch lineage (outermost→innermost, aligned to namespace), with the existing scalar fan_out_index / branch_name retained as the innermost values. The OTel driving-span key (observability §4.1 / §4.3 / §6) now keys by the chains rather than the innermost scalar, so the inner spans of two concurrent fan-out instances nested in an outer fan-out no longer collide and drop. §5.5 gains a Lineage-resolved parent clause (shared by the embedding §5.5.8 / tool §5.5.11 / rerank §5.5.13 spans): a provider span exact-matches its lineage-disambiguated calling-node span, and — when that span is not open (a wrapper / middleware-issued call) — parents under the nearest enclosing wrapper per §4.3, resolved via the chain to the correct inner instance (not the top-level one, not a coincidentally-indexed sibling). §8.4.3 / §8.4.6 confirm the Langfuse observation parent follows the same resolution. (proposal 0084)

Notes

MINOR (pre-1.0). Additive: the scalars and the common single-level case are unchanged (a top-level event carries empty chains, keying identically to before). conformance-adapter §5.1 gains a wrapper-issued-LLM-call primitive for the orphan-fallback fixture. New graph-engine + observability conformance fixtures (OTel + Langfuse) pin the chain on the event surface, the no-dropped-spans keying, nested-LLM exact-match, and the orphan fallback inside a nested instance.

[0.80.0] — 2026-06-26¶

Added

pipeline-utilities §10.11 — nested-fan-out checkpoint resume. A fan_out_progress entry gains an optional enclosing_fan_out_lineage (the outermost→innermost chain of enclosing fan-out instances), so a fan-out nested inside an outer fan-out instance round-trips its per-outer-instance progress across a resume — entries are now keyed by (namespace, fan_out_node_name, enclosing_fan_out_lineage), and §10.11.1's exactly-once guarantee extends to nested fan-outs. A new No mis-skip across enclosing instances invariant requires the engine to re-run (treat as not_started) rather than apply a saved entry's completed skips that don't positively match the re-entering lineage — including a legacy record with no lineage for a nested fan-out — closing a latent cross-implementation hole where an unaware impl could skip inner instances completed by a different outer instance and roll the wrong results forward. (proposal 0085)

Notes

MINOR (pre-1.0). Additive and backward-compatible: a flat (non-nested) fan-out carries an empty lineage and resumes exactly as before. The count-drift check (§10.11) re-resolves per lineage-qualified entry; §10.2's per-fan-out mapping framing, the §10.11 namespace-uniqueness claim, and the §10.7 skip decision are reconciled to the same-node multiplicity. New pipeline-utilities conformance fixture 076. No graph-engine §6 event-shape change — the enclosing lineage is sourced from the engine's save-time context.

[0.79.0] — 2026-06-26¶

Added

prompt-management §6 — service-wide default cache_ttl_seconds on PromptManager. A PromptManager may be constructed with an optional default_cache_ttl_seconds, applied to any fetch / get that does not supply a per-call cache_ttl_seconds. Cache-control now resolves by an explicit precedence chain — per-call value > manager default > backend implementation-defined — mirroring the existing label-resolution chain. A per-call value (including 0 force-fresh) always overrides the default; a manager constructed without a default is unaffected. This is the standing-config counterpart to the per-fetch lever proposal 0072 shipped, and the service-wide default 0072 explicitly deferred. (proposal 0086)

Notes

MINOR (pre-1.0). Additive — a new optional construction parameter; an absent default preserves current behavior exactly (the backend's own caching governs). §5's backend-caching paragraph notes a backend receives a single resolved cache_ttl_seconds per fetch regardless of source; §15's Cache invalidation policies bullet now covers both the per-fetch lever and the standing default. conformance-adapter §6.8 gains a manager: {default_cache_ttl_seconds} construction slot and a target: {manager: true} fetch; new prompt-management conformance fixture 036.

[0.78.0] — 2026-06-25¶

Added

prompt-management §3 / graph-engine §6 / observability — per-prompt token-budget observability. A Prompt gains an optional token_budget ({input_max_tokens?, total_max_tokens?} — the input and total token ceilings; the output budget stays sampling.max_tokens), advisory and observability-only — it never touches the LLM request. It rides onto the LlmCompletionEvent / LlmFailedEvent token_budget field and, reactively against the provider's actual reported usage, drives a budget-exceeded signal: the openarmature.llm.token_budget.exceeded span attribute (§5.5.15), a SHOULD-level WARNING log (§7) and Langfuse observation.level = "WARNING" (§8.4.3), and two opt-in §11 metrics — an openarmature.gen_ai.client.token_budget.exceeded counter and a .utilization histogram (dimensioned by a new openarmature.gen_ai.token_budget.kind = input / total, §11.3). Evaluated on every completion and on the usage-bearing structured_output_invalid failure (proposal 0082). (proposal 0083)

Notes

MINOR (pre-1.0). Purely additive across prompt-management (§3 / §4 / §5 / §12), graph-engine (§6), observability (§5.5.15 / §7 / §8.4.3 / §11.2–§11.5), and conformance-adapter (§5.8 / §6.9 — extended for the two new instruments, the kind dimension, and the deterministic utilization-ratio asserted value). No behavior change for any prompt without a token_budget or any observer with enable_metrics off; the budget never affects the request. New observability conformance fixtures 126–131. Sequenced after prerequisite proposal 0082 — the structured_output_invalid failure-path coverage builds on 0082's LlmFailedEvent.usage and §11.2 reconciliation.

[0.77.0] — 2026-06-25¶

Added

graph-engine §6 — LlmFailedEvent response-side surface for structured-output failures. A structured_output_invalid failure is the one llm-provider §7 category where the model did return a response (content that failed downstream parse/validation), so LlmFailedEvent now carries five of LlmCompletionEvent's response-side fields (all but output_tool_calls, which a structured-output failure never has) — output_content, finish_reason, usage, response_id, response_model — populated for that category and null for every other (no response received). Observers can now triage a truncation (finish_reason == "length" — the model hit max_tokens) from a model that finished but emitted malformed or schema-violating JSON, and the failed generation renders with its real output, token usage, and stop reason instead of a null, zero-token record. (proposal 0082)

Changed

llm-provider §7 — the structured_output_invalid error additionally exposes the response's normalized finish_reason and token usage (both available from the received response, since the failure is a downstream parse/validation step on an intact wire response). This makes the §7-level retry note actionable per-failure and reconciles §8.2.5's existing "surfaces the mapped finish_reason" statement with §7's error contract; the non-transient-by-default classification is unchanged. (proposal 0082)
observability — the response-side surface renders on both bundled backends. §5.5.7 reconciled (response-side fields are absent for the §7 categories with no response, with the structured_output_invalid carve-out); §8.4.3 — the bundled Langfuse failed Generation populates output / usage / metadata.finish_reason for structured_output_invalid, in addition to its ERROR level and including the call-level-retry terminal-failure path; §5.5.1 / §5.5.3 — the OTel error span carries the same attributes; §11.2 — a structured_output_invalid failure (alone among failures) records the openarmature.gen_ai.client.token.usage metric, since it carries a usage record. (proposal 0082)

Notes

MINOR (pre-1.0). Purely additive on the event / error surfaces — observers and callers that do not read the new fields are unaffected. New observability conformance fixtures 120–125 (truncation / schema-mismatch / null-on-non-body-failure + Langfuse and OTel rendering + the token-usage metric); llm-provider fixtures 022 / 023 updated to assert the now-required finish_reason + usage. The observability suite gains a calls_llm.response_schema harness directive (documented per conformance-adapter §3.2) so the structured-output cases drive the real failure path. Also tightened a §5.5.7 cross-reference (the LlmFailedEvent framing cited §8.7 Generation rendering for the Langfuse generation error; the error-level mapping is §8.4.2).

[0.76.1] — 2026-06-23¶

Fixed

observability conformance fixtures 069 / 070 / 073 — corrected to match the contracts they test. Wiring these LlmFailedEvent fixtures (proposal 0058) into a reference adapter surfaced three that a conformant implementation couldn't drive as written: 069 asserted a request model it never declared (now declares model: gpt-test, like 068); 070 used an absent tool_call_id to drive a boundary provider_invalid_request, but a typed implementation enforces a required field's presence at construction — reshaped to a present-but-unmatched tool_call_id, which is constructible everywhere and raises at the complete() boundary; 073 asserted the vendor body's error.type verbatim, stricter than its own "either style satisfies" contract — relaxed to the permissive contract (error_type is a non-empty string, or null). (proposal 0058)
llm-provider §3 / §7 — validation-timing clarification. Split the message-shape rule's enforcement by layer: a single-message constraint (e.g. a per-role required field's presence) MAY be enforced at message construction in implementations whose message types make it required, while constraints that span the message list (e.g. a tool message's tool_call_id matching an earlier assistant ToolCall) are enforced at the complete() boundary and raise provider_invalid_request. §7's provider_invalid_request definition gains the matching caveat so the two sections agree. No behavior change — conformant implementations already do this; the text was imprecise in implying all message-shape validation happens at the boundary.

Notes

PATCH (pre-1.0). Conformance-fixture corrections (no contract change — the fixtures were stricter than, or mis-targeted, the behaviors they test) plus a validation-timing clarification documenting existing conformant behavior. No fixture additions; no emitted-key or protocol change.

[0.76.0] — 2026-06-23¶

Added

conformance-adapter §5.10 — Value matchers. Promotes the fixture value-matcher vocabulary into a normative §5 enumeration: inline value-tokens (<uuid>, <any-string> = non-empty, <trace_id_X> = first-occurrence binding), assertion sub-keys (non_empty_string, harness_parameterized), and the exact-value + named-derivation-invariant idiom. Previously the inline tokens lived only in a §3.2 worked example and the sub-keys in fixture prose; §5.10 gives every adapter one defined set to implement uniformly, and §3.2's placeholder list now cites §5.10 as the normative home. (proposal 0081)

Notes

MINOR bump (pre-1.0). Extends the §5 directive vocabulary (per the capability's §1 framing). Largely descriptive of matchers already in use; the one substantive tightening is that <any-string> = non-empty is now normative (an adapter accepting the empty string becomes non-conforming). No fixture changes. (proposal 0081)

[0.75.0] — 2026-06-23¶

Added

prompt-management §10 / §11 — PromptGroup arity enforcement. §10's two-or-more-members rule gains teeth: constructing a PromptGroup with fewer than two members (empty or single-member) MUST now raise at construction time, and §11 adds a prompt_group_invalid error category (a general group-construction-validity bucket modeled on prompt_render_error — arity is its first trigger). Closes the gap the fixture-066 correction (v0.74.1) exposed: a normative MUST with no enforcement point, no error category, and no conformance fixture pinning it. (proposal 0080)
New prompt-management conformance fixture 035 (single-member + empty group construction each raise prompt_group_invalid), complementing 011 (valid N>2). (proposal 0080)

Notes

MINOR bump (pre-1.0). Additive: a new error category + the pinned enforcement of an already-stated MUST + a new fixture. The valid-group contract is unchanged — no conforming construction starts failing. (proposal 0080)

[0.74.1] — 2026-06-23¶

Fixed

observability conformance fixture 066 — PromptGroup member-count correction. Fixture 066 (LlmCompletionEvent.active_prompt_group population) constructed a single-member group, which prompt-management §10 marks spec-invalid (members MUST contain at least two elements). Corrected to a valid two-member group (classify + summarize, with classify active); the renders_prompt_group conformance directive gains an explicit members: [<name>, …] sub-key (the ordered ≥2 member list) so group membership is declared rather than inferred from the backend's prompt set. No behavior change — the asserted active_prompt / active_prompt_group records are unchanged; the fix reconciles the fixture with already-accepted §10. (proposal 0057)

Notes

PATCH (pre-1.0). A conformance-fixture correction reconciling fixture 066 with prompt-management §10's two-or-more-members rule; no spec-text or behavior change, and reference implementations already enforce the ≥2 constructor. (proposal 0057)

[0.74.0] — 2026-06-22¶

Added

retrieval-provider §8.3 OpenAI-compatible embeddings. The third wire mapping (the ecosystem anchor): base_url-configurable POST /v1/embeddings covering OpenAI plus the OpenAI-compatible serving ecosystem (vLLM, LocalAI, Together, TEI's own OpenAI endpoint, …) — the retrieval analogue of llm-provider §8.1. Symmetric: the OpenAI /v1/embeddings wire has no query/document parameter, so 0077's input_type is not realized on it (absent ⇒ symmetric; for an asymmetric model behind a compatible endpoint, the optional §8.1 client-side query_prefix / document_prefix applies). Embeddings-only (OpenAI has no rerank API). usage.prompt_tokens → EmbeddingUsage.input_tokens; Bearer auth; gen_ai.system="openai". (proposal 0079)
Five retrieval-provider conformance fixtures (023–027): the /v1/embeddings wire round-trip; base_url override; dimensions passthrough; input_type symmetric no-op (+ the request_params flow); and the client-side-prefix fallback for an asymmetric model behind a compatible endpoint. (proposal 0079)

Notes

MINOR bump (pre-1.0). Additive: §8.3 is a new wire mapping; no protocol-surface change (the input_type knob and §8 are 0077's), no renumber. Completes the retrieval wire-mapping batch — TEI §8.1 (self-hosted) / Jina §8.2 (hosted) / OpenAI-compatible §8.3 (ecosystem). (proposal 0079)

[0.73.1] — 2026-06-22¶

Changed

graph-engine §6 / observability §5.7 — branch_count reconciled under a when-skip. Clarified that parallel_branches_config.branch_count is the number of branches dispatched — equal to len(branch_names) only when no branch is when-skipped (proposal 0075); under a skip it is the dispatched subset, while branch_names stays the full declared set. Resolves the §5.7-vs-§6 ambiguity that fixture 110 flagged. (proposal 0044, proposal 0075)

Added

Conformance coverage round-out: observability fixture 110 now asserts branch_count = 2 (the dispatched count under a when-skip); fixture 008 gains case detached_fan_out_instance_raises_error_status_on_both_spans (the §4.2 detached fan-out-instance error path, previously covered only for the subgraph case — proposal 0061); new fixture 119 pins the callable-branch dispatch-span attempt_index under node-level retry (proposal 0075).

Notes

PATCH (pre-1.0). A clarifying reconciliation of contradictory branch_count text plus conformance coverage round-out; no behavior change (the dispatched reading was already the §5.7 wording, and reference implementations conform).

[0.73.0] — 2026-06-22¶

Added

retrieval-provider §8.2 Jina. The second wire mapping — Jina AI's hosted /v1/rerank and /v1/embeddings. Rerank maps near-1:1 onto §5 / §6 (documents, top_k → top_n, return_documents, results → ScoredDocument, usage.total_tokens → RerankUsage.input_tokens); /v1/embeddings realizes the §2 input_type knob via Jina's native task ("query" → "retrieval.query", "document" → "retrieval.passage"); Bearer API-key auth, base_url defaulting to the hosted endpoint (origin; /v1 in the route). (proposal 0078)
Five retrieval-provider conformance fixtures (018–022): the Jina /v1/rerank wire round-trip; return_documents default-override; input_type → task; truncation fail-loud; and 429 → provider_rate_limit. (proposal 0078)

Fixed

retrieval-provider §2 — return_documents vendor-default correction. The §2 Rerank runtime config note wrongly stated Jina AI's wire return_documents defaults False; it defaults True (Cohere and Voyage default False). OA's RerankRuntimeConfig.return_documents still defaults False; the §8.2 Jina mapping sends the value explicitly so the OA default is honored. (proposal 0078)

Notes

MINOR bump (pre-1.0). Additive: §8.2 is a new wire mapping; no protocol-surface change (the input_type knob and §8 are 0077's), no renumber. (proposal 0078)

[0.72.0] — 2026-06-22¶

Added

retrieval-provider §2 / §3 — input_type embedding knob. EmbeddingRuntimeConfig gains an optional input_type field ("query" / "document", an extensible string) so a provider bound to an asymmetric retrieval model (BGE / E5 / GTE, and hosted vendors via their own wire parameters) applies the model-appropriate query vs. passage treatment; absent ⇒ symmetric (the prior behavior, unchanged). It flows into EmbeddingEvent.request_params (graph-engine §6, alongside dimensions) and is surfaced as the openarmature.embedding.input_type span attribute (observability §5.5.8). (proposal 0077)
retrieval-provider §8 Wire-format mappings + §8.1 TEI. A new top-level section (the retrieval analogue of llm-provider §8) cataloging per-vendor / per-runtime wire mappings, opening with §8.1 TEI (HuggingFace Text Embeddings Inference, self-hosted): construction (base_url + model + input_type → prompt_name map + optional client-side prefixes + chunk_size), the /embed and /rerank wire shapes, input_type realized via TEI's server-side prompt_name, the mandatory rerank chunk-and-stitch (split pools over max-client-batch-size, re-base indices to absolute positions, global re-sort, honor top_k), and truncate: false fail-loud. A TEI EmbeddingProvider and RerankProvider are distinct instances against distinct deployments (TEI hosts one model per instance). (proposal 0077)
Five retrieval-provider conformance fixtures (013–017): input_type → prompt_name realization; single-batch /rerank; the load-bearing chunk-and-stitch (3 requests, absolute-index re-basing, global sort, top_k); truncate fail-loud; and the /embed wire round-trip. (proposal 0077)

Changed

retrieval-provider section renumber. Inserting §8 Wire-format mappings shifts the former §8 Determinism → §9, §9 Cross-spec touchpoints → §10, §10 Out of scope → §11; the §11 out-of-scope wire-mapping deferral drops its now-landed TEI entry. (proposal 0077)

Notes

MINOR bump (pre-1.0). Additive: input_type is optional (absent ⇒ the prior symmetric behavior, byte-identical), §8 / §8.1 are new, and the renumber is internal (no external cross-references). (proposal 0077)

[0.71.0] — 2026-06-20¶

Added

llm-provider §5 / §6 / §8.1.6 — LLM completion streaming. complete() gains an optional stream flag (default off; return type unchanged — still Response). When set, the provider consumes the model's streaming wire response and emits a per-chunk LlmTokenEvent (graph-engine §6) as each chunk arrives; the §6 Streaming assembly contract reassembles the atomic Response (content concatenation, reasoning-block assembly, tool-call-delta reassembly, terminal usage / finish_reason) so the streamed and non-streamed paths are structurally identical. A Provider streaming support rule requires mappings without streaming to reject stream-set calls with provider_invalid_request; §8.1.6 adds the OpenAI-compatible SSE wire handling (stream_options.include_usage, [DONE], content / tool-call deltas, and the reasoning-delta extension recognizing both reasoning_content and reasoning). (proposal 0062)
graph-engine §6 — LlmTokenEvent. A new unpaired within-call typed observer event (no LlmTokenFailedEvent) carrying one streamed delta per chunk: delta_kind ("content" / "reasoning"; "tool_call" reserved), delta, monotonic chunk_index, and the identity / scoping baseline, correlated to the terminal LlmCompletionEvent / LlmFailedEvent by shared call_id. Fires only on a stream-set call; dispatched in chunk_index order before the terminal event. (proposal 0062)
observability §5.5.7 / §8.4.3 — token events not rendered. Notes that the bundled OTel and Langfuse observers do NOT render LlmTokenEvent: no per-token spans / observations; trace recording stays atomic at the terminal LlmCompletionEvent. LlmTokenEvent is for custom forwarding observers (the live-UI / live-"thinking" case). (proposal 0062)
Eight new observability conformance fixtures (111–118) and two new llm-provider fixtures (059–060): token-event dispatch / absence / no-token-events-for-tool-calls / failure-mid-stream / call-id linkage / call-level-retry / reasoning delta_kind (both reasoning_content and reasoning); bundled-observer atomicity under stream; the OpenAI-compatible streaming wire path; and the streaming-unsupported-mapping rejection. (proposal 0062)

Changed

llm-provider §10 — streaming deferral lifted. The blanket "Streaming responses" out-of-scope item is removed (now in scope), replaced by narrower deferrals: node-body iterator consumption, tool-call-delta token events, Anthropic / Gemini streaming wire (those mappings reject stream-set calls until their follow-ons), and non-completion streaming. (proposal 0062)

Notes

MINOR bump (pre-1.0). Additive: stream is opt-in (default off; the atomic path is byte-for-byte the prior behavior), LlmTokenEvent is a new opt-in observer-union variant, and the bundled trace mappings are unchanged (token events are for custom observers). (proposal 0062)

[0.70.1] — 2026-06-20¶

Added

observability conformance fixture 110 — pins the §5.7 callable-branch span shape that proposal 0075 specified but left unfixtured: an inline-callable parallel branch renders as a per-branch dispatch span keyed by openarmature.node.branch_name with no inner-node spans, and a when-skipped branch emits no span. No spec-text change (the 0075 behavior is already normative). (proposal 0075)

Notes

PATCH (pre-1.0). Conformance coverage only — adds one fixture; no spec-text or behavior change.

[0.70.0] — 2026-06-20¶

Added

retrieval-provider §5 / §6 — rerank protocol. Adds RerankProvider as the second protocol surface on the retrieval-provider capability (sibling to EmbeddingProvider): ready() + rerank(query, documents, *, top_k=None, config=None) returning a RerankResponse of ScoredDocument entries sorted by relevance_score descending, each carrying its input-documents index (load-bearing for caller-side lookup), with RerankUsage (optional search_units / input_tokens) reflecting the varied rerank billing landscape. Same per-model-binding + error-category + privacy posture as embedding. (proposal 0060)
graph-engine §6 — typed rerank events. Two paired typed observer events RerankEvent (success) + RerankFailedEvent (failure), the rerank sibling to the embedding pair, carrying the identity / scoping / request-side field set plus rerank success-side fields (response_id, response_model, usage, document_count, top_k, result_count) and the three failure-specific fields (error_category, error_type, error_message). Mutual exclusion + exception-flow + dispatch timing mirror the embedding pair; query / documents and the result document echoes are payload-bearing, gated observer-side by disable_provider_payload. (proposal 0060)
observability §5.5.13 / §5.5.14 / §8.4.7 — rerank mapping. OTel rerank span openarmature.rerank.complete carrying the core GenAI semconv subset (with gen_ai.usage.input_tokens conditionally emitted, since rerank providers vary on reporting it) plus OA-namespace openarmature.rerank.* attributes (including the conditionally-emitted search_units); the Typed rerank events note (§5.5.14); and the Langfuse dedicated Retriever observation asType: "retriever" (§8.4.7), with the OA usageDetails.searchUnits convention. gen_ai.operation.name deferred — no upstream rerank coverage. disable_provider_payload (proposal 0059) already gates the rerank payload. (proposal 0060)
observability §11 — rerank metrics. Rerank joins 0067's operation-generic GenAI metric instruments: the openarmature.gen_ai.operation dimension gains rerank, the duration histogram + error.type source RerankFailedEvent, and the token-usage histogram records rerank input_tokens as input when reported (rerank has no output tokens; search_units is a billing unit, not a token). Completes the rerank hook 0067 left in §11.2 / §11.3. (proposal 0060)
Seven new retrieval-provider conformance fixtures (006–012): rerank positive control, model-binding error, malformed-response (out-of-range / duplicate index), top_k contract (respected / violation), and per-result echo variance. Eleven new observability conformance fixtures (099–109): rerank event dispatch (success / failure), mutual exclusion, call-id distinctness, query/documents + request-params + top_k/result-count + active-prompt population, OTel span attributes (both conditional branches), the Langfuse Retriever observation, and rerank metrics (token + duration, operation rerank). (proposal 0060)

Changed

retrieval-provider §-renumber. The rerank protocol inserts as §5 / §6; the existing §5 Error semantics, §6 Determinism, §7 Cross-spec touchpoints, §8 Out of scope renumber to §7–§10. Cross-references in graph-engine §6, observability §11, and the embedding conformance fixtures (002–004, 075) are reconciled in the same change. (proposal 0060)

Notes

MINOR bump (pre-1.0). Additive: a new protocol surface on an existing capability + two new typed events + new OTel / Langfuse mappings; the §-renumber is internal to the retrieval-provider spec with all cross-references reconciled. No change to existing behavior. (proposal 0060)

[0.69.0] — 2026-06-19¶

Added

graph-engine §6 — tool-execution observability. Makes a caller's tool execution observable: an opt-in node-body tool-call instrumentation scope (the caller wraps a single tool execution; OA observes — it does not select, run, retry, or loop tools) and two paired typed events ToolCallEvent (success) + ToolCallFailedEvent (failure), carrying the identity / scoping baseline + tool_name / tool_call_id (linking to the requesting LlmCompletionEvent.output_tool_calls entry) / arguments / result (success) and error_type + error_message (failure — no error_category, since arbitrary tool code has no closed llm-provider §7 taxonomy). The event-driven start/complete split carries the scope-entry identity. (proposal 0063)
observability §5.5.11 / §5.5.12 / §8.4.6 — tool-execution mapping. OTel tool span openarmature.tool.call with OA-namespace openarmature.tool.* attributes + the Stable error.type on failure (§5.5.11), the Typed tool events note (§5.5.12), and the Langfuse dedicated Tool observation asType: "tool" (§8.4.6). disable_provider_payload (§5.5.4) extended to gate the tool payload (arguments / result). (proposal 0063)
Seven new observability conformance fixtures (092–098): tool-call event dispatch (success / failure), mutual exclusion, id-linkage, payload gating, OTel span attributes, Langfuse Tool observation. (proposal 0063)

Notes

GenAI gen_ai.tool.* adoption — peripheral, mirrored. The upstream execute_tool span + gen_ai.tool.* attributes are Development (verified 2026-06-19); under the §5.5 GenAI de-facto-standard carve-out (proposal 0073) the tool-execution surface is assessed peripheral (not recognized-core), so OA mirrors it to openarmature.tool.* — a clean prefix swap when it reaches recognized-core / Stable. (proposal 0063)
MINOR bump (pre-1.0). Additive and opt-in: two new typed events + a node-body instrumentation primitive + new OTel / Langfuse mappings; events fire only when the caller instruments a tool execution. No change to existing behavior. (proposal 0063)

[0.68.0] — 2026-06-19¶

Added

observability §11 — Metrics. The OTel metrics signal, complementing the §4–§6 spans and §7 logs: two opt-in OA-namespaced histogram instruments over provider calls — openarmature.gen_ai.client.token.usage ({token}) and openarmature.gen_ai.client.operation.duration (s) — recorded per LLM completion and per embedding call (per attempt under call-level retry), from the §5.5.7 / §5.5.9 typed completion events (and the typed LlmFailedEvent / EmbeddingFailedEvent for an errored attempt's duration + error.type). Opt-in via an enable_metrics observer flag (default off), independent of span emission. The instruments mirror the Development-status upstream gen_ai.client.* (type / unit / bucket advisory) for a mechanical future cutover (per Stable-only upstream adoption). Dimensions follow the §5.5 GenAI de-facto-standard carve-out — recognized-core gen_ai.request.model / gen_ai.system used directly (gen_ai.system retained), peripheral gen_ai.operation.name / gen_ai.token.type mirrored to openarmature.gen_ai.*, Stable error.type used directly. (proposal 0067)
conformance-adapter §6.9 — Metric capture. An in-memory MetricReader harness primitive (sibling to §6.3 OTel collector capture) recording every observation for assertion, gated by enable_metrics; plus a §5.8 metrics: expected-outcome directive. (proposal 0067)
Four new observability conformance fixtures (088–091): LLM token + duration, embedding token, errored-call duration + error.type, and metrics-off. (The call-level-retry recording cadence is specified normatively in §11.2; its fixture lands with the call-level-retry conformance infrastructure.) (proposal 0067)

Changed

observability §11 Out of scope → §12. Renumbered (citation-safe); the blanket "metrics out of scope / trace-only" bullet narrowed to graph-level metrics, with streaming/server metrics and the upstream gen_ai.client.* instrument-name cutover folded in as explicit deferrals. (proposal 0067)

Notes

MINOR bump (pre-1.0). Additive and opt-in (default off): a new metrics signal (observability §11) plus a conformance-adapter harness primitive (§6.9). No change to span / log / Langfuse behavior or any existing attribute. (proposal 0067)

[0.67.0] — 2026-06-19¶

Added

observability §5.5.1 / §5.5.10 — output tool-call attributes. A model's tool-call request is part of its output, but the output payload had no home for it (output.content is the assistant's text, omitted for tool-call-only completions). This adds one: §5.5.1 gains the gated payload attribute openarmature.llm.output.tool_calls, serializing the output tool calls as [{id, name, arguments}] (the output-side counterpart to the input tool-call serialization, §5.5.5); §5.5.10 adds the ungated identity projections openarmature.llm.output.tool_calls.count / .names / .ids (index-aligned arrays in request order; count = length), so which tools were requested stays queryable in the default payload-off posture while the arguments stay in the gated full. OA-namespace with no gen_ai.* mirror — upstream carries output tool calls as tool_call parts inside the structured gen_ai.output.messages (no flat surface), and gen_ai.tool.* is the execute_tool span family, so there is nothing to adopt (the openarmature.llm.attempt_index precedent, proposal 0050). The .ids link a completion's requests to a downstream tool execution. (proposal 0076)
graph-engine §6 — LlmCompletionEvent.output_tool_calls. The typed LLM completion event gains an output_tool_calls field (the assistant message's output tool calls in typed-event-native form, complementary to output_content, which is null for tool-call-only responses) — the source the new §5.5.1 / §5.5.10 span attributes render from. Populated unconditionally; gated at the rendering boundary like the other payload fields. (proposal 0076)
Three new conformance fixtures: observability/conformance/085-llm-tool-call-request-attributes, 086-llm-tool-call-request-absent, and 087-llm-tool-call-request-survives-payload-gating. (proposal 0076)

Changed

observability §5.5.5 — the Tool-call serialization note's "first-class tool-call observability is a separate forthcoming proposal" forecast is retired, fulfilled for the request side by §5.5.10. (proposal 0076)

Notes

MINOR bump (pre-1.0). Additive: a new output_tool_calls field on the existing LlmCompletionEvent (graph-engine §6), one new gated payload attribute (§5.5.1), and three ungated identity attributes (§5.5.10) on the existing LLM span. The new event field and attributes are additive; no change to the LLM completion contract or any existing event field / attribute. (proposal 0076)

[0.66.1] — 2026-06-18¶

Changed

observability §8.3 / §8.4.3 — call-level-retry Langfuse mapping clarified. Under call-level retry (llm-provider §7.1), §5.5 emits N per-attempt OTel spans (openarmature.llm.attempt_index), but the Langfuse mapping renders one terminal Generation per complete() call, not one per attempt — it maps to the logical call's terminal outcome, so the per-attempt detail stays the OTel span surface only. Success → the terminal completion the typed LlmCompletionEvent reports (§5.5.7), carrying the response; retry exhaustion → the terminal failed Generation. Distinct from node-level retry (pipeline-utilities §6.1), which renders one observation per attempt (metadata.attempt_index). The §8.3 "LLM provider span → Generation" row is qualified accordingly.

Notes

PATCH bump. Clarification only — makes explicit an already-implied consequence of the §5.5.7 terminal-event model for the §8 Langfuse mapping under call-level retry; reconciles the §8 mapping (proposal 0031) with §5.5's per-attempt spans (proposal 0050). No behavior change, no new or changed normative contract, no new proposal, and no fixture change (no call-level-retry Langfuse fixture existed to change).

[0.66.0] — 2026-06-18¶

Added

pipeline-utilities §11 — inline-callable parallel branches. A parallel-branches branch spec may now give its work as call — an inline async function over the parent state returning a parent-shaped partial update — instead of a compiled subgraph with its own state schema + inputs / outputs projection (§11.1.1; exactly one of subgraph / call, mixing allowed, parallel_branches_invalid_branch_spec compile error otherwise). A callable branch's contribution is the partial update it returns, merged via the parent reducer with no projection (§11.4). Closes the gap for "M heterogeneous lightweight parallel calls over shared state, each independently failure-isolated" (hybrid recall, paired reads) — previously a hand-rolled concurrent gather — while reusing §11's concurrency, fail-fast cancellation (§11.5), per-branch failure-isolation + events (§11.7), and reducer fan-in (§11.4). (proposal 0075)
pipeline-utilities §11.10 — conditional branches. A branch spec (subgraph or callable) may carry an optional when predicate (parent_state) -> bool, evaluated once at dispatch; false skips the branch entirely (no dispatch, contribution, events, or span). All-branches-skipped is a valid no-op, distinct from the compile-time parallel_branches_no_branches. (proposal 0075)
Three new conformance fixtures: pipeline-utilities/conformance/073-parallel-branches-callable-branches, 074-parallel-branches-conditional-when, and 075-parallel-branches-callable-failure-isolation. (proposal 0075)

Changed

graph-engine §6 / observability §5.7 — callable-branch observability. A callable branch has no inner nodes, so the branch itself is the observer event-source unit: one started / completed pair keyed by branch_name (graph-engine §6), rendered as a per-branch dispatch span under openarmature.node.branch_name (observability §5.7). A when-skipped branch emits no events and no span. No new event variant or span attribute — the existing branch_name surface is reused. (proposal 0075)

Notes

MINOR bump (pre-1.0). Additive: call is an alternative to subgraph (existing subgraph branches unchanged) and when is optional (absent ⇒ always dispatch). No change to existing parallel-branches, fan-out, or middleware behavior. (proposal 0075)

[0.65.0] — 2026-06-18¶

Added

pipeline-utilities §6.3 — catch gate on failure isolation. FailureIsolationMiddleware gains an optional catch field: a set of error categories (llm-provider §7 / graph-engine §4 enum) matched against the caught exception's cause chain via the new §6.4 cause-chain classification primitive. At the §9.7 / §11.7 / §9.6 / §11.6 wrapping placements the engine wraps the failure in one or more node_exception carriers before isolation catches it, so the surface exception is a carrier — a surface category / type check misses the originating failure and re-raises, inverting an intended degrade into a crash. catch classifies through the carriers (matching the derived category — the same value reported as caught_exception.category), closing that footgun; it is the recommended gate for category-scoped degradation, mirroring §6.1's classifier. Additive — catch defaults unset (catch-all preserved); composes with predicate as a conjunction; predicate stays the escape hatch and is now documented as surface-only with the cause-aware alternatives. (proposal 0074)
pipeline-utilities §6.4 — Cause-chain classification primitive. Promotes §6.3's carrier-skipping cause-fidelity walk to a public, named primitive (the ordered cause chain + the derived category) shared by §6.1 retry, §6.3 isolation, and consumers, so a carrier-wrapped failure classifies identically everywhere rather than each site re-deriving the walk. (proposal 0074)
New conformance fixture pipeline-utilities/conformance/072-failure-isolation-catch-cause-chain (two cases: catch matches a carrier-wrapped category and degrades; a non-matching catch set propagates). (proposal 0074)

Changed

pipeline-utilities §6.1 — default retry classifier depth documented as deliberately single-level. The default transient classifier inspects the surface category and its immediate cause one level (not the full chain): retry re-runs, so it classifies at re-attempt granularity, leaving deeply-nested transients to the inner scope that can retry only the failing call. A caller needing outer full-chain retry classification supplies a custom classifier using the §6.4 primitive. No behavior change — documentation of the existing single-level rule and the contrast with §6.3's full-chain degrade classification. (proposal 0074)

Notes

MINOR bump (pre-1.0). Additive: the catch field and the §6.4 primitive are new public surface; catch defaults preserve the current catch-all behavior, and retry behavior is unchanged (the §6.1 change is documentation). (proposal 0074)

[0.64.0] — 2026-06-18¶

Changed

GOVERNANCE — gen_ai.* adoption reconciled with upstream reality. A re-verification against the dedicated semantic-conventions-genai repository (where the OpenTelemetry GenAI conventions now live) found the entire gen_ai.* surface at Development status (none Stable) and gen_ai.system removed upstream in favor of gen_ai.provider.name — contradicting the prior "Stable, adopted directly" framing. Two rules are added to GOVERNANCE.md External-dependency adoption: a narrow, GenAI-scoped de-facto interoperability standard carve-out (OA adopts the recognized core gen_ai.* names directly even at upstream Development, because every GenAI-aware backend keys on them and an openarmature.* mirror would defeat that recognition; peripheral Development attributes are still mirrored), and a post-adoption retention rule (an adopted name is retained through an upstream rename / removal / status change; migration is a deliberate follow-on decision). (proposal 0073)
observability §5.5 — adoption rationale reframed to core-vs-peripheral. A new §5.5 framing note records that the emitted gen_ai.* attributes are adopted under the carve-out (core directly, peripheral mirrored via §5.5.3.1), the deciding line being installed-base recognition rather than the upstream maturity label (the whole GenAI convention is Development). The §5.5.3 gen_ai.system entry notes it is retained despite the upstream gen_ai.system → gen_ai.provider.name removal (migration deferred); the §5.5.3.1 and §5.5.8 "until upstream Stable" wording is reconciled to "Stable or demonstrably ubiquitous." docs/compatibility.md is corrected to record the repository move, the all-Development status, the rename, and the retention. (proposal 0073)

Notes

MINOR bump (pre-1.0). Adds new normative adoption rules to governance and reframes the spec's adoption rationale; no emitted attribute changes and no conformance-expectation changes — existing gen_ai.* fixtures remain valid and serve as the retention regression coverage. Unblocks proposal 0067, whose metric dimensions reuse these keys. (proposal 0073)

[0.63.1] — 2026-06-17¶

Added

pipeline-utilities conformance — two coverage fixtures (no behavior change). Pin two already-normative behaviors that lacked a cross-impl conformance fixture. 070-crash-injection-after-node-resume exercises the crash_injection.after_node crash boundary (proposal 0070, conformance-adapter §5.6 — fixture 067 covered only after_fan_out_instance): the node rolls forward on resume (in completed_positions, not re-run). 071-fan-out-degrade-strict-reducer-raise pins the scope boundary of proposal 0069's degrade-never-raises guarantee — a FailureIsolation-degraded instance's null slot under a strict-element reducer (concat_flatten) raises ReducerError (graph-engine §2), the guarantee being scoped to null-tolerant reducers only (fixture 069 Case 2 is the null-tolerant append counterpart).

Notes

PATCH bump. Conformance coverage only — no behavior change, no new or changed normative text, and no new proposal (the pinned behaviors are already normative per proposals 0069 / 0070 + graph-engine §2). Fills two cross-impl coverage gaps left by fixtures 067 (after_fan_out_instance only) and 069 (null-tolerant append only).

[0.63.0] — 2026-06-17¶

Added

prompt-management §5 / §6 — per-fetch cache_ttl_seconds control. PromptBackend.fetch (§5) and PromptManager.fetch / get (§6) gain an optional cache_ttl_seconds parameter for backends that maintain a client-side template cache: absent / None preserves current behavior; 0 forces a fresh read past any cached entry; N > 0 bounds a served entry's staleness to N seconds; a negative value MUST be rejected. It is a read-side control — it governs only which cached entry MAY be served for this fetch, not whether or how the fetched result is then cached (write, eviction, sizing, and cross-process invalidation stay implementation-defined). Cacheless backends (filesystem, in-memory) treat it as a no-op; the manager threads it verbatim through the §9 fallback chain; render is unchanged (local, no I/O). Turns §5's pre-existing "backends MAY cache … invalidation is implementation-defined" into a defined caller lever for on-demand prompt refresh without a process restart. §15's Cache invalidation policies bullet now distinguishes the (now-controllable) backend-template cache from the still-out-of-scope user-level result cache. (proposal 0072)
conformance-adapter §6.8 — caching prompt-backend harness primitive. An in-memory PromptBackend that caches by (name, label), counts source reads, and honors cache_ttl_seconds (0 bypasses; None serves cached; N > 0 via a controllable clock), exposing a source_read_count assertion shape and an advance_clock operation. (proposal 0072)
Two new conformance fixtures prompt-management/conformance/033-prompt-backend-cache-ttl-force-fresh (default control coalesces a repeat fetch to one source read; cache_ttl_seconds=0 reads the source on every fetch) and 034-prompt-backend-cache-ttl-max-age (an entry served within N seconds, re-read once aged past N, via the controllable clock).

Notes

MINOR bump (pre-1.0). Fully additive and backward-compatible — the parameter defaults to absent / None, which preserves current fetch / render / fallback behavior exactly; no existing caller changes. Touches the prompt-management and conformance-adapter specs only; no other capability, public type, or wire change. (proposal 0072)

[0.62.0] — 2026-06-17¶

Added

observability §8.4.1 — Langfuse trace.sessionId / trace.userId population. The Langfuse mapping now populates Langfuse's two dedicated cross-trace grouping fields from data OA already carries: trace.sessionId is sourced from openarmature.session_id (set when the invocation is session-bound per the sessions capability), so a multi-turn agent's per-turn invocations group into one Langfuse Session; trace.userId is promoted automatically by the Langfuse observer from a recognized userId key in the caller-supplied invocation metadata (§3.4) — additive, the key also remains at trace.metadata.userId, and userId is recognized, not reserved. Both fields are unset when their source is absent. A new Session / user trace-field sourcing paragraph defines the MUST-set / unset rules, the multi-invocation / detached / suspend-resume grouping semantics, and the OTel data-model asymmetry (no OTel trace-level session/user field; no OTel-side change). The asymmetry in where the two sources live is principled: session_id is a first-class OA concept with state semantics, while a user id has no runtime semantics and is promoted observer-side from caller metadata rather than added to the engine's invoke surface. (proposal 0064)
One new conformance fixture observability/conformance/084-langfuse-session-user-promotion (five cases: session-bound / not, userId present / absent, multi-invocation grouping under one session id).

Changed

observability §8.10 — the Langfuse Sessions out-of-scope bullet is removed (realized; the sessions capability, proposal 0020, is Accepted). §8.1 and the §8.4 Distinction from Langfuse Sessions / Users note are updated to record that Sessions / Users grouping is now realized rather than deferred. Langfuse Scoring and Cost remain deferred. (proposal 0064)

Notes

MINOR bump (pre-1.0). Behavior change for one path: a caller already supplying userId as invocation metadata (landing only in trace.metadata.userId today) will, after this, also see it populate the first-class trace.userId field — almost always the desired outcome, and the reason userId is recognized rather than reserved. Callers working around the gap via direct Langfuse SDK trace-update calls will see OA-observer values appear on the same fields (write order determines the final value, as for trace.input / trace.output); the migration is to drop the direct calls. No OTel-side change, no graph-engine or public-type change. (proposal 0064)

[0.61.0] — 2026-06-17¶

Changed

observability §4.4 — a detached OTel trace now roots in an openarmature.invocation span. A detached subgraph or fan-out (§4.4 detached trace mode) renders its separate trace rooted in its own openarmature.invocation span carrying the same invocation_id as the parent invocation, with the detached unit's spans nested under it — replacing the prior "spans use the new trace_id as their root, not children of any invocation span" shape. The detached invocation span opens / closes on the detached-unit window (§4.1) and carries the detached unit's own status (§4.2 — a raising detached subgraph surfaces ERROR on both the parent dispatch span and the detached invocation span, each with the §4 category + an exception event). invocation_id is the shared engine-level run identity (detached mode is observer-side trace rendering, not an engine-level sub-invocation); trace_id is the per-backend rendering identity (§4.3 Detached-dispatch invocation spans). This lets the §5.1 always-emit attribution invariant apply to every detached trace with no per-context caveat (§5.1 / §4.5 multiple-invocation-spans-per-run notes), and reconciles the contradicting expected span trees in conformance fixtures 008-otel-detached-trace-mode and 058-implementation-attribution-otel. (proposal 0061)

Notes

MINOR bump (pre-1.0). Operator-visible: anything snapshotting detached-trace OTel output sees a new invocation-span layer at the detached trace root after upgrade. The reference implementation's OTel observer synthesizes the detached invocation span at each detached trace root from the invocation_id it already sees on every event — no graph-engine change, no public type or interface change. The Langfuse side is unchanged (the Trace entity already plays the invocation-level-container role; §8.4.1 gains a clarifying note only). (proposal 0061)

[0.60.0] — 2026-06-17¶

Added

conformance-adapter §5.1 — failure-mock directive catalog. Documents the five failure-injecting node mocks the retry / failure-isolation / checkpoint-resume fixtures already use — flaky (sequence + compact forms) and the failure_sequence entry, flaky_by_index, flaky_per_index, flaky_instance_only, flaky_resume_aware — each by the failure axis it keys on (per-attempt / per-invocation / deterministic), plus the flaky_per_index (invocation-keyed) vs flaky_by_index (attempt / deterministic) disambiguation, and a flag (not a change) on the success_update / on_success / success_compute success-state field-naming drift. (proposal 0071)

Notes

MINOR bump (pre-1.0). Descriptive only — the directives the adapter already implements and ~40 existing fixtures exercise; no behavior change, no new or changed fixtures, no new conformance expectation. A PATCH classification is defensible (purely documentary); MINOR is the maintainer's call. (proposal 0071)

[0.59.0] — 2026-06-16¶

Changed

pipeline-utilities §9.3 — an omitted extra_outputs source is a positional null slot. A FailureIsolation-degraded fan-out instance whose degraded_update omits an extra_outputs subgraph_field now contributes null at that instance's slot (merged in instance-index order, mirroring an omitted collect_field), rather than "not contributed." This keeps the field index-aligned with target_field under extending reducers. Supersedes 0066's "not contributed / like a skipped heterogeneous branch field" clause for this case only; §9.8's cross-reference to the rule is updated to agree. (proposal 0069)

Added

pipeline-utilities §9.3 — an absent collect_field is graceful on every fan-in path. A non-degrade instance_middleware return SHOULD cover collect_field; an absent collect_field (by any route) yields a null slot and the fan-in MUST NOT raise — generalizing §9.8's degrade-never-raises so a non-conformant return surfaces as a visible null in target_field rather than stopping the graph under fail_fast (for null-tolerant reducers such as append; strict-element reducers like concat_flatten / merge_all require the field be supplied). (proposal 0069)
One new conformance fixture pipeline-utilities/conformance/069-fan-out-degrade-refinements — the extra_outputs null slot, the absent-collect_field no-raise, and a degrade-survives-resume round-trip (via the crash_injection directive from 0070).

Notes

MINOR bump (pre-1.0). Correctly-configured graphs (degrades that supply their mapped fields) are unchanged. Supersedes only 0066's §9.3 extra_outputs-omission clause; the 0066 degrade-is-the-contribution model, the §9.8 collect_field compile check, and the §11.7 branch skip stand. (proposal 0069)

[0.58.0] — 2026-06-16¶

Added

conformance-adapter §5.6 — crash_injection directive. Simulates a crash at a checkpoint boundary independent of an instance failure (after_node or after_fan_out_instance), so resume is testable from any saved state — including a fan-out where instances completed (e.g. FailureIsolation-degraded instances, which complete rather than propagate). Pairs with resume: the same way first_run_expected_error does. (proposal 0070)
conformance-adapter §5.1 — cause on failure mocks. An optional recursive cause on the error a failure mock raises, chaining it to an originating cause so a fixture can exercise a multi-link non-carrier cause chain (walked by pipeline-utilities §6.3's failure-isolation event). (proposal 0070)
conformance-adapter §5.6 / §5.8 — crash/resume vocabulary formalized (descriptive of existing adapter behavior): first_run_expected_error, resume / from_first_run, saved_record_assertions / fan_out_progress, and instances_executed_during_resume / instances_skipped_during_resume. (proposal 0070)
Two new conformance fixtures: pipeline-utilities/conformance/067-crash-injection-fan-out-resume (crash after a completed instance's save → resume rolls it forward, the remaining instance runs) and 068-failure-mock-cause-chain (a cause-chained flaky failure → the outermost-non-carrier-wins derivation, the two-non-carrier-link case fixture 066 left open).

Notes

MINOR bump (pre-1.0). Test-vocabulary only — no capability behavior changes. The formalized crash/resume directives codify what conformant adapters already implement (the existing checkpoint-resume fixtures exercise them); crash_injection and mock cause are new adapter capabilities a conformant adapter MUST add. (proposal 0070)

[0.57.0] — 2026-06-15¶

Added

pipeline-utilities §6.3 — structured cause chain on the failure-isolation event. caught_exception gains a chain: an ordered list of cause links {category, message, carrier} from the caught exception (outermost) down to the originating raise (innermost), with graph-engine §4 node_exception carriers flagged carrier: true. The full provenance is preserved rather than collapsed to a single value. (proposal 0068)
One new conformance fixture pipeline-utilities/conformance/066-failure-isolation-cause-chain — an instance-site single-carrier chain, a node-level single-non-carrier link, and an uncategorized-cause null derivation.

Changed

pipeline-utilities §6.3 — caught_exception.category / message are now a derivation over the chain. The derived category is the outermost non-carrier link whose category is a non-empty string (else null); the derived message is that link's message (else the outermost non-carrier link's). This supersedes 0065's "resolve through the carrier wrapper to the originating cause" prose — the derived category reproduces 0065's single-carrier values (fixture 064 is unchanged) and resolves the previously-ambiguous multi-non-carrier case (the outermost categorized link wins, so a deliberate surface re-categorization is reported). 0065's wrapped-instance / branch lineage SHOULD is unaffected; §6.1 retry classification is unchanged. (proposal 0068)

Notes

MINOR bump (pre-1.0). The caught_exception shape gains chain (additive); the derived category / message are unchanged for the single-carrier chains 0065 fixtured. 0065's caught_exception.message coherence SHOULD is replaced by a definitional derivation — the derived message is the same chain link the category is taken from — which the chain makes unambiguous. Supersedes only 0065's caught_exception cause-representation clause. The deliberate §6.1 / §6.3 resolution asymmetry is recorded in docs/open-questions.md.

[0.56.0] — 2026-06-15¶

Added

pipeline-utilities §9.8 Fan-out degrade slot coverage — compile-time collect_field check. A new compile-time error category fan_out_degraded_update_missing_collect_field (reported per the graph-engine §2 compile-time error contract): when a fan-out node's instance_middleware includes a FailureIsolationMiddleware whose degraded_update is a static mapping, the graph is rejected at compile time if that mapping omits collect_field. Because a degraded instance's contribution is the degraded_update (§9.3), a static omission would otherwise leave a silent null in the homogeneous collection; the check catches it at construction. The category is defined in §9.8 (as parallel_branches_no_branches is defined in §11.9). The callable degraded_update form is not compile-checkable; at runtime an omitted collect_field yields a null slot gracefully — the degrade path never raises (a runtime raise would convert the isolation into a graph-stopping failure under fail_fast). (proposal 0066)
One new conformance fixture pipeline-utilities/conformance/065-fan-out-failure-isolation-degrade-contribution — four cases: instance degrade fills the slot + reads extra_outputs by subgraph field name; static collect_field omission → compile error; callable omission → null slot (no stop); parallel-branches branch degrade skips an uncovered projected field.

Changed

pipeline-utilities §9.3 — a FailureIsolation-degraded fan-out instance's contribution is its degraded_update. A degraded instance completes as a §9.3 success (slot omission, §9.5, is for genuinely-failed instances only), and its projected contribution is the degraded_update: collect_field and each extra_outputs subgraph_field are read from the degraded_update mapping by subgraph field name, not merged onto the instance's pre-failure subgraph state. Resolves an interaction §9.3 left implicit (the degraded_update-is-the-contribution model vs a merge-onto-pre-failure-state model). (proposal 0066)
pipeline-utilities §11.7 — branch-middleware degrade skip confirmed. A FailureIsolation branch middleware whose degraded_update omits a projected outputs field contributes nothing for it (the parent keeps its prior / sibling value), per §11.4's heterogeneous buffer-then-merge — the deliberate counterpart to the homogeneous fan-out slot-coverage rule (§9.8).

Notes

MINOR bump (pre-1.0). The new behavior is the compile-time collect_field-coverage error for static degraded_updates on fan-out instance middleware. The §9.3 / §11.7 changes pin previously-implicit contribution semantics (degrade = success → the degraded_update is the contribution; heterogeneous branch skip); runtime behavior for correctly-configured graphs is unchanged, and the degrade path never raises.

[0.55.1] — 2026-06-11¶

Fixed

observability §11 — span-links Out of scope bullet reconciled with §4.3 / §4.4. The bullet stated "OTel span links between traces … defer until needed," which contradicted the spec body: §4.4 detached-trace mode already has the parent's dispatch span carry an OTel Link to each detached child trace, and §4.3 suspend-resume SHOULDs an observer span-link from the resume invocation span to the suspended one. The bullet is narrowed to scope out only span-link patterns the spec does not model (e.g., many-to-one fan-in accumulation across many traces). Clarification only — no behavior, type, or conformance change; PATCH with no proposal per GOVERNANCE's When a proposal is required.

[0.55.0] — 2026-06-11¶

Changed

§6.3 failure-isolation event — cause fidelity at carrier-wrapper sites. caught_exception.category MUST now reflect the originating failure when FailureIsolationMiddleware runs at a non-node placement (§9.7 instance middleware, §11.7 branch middleware, or parent-node middleware on a fan-out / parallel-branches node per §9.6 / §11.6). At those sites the engine has already wrapped the originating error as a graph-engine §4 node_exception before the isolation middleware catches it; the middleware MUST resolve through the carrier wrapper to the originating cause (__cause__) and report that category — the same carrier-wrapper resolution §6.1's default classifier already mandates. Previously the event surfaced the masking node_exception at those sites, hiding the real cause (e.g. provider_unavailable). Node-level placement is already faithful and is unchanged; the catch/degrade behavior is unchanged at every site — only the event's reported cause changes. caught_exception.message SHOULD track the resolved cause (category/message coherence), and the event's wrapped-instance/branch lineage SHOULD resolve to the isolated instance/branch where recoverable. (proposal 0065)

Added

One new conformance fixture under spec/pipeline-utilities/conformance/: 064-failure-isolation-cause-fidelity-at-wrapping-sites — three cases asserting the carrier-wrapper unwrap MUST at the §9.7 instance site (category resolves to provider_unavailable, not node_exception), the §11.7 branch site (resolves through the branch's node_exception), and an uncategorized originating cause (category == null). Node-level placement remains covered by fixture 061. Message coherence and wrapped lineage are SHOULDs, not strictly asserted.

Notes

MINOR bump (pre-1.0). The change tightens the emitted caught_exception contract at the non-node wrapping sites (a change to conformance expectations), so it is classified MINOR rather than a textual PATCH. No graph execution outcome changes — the middleware still catches and degrades identically; only the event's reported cause is corrected.
Accept-phase correction to the proposal's §11.7 framing. The proposal's draft text characterized the §11.7 branch-middleware site as catching the engine's parallel_branches_branch_failed wrapper. Verified against §11 at acceptance: branch middleware wraps the branch's subgraph invocation and catches the inner node's plain node_exception (a single carrier wrapper); parallel_branches_branch_failed is raised at the parallel-branches node level (§11.9) and is the parent-node-middleware (§11.6) site's concern. The spec text and the proposal's clause were generalized to "any node_exception carrier wrapper at any non-node placement" so the rule covers all sites uniformly.

[0.54.0] — 2026-06-09¶

Added

retrieval-provider capability — first non-LLM-completion provider capability. Lands a new capability sitting alongside llm-provider (proposal 0006) covering retrieval-primitive provider operations. Inherits llm-provider's per-model-binding contract, error-category enumeration (§7), and typed-response shape conventions; does NOT extend llm-provider's Provider protocol — the protocols defined here are siblings, not subtypes. Disjoint per-model-binding semantics from LLM completion (embedding model identifiers and completion model identifiers live in disjoint namespaces; a single Provider abstraction bundling both surfaces would either contradict the per-model-binding contract OR carve a different shape for the same protocol). Retrieval-provider is one of a planned family of <domain>-provider capabilities (llm-provider, retrieval-provider, plus future siblings as downstream demand surfaces — e.g., voice-provider for ASR + TTS, multimodal-provider for image generation + image edit). (proposal 0059)
EmbeddingProvider protocol — first protocol surface on retrieval-provider. A two-operation async protocol: ready() (idempotent readiness check; surfaces provider_invalid_model / provider_model_not_loaded if the bound embedding model is not available) and embed(input: list[str], *, config?) -> EmbeddingResponse (stateless; per-instance model binding per retrieval-provider §3 / llm-provider §5; input MUST be a list even for single-string callers; input order MUST be preserved in the response). EmbeddingResponse record carries vectors, model, usage (an EmbeddingUsage record with input_tokens only — no output_tokens because vectors aren't tokens), response_id, dimensions, and raw (the parsed provider response per llm-provider §6 Response.raw pattern). Cross-impl invariants: vector count MUST equal input count; all vectors in a single response share dimensionality; dimensions field MUST equal inner-vector length; violations raise provider_invalid_response. Inherits the llm-provider §7 error-category enumeration; embedding-applicable subset (the §7 categories minus provider_unsupported_content_block and structured_output_invalid) documented in retrieval-provider §5. Sibling RerankProvider protocol scoped to a forthcoming proposal extending the same capability.
graph-engine §6 observer event union — two new typed event variants EmbeddingEvent + EmbeddingFailedEvent. Paired from launch per the 0049 → 0058 success+failure pairing precedent (avoids the "success-only then add failure later" split that cost 0049 a second release cycle). Both variants carry the identity / scoping / request-side field set established by LlmCompletionEvent post-0057 (with input_strings in place of input_messages and the embedding-specific runtime-config shape). EmbeddingEvent carries capability-appropriate success-side fields (response_id, response_model, usage, dimensions, input_count); EmbeddingFailedEvent carries the three failure-specific fields (error_category, error_type, error_message) per the 0058 pattern. Mutual exclusion + exception-flow + dispatch-timing rules mirror the LLM-side pair — the §7 category exception still raises out of embed(); the typed event is dispatched alongside the exception, not in place of it.
observability §5.5.8 Embedding provider attributes — OTel mapping for embedding spans. A new sub-subsection paralleling §5.5 LLM provider attributes. Span name openarmature.embedding.complete discriminates the operation type from openarmature.llm.complete without requiring an explicit operation-name attribute. Stable GenAI semconv attribute subset (gen_ai.system, gen_ai.request.model, gen_ai.response.model, gen_ai.response.id, gen_ai.usage.input_tokens) plus OA-namespace openarmature.embedding.* attributes (input_count, dimensions, payload-gated input.strings + request.extras). The upstream gen_ai.operation.name attribute (with "embeddings" as a documented well-known value) is at Development status as of v0.54.0; per the Stable-only upstream adoption policy, OA does NOT normatively adopt it — operation discrimination is via the span name + provider. A follow-on proposal MAY add it when upstream reaches Stable per the §5.5.3.1 / 0047 mirror pattern. New §5.5.9 Typed embedding events sub-subsection frames the typed EmbeddingEvent + EmbeddingFailedEvent surface as the structured form of the embedding-span attribute surface, paralleling §5.5.7 for LLM completion events.
observability §8.4.5 Embedding-specific mapping — Langfuse mapping for embedding observations. Embedding calls map onto Langfuse's dedicated Embedding observation type (created via the SDK's asType: "embedding"), NOT Generation with an operation discriminator. Verified against current Langfuse docs at proposal draft time — Langfuse exposes 10 observation types currently (Event, Span, Generation, Agent, Tool, Chain, Retriever, Evaluator, Embedding, Guardrail); the dedicated type carries embedding-specific semantics (model, usageDetails.input, input strings, output vectors) and integrates with Langfuse's cost-tracking machinery directly. Field mappings cover the embedding observation's model / input / output / usageDetails.input / metadata surface. Trace-level cost rollup aggregates Generation + Embedding observations uniformly via the per-observation usageDetails field; no metadata discriminator needed.
Privacy posture for embedding observations — vec2text-aware. Both input strings and output vectors are payload-bearing data on the same footing — both gated by disable_provider_payload (default True per §5.5.4). Vectors are classified as payload-bearing because embedding-inversion research (e.g., the vec2text line of work, Morris et al., 2023) demonstrates that vectors MAY leak source-text information given the embedding model. The threat model for vectors is equivalent to the threat model for raw text from the spec's perspective; gating applies uniformly. RAG applications in particular have a corpus-leakage concern — the (text, vector) pairs accumulated in traces would let an attacker reconstruct the embedding index and query it offline. A future observability proposal MAY introduce a tiered preview mode (truncated input strings + first-N-dimensions vectors) for partial-visibility use cases; out of scope for v0.54.0.
Five new conformance fixtures under spec/retrieval-provider/conformance/: 001-embed-positive-control (response-shape invariants on a 3-vector / 4-dim mocked response), 002-embed-model-binding-error (unknown model surfaces provider_invalid_model), 003-embed-malformed-response-mismatched-vector-count (3 inputs / 2 vectors raises provider_invalid_response), 004-embed-malformed-response-inconsistent-dimensions (vectors with inconsistent inner lengths raise provider_invalid_response), 005-embed-input-order-preserved (vector position keyed by input order; adapter MUST NOT permute). Introduces per-directory harness directives calls_embed: (node-level) and mock_embedding: (suite-level) per conformance-adapter §3.2.
Ten new conformance fixtures under spec/observability/conformance/: 074-embedding-event-dispatch (typed-event dispatch contract on a successful call), 075-embedding-failure-event-dispatch-on-provider-unavailable (failure-side dispatch + exception-flow preservation on a provider_unavailable raise), 076-embedding-event-mutual-exclusion (success and failure variants mutually exclusive on a given call), 077-embedding-event-call-id-distinct (per-call call_id mint contract across multiple embed() calls), 078-embedding-event-input-strings-populated (input_strings field carries the input list verbatim), 079-embedding-event-request-params-populated (caller-supplied dimensions populates; absence-is-meaningful when no config supplied), 080-embedding-event-input-count-and-dimensions-populated (convenience fields match input list length + inner-vector length), 081-embedding-event-active-prompt-populated (RAG retrieval-template scenario; active_prompt snapshot populated from prompt-context binding), 082-otel-embedding-span-attributes (span name openarmature.embedding.complete, Stable GenAI semconv subset + OA-namespace embedding attributes; gen_ai.operation.name absent), 083-langfuse-embedding-observation (dedicated Embedding observation type, NOT Generation; disable_provider_payload gates both input and output symmetrically).

Changed

Cross-spec rename: disable_llm_payload → disable_provider_payload. The observer-level privacy flag defined at observability §5.5.4 is renamed. Semantics broaden to cover payload from any provider call (LLM completion + embedding + rerank when it lands), under a single observer-level flag with default True (suppressed by default) — same default-conservative posture as before. No semantic change beyond the broadened scope; existing LLM-payload gating behavior is preserved unchanged for LlmCompletionEvent + LlmFailedEvent. Spec text edits: observability §5.5.4 renames the flag definition + extends framing; observability §8.7 / §8.9 references updated to the new name; graph-engine §6's LlmCompletionEvent + LlmFailedEvent privacy paragraphs updated; existing observability fixtures using the flag in their YAML (012, 013, 014, 015, 018, 022, 023) updated to the new key. Pre-1.0 SemVer convention permits the hard-swap rename in a MINOR bump; same precedent as proposal 0057's request_id → response_id field rename. Downstream observer config with disable_llm_payload=True requires a one-key update to disable_provider_payload=True to adopt v0.54.0; no behavioral change beyond the name. (proposal 0059)

Notes

Accept-phase correction: EmbeddingResponse.request_id → response_id. Proposal 0059's text named the response-object field request_id while the corresponding EmbeddingEvent typed-event field was named response_id; the spec body adopts response_id on the response object as the consistent shape (the field carries a provider-returned RESPONSE identifier, matches the OTel GenAI semconv gen_ai.response.id attribute, and aligns the response-object and typed-event field names). The Langfuse metadata key embedding.metadata.openarmature_request_id similarly becomes openarmature_response_id. The proposal text remains as-Accepted (immutable per governance); this CHANGELOG note records the deviation.
MINOR bump (pre-1.0). New capability + new typed event variants + new OTel/Langfuse mapping subsections are purely additive at the spec level. The one breaking change is the cross-spec flag rename above; pre-1.0 SemVer convention permits the hard-swap rename in a MINOR bump (same posture as proposal 0057's request_id → response_id rename). The fixture-set scope of the rename is broader than proposal 0059's text described — the proposal said "fixtures 013 + 018" but the actual cohort exercising the flag in YAML body is five fixtures (013, 014, 015, 018, 023) plus two additional fixtures (012, 022) referencing the flag in YAML comments only; all seven are migrated to the new flag name to avoid a partial-rename state. Observers consuming only LlmCompletionEvent / LlmFailedEvent continue to work unchanged at the typed-event surface; one observer-config key change is required to adopt v0.54.0 (disable_llm_payload → disable_provider_payload). With this proposal, OpenArmature gains its first non-LLM-completion provider capability and the first sibling protocol pattern (EmbeddingProvider alongside Provider) for future <domain>-provider capabilities to follow.

[0.53.0] — 2026-06-08¶

Added

LlmFailedEvent — second typed event variant on the observer event union. Carves LLM provider failures into a spec-normatively-typed event variant alongside LlmCompletionEvent (proposal 0049 / 0057). Mirrors LlmCompletionEvent's identity / scoping / request-side field set 1:1 (17 fields — invocation_id, correlation_id, node_name, namespace, attempt_index, fan_out_index, branch_name, provider, model, latency_ms, caller_invocation_metadata, input_messages, request_params, request_extras, active_prompt, active_prompt_group, call_id) plus three failure-specific fields: error_category (always-present; one of the llm-provider §7 normative categories — provider_authentication, provider_unavailable, provider_invalid_model, provider_model_not_loaded, provider_rate_limit, provider_invalid_response, provider_invalid_request, provider_unsupported_content_block, structured_output_invalid), error_type (OPTIONAL impl-level / vendor-specific error type or code — null when no impl-side type is available; two acceptable styles: vendor error code or upstream exception class name), error_message (always-present human-readable message from the raised exception; empty string when the exception carried no message). Response-side fields (response_id, response_model, usage, output_content, finish_reason) are absent from the failure variant — no response was received. (proposal 0058)
Mutual exclusion + exception-flow preservation. The two variants are mutually exclusive on a given provider.complete() call; implementations MUST NOT emit both for the same call (codified in graph-engine §6 + locked down by a dedicated conformance fixture). The provider exception-flow contract is preserved per proposal 0049's alternative-3 framing — failures still raise the §7 category exception out of provider.complete(); the typed event is dispatched on the observer delivery queue alongside the exception, not in place of it. Lands the LlmCallFailedEvent typed-variant follow-on proposal 0049's Out of scope section anticipated by name.
graph-engine §6 Typed LLM completion event + observability §5.5.7 — extended with the failure-side variant. A new Typed LLM failure event paragraph block in graph-engine §6 (alongside the existing Typed LLM completion event paragraph from proposal 0049) carries the LlmFailedEvent field table and dispatch contract. A new Typed LLM failure event paragraph in observability §5.5.7 frames the failure-side typed event as the structured form of LLM-call failure observability and notes that with both LlmCompletionEvent and LlmFailedEvent defined, the impl-current sentinel-namespace NodeEvent convention for LLM observability can retire fully across both outcome sides.
Five new conformance fixtures under spec/observability/conformance/: 069-llm-failure-event-dispatch-on-provider-unavailable (dispatch contract for the canonical transport-error category; verifies the exception still raises out of provider.complete() alongside the typed event), 070-llm-failure-event-dispatch-on-provider-invalid-request (companion variant on a pre-send-validation category to verify dispatch consistency across §7 categories), 071-llm-failure-event-call-id-distinct-from-completion-event (per-call call_id mint contract — a failed call gets its own fresh call_id, distinct from any success-event call_id from a different call in the same invocation), 072-llm-failure-event-mutual-exclusion-with-completion-event (dedicated lockdown of the mutual-exclusion rule — exactly one LlmFailedEvent and exactly zero LlmCompletionEvent on a failed call), 073-llm-failure-event-error-type-vendor-specific (3-case fixture covering the two acceptable error_type styles — vendor error code and upstream exception class name — plus the null companion when no impl-side type is available).

Notes

MINOR bump (pre-1.0). Purely additive at the spec level — the new event variant adds a second discriminator-on-type slot on the observer event union; LlmCompletionEvent's field set and dispatch contract from proposal 0049 / 0057 are unchanged. Observers consuming only LlmCompletionEvent continue to work unchanged; observers wanting to consume the failure event opt in via type discrimination on the new variant. Existing fixtures (001-068) continue to pass unchanged. With this proposal, the LLM-call observability surface has full type-discriminated coverage on both outcome sides — implementations adopting both LlmCompletionEvent and LlmFailedEvent consumption have a path to retiring the sentinel-namespace NodeEvent convention entirely.

[0.52.0] — 2026-06-07¶

Added

Three new canonical reducers. Extends the graph-engine §2 baseline reducer set with bounded_append(max_len), dedupe_append(key=None), and merge_by_key(key) — factory-style closures that solve common state-composition patterns without forcing users to roll their own. bounded_append extends a list and truncates from the front (oldest-first eviction) when the post-merge length exceeds max_len; the bound applies to the post-merge length so an update larger than max_len keeps only the last max_len items and the prior list is fully evicted. dedupe_append extends a list with items from the update whose key (callable, or the item itself when omitted) is not already present in the existing list OR earlier in the same update; first-occurrence-wins for in-update duplicates; existing order preserved. merge_by_key is a list-of-records keyed merge — items in the update with a key matching an existing entry REPLACE that entry in place; items with novel keys are appended at the end in update order. The chat-history-with-LLM-summarization pattern is deliberately NOT a reducer concern — reducers stay pure synchronous functions per the contract; summarization-before-truncation lives in a compaction node or middleware (the patterns docs cookbook carries the canonical recipe). (proposal 0023)
graph-engine §2 Reducer paragraph — extended canonical set. The Reducer paragraph grows from 5 required canonical reducers (last_write_wins, append, merge, concat_flatten, merge_all) to 8 by adding the three new factory reducers. Three new semantics paragraphs (bounded_append, dedupe_append, merge_by_key) sit alongside the existing concat_flatten and merge_all semantics paragraphs, documenting cross-impl semantics (truncation direction, key callable behavior, novel-key handling) so Python and TypeScript implementations agree on per-reducer behavior.
conformance-adapter §5.2 — state.fields.<field>.reducer directive extended for factory reducers. The directive's type extends from string-only to "string OR single-key mapping": the existing string form names a parameter-less canonical reducer; the new single-key mapping form {<factory_name>: <kwargs_mapping>} names a canonical factory reducer with its construction kwargs (e.g., {bounded_append: {max_len: 3}}). A sub-clause covers the key callable convention — the YAML expresses the key as a field-name string and the adapter constructs the language-idiomatic accessor. This extends the established directive's type to accommodate proposal 0023's factory reducers rather than introducing the shape as a per-fixture-suite extension (per §3.2), since the shape applies cross-suite.
graph-engine §2 — reducer_configuration_invalid compile-time error category. A new compile-time error category alongside the existing six (no_declared_entry, unreachable_node, dangling_edge, multiple_outgoing_edges, conflicting_reducers, mapping_references_undeclared_field). Raised at field registration / graph compilation time when a reducer factory is supplied invalid construction parameters (e.g., bounded_append(max_len=0), merge_by_key(key=None)). Distinct from conflicting_reducers, which is about the reducer-declaration shape across multiple reducers on the same field; the new category is about parameters supplied to a single reducer factory. Runtime contract violations (non-list update to a list reducer, key callable raising on a specific item) continue to surface as reducer_error per §4, unchanged.
Five new conformance fixtures under spec/graph-engine/conformance/ covering ten total test cases per the proposal's intent: 034-reducer-bounded-append (3 cases: basic truncation, multi-step cumulative bound, update-larger-than-max_len evicts prior entirely), 035-reducer-dedupe-append (2 cases: default-key in-update + cross-prior dedup, key-callable dedup on records), 036-reducer-merge-by-key (3 cases: replace-existing preserves position, append-novel at end in update order, mixed replace + append), 037-reducer-configuration-invalid-max-len (1 case: bounded_append(max_len=0) raises at compilation), 038-reducer-error-non-list-update (1 case: non-list update to bounded_append raises ReducerError at merge time). The fixture YAML introduces a per-suite extension to the reducer: directive: in addition to the string form for parameter-less reducers, factory reducers use a single-key mapping {<factory_name>: <kwargs>}; for factory key callables, the YAML expresses the key as a field-name string that the adapter constructs as the language-idiomatic accessor.

Notes

MINOR bump (pre-1.0). The three new reducers are additive — every existing field with append / merge / last_write_wins / concat_flatten / merge_all continues to work unchanged. The new reducer_configuration_invalid error category sits alongside the existing six rather than renaming or repurposing any. The proposal text (drafted 2026-05-17) quoted the §2 baseline as 3 reducers — between drafting and Accept, the baseline grew to 5 (proposal 0036 added concat_flatten and merge_all); the Accept-phase math is 5 existing + 3 new = 8 canonical reducers, not the proposal-quoted "6." The proposal's design intent (add three new canonical reducers for the documented patterns) is unchanged; only the count shifted to match the current baseline. The chat-agent and tool-loop motivating patterns become more uniform across language siblings; cross-impl agreement on per-reducer semantics (truncation direction, post-merge-vs-update bound, key-callable behavior, novel-key handling) prevents per-implementation drift.

[0.51.0] — 2026-06-07¶

Added

LlmCompletionEvent field-set extension. Extends the typed event introduced at v0.41.0 (proposal 0049) with eight additive fields completing the typed event's coverage of the observability §5.5 LLM provider span attribute surface and the prompt-identity attribute family (per prompt-management §12 / observability §8.4.4). The new fields: input_messages (list of message records per llm-provider §3, populated unconditionally with observer-side privacy gating at rendering — symmetric with how §5.5.1 + §5.5.4 work today; empty list when the call had no history); output_content (string, nullable on tool-call-only assistant messages per llm-provider §6); request_params (mapping carrying the §5.5.2 GenAI request-parameter family with absence-is-meaningful semantics — only caller-supplied parameters appear in the mapping; empty mapping when none supplied); request_extras (mapping carrying the RuntimeConfig extras pass-through bag in typed-event-native form, not the JSON-encoded string form §5.5.1 emits on the OTel span); active_prompt (record carrying the 5-field prompt-identity snapshot — name, version, label, template_hash, rendered_hash — at LLM-call time, sourced from the implementation's prompt-context binding mechanism; null when the call ran outside any prompt-context binding); active_prompt_group (record carrying {group_name} when the call ran inside a PromptGroup context; null otherwise); call_id (per-call disambiguator minted by the implementation; always present, freshly minted per provider.complete() call, stable for the call's lifetime, unique within the implementation's run; wire shape unconstrained — distinct from response_id which is the provider-returned identifier); response_model (provider-returned model identifier per gen_ai.response.model per §5.5.3; distinct from the existing model which carries the requested identifier — providers MAY return a more specific identifier than requested). Lands the alternative-5 follow-on proposal 0049 anticipated; observer demand surfaced as observers consuming the §5.5 attribute surface need typed-event equivalents during the sentinel-namespace NodeEvent migration. (proposal 0057)
graph-engine §6 Typed LLM completion event + observability §5.5.7 Typed LLM completion event — extended. The graph-engine §6 field table grows from 14 to 22 typed fields (8 additive + 1 rename). The observability §5.5.7 framing paragraph extends to acknowledge the typed event now mirrors the §5.5.1 payload attributes, the §5.5.2 GenAI request-parameter family, the prompt-identity attribute family per prompt-management §12 / §8.4.4, and the gen_ai.response.model attribute in addition to the §5.5.3 response-side attributes it covered at v0.41.0. A new paragraph documents the disable_llm_payload rendering-boundary gate semantics — equivalent typed-event fields are populated unconditionally; observers gate at rendering identically to the span attribute path.
Nine new conformance fixtures under spec/observability/conformance/: 060-llm-completion-event-input-messages-populated (input_messages carries the spec-canonical §3 message-list shape), 061-llm-completion-event-output-content-populated (positive + tool-call-only null companion), 062-llm-completion-event-request-params-populated (absence-is-meaningful — only caller-supplied parameters appear), 063-llm-completion-event-request-extras-populated (native mapping form, not JSON-encoded), 064-llm-completion-event-active-prompt-populated (5-field identity record sourced from prompt-context), 065-llm-completion-event-active-prompt-null (null when call ran outside prompt-context), 066-llm-completion-event-active-prompt-group-populated (record sourced from PromptGroup context — introduces the per-directory renders_prompt_group: directive), 067-llm-completion-event-call-id-always-present-and-distinct (3-call sequence asserting non-null and pairwise-distinct values), 068-llm-completion-event-response-model-distinct-from-request (distinct + null companion).

Changed

LlmCompletionEvent.request_id → LlmCompletionEvent.response_id. The field name on the typed event is renamed to match the response-side data it carries (gen_ai.response.id per observability §5.5.3). The data, source, and nullability are unchanged. Pre-1.0 SemVer convention permits the rename in a MINOR bump; the rename rides on the same proposal that extends the field set because the field table is already being edited. Fixture 050-llm-completion-event-dispatch updated for the rename (YAML expectation + 3 .md companion references); other 0049 fixtures (051-056) do not assert on this field and need no edit.

Notes

MINOR bump (pre-1.0). The eight new typed-event fields are additive — every existing LlmCompletionEvent consumer continues to work, with the new fields available as opt-in reads. The request_id → response_id rename is a breaking change on the field-name surface that the pre-1.0 SemVer convention (documented in this CHANGELOG's header) permits in a MINOR bump; the rename's window of practical breakage is limited because the typed event itself only landed at v0.41.0. Privacy-bearing fields (input_messages, output_content, request_extras) are populated unconditionally on the typed event with observer-side rendering gates honoring disable_llm_payload identically to the §5.5.1 span attributes — symmetric with how the OA observability stack handles payload-bearing data today. Composes with the Response.usage.cached_tokens field shipped at v0.47.0 (proposal 0047) — that field continues to flow through the typed event's existing usage record unchanged; no cache-specific typed-event fields needed.

[0.50.0] — 2026-06-06¶

Added

New capability: harness-chat. First per-harness-type sub-spec landing on top of the abstract harness capability (proposal 0022). Ratifies the chat-loop deployment shape: the canonical ChatMessage shape (mirrors llm-provider §3 unchanged — role + content: str | list[ContentBlock] with text / image / thinking / redacted_thinking blocks + top-level tool_calls on assistant + tool_call_id on tool-role messages, no chat-specific content-block types), the conversation-history convention (per-session messages: list[ChatMessage] field with append reducer; v1 mandates the canonical field name), the higher-level send(session_id, message) -> ChatTurnOutcome callable surface (three-way outcome discriminator: completed / errored / suspended; MUST-serialize concurrent calls under one session_id — stricter than sessions §8.1's LWW default because chat-shaped interleaving is user-visible), the inbound message → session → invoke wiring (dispatch-path classification: empty history → harness §3.1, non-empty → §3.2, signal-resume → §3.3 via subscribed-listener — send() is not the entry point for resume), the outbound assistant message → response wiring (new-message tail extraction by pre-invoke history index, NOT content matching; all-roles inclusion for tool-loop sequences; graph-execution-order preservation; always-list ChatTurnOutcome.completed.replies: list[ChatMessage] shape with no .reply union), the suspension-composition pattern (pending-message via standard append reducer + suspend() — no chat-specific engine hook; subscribed-listener as the default post-resume delivery with optional synchronous-next-send() alternative), the streaming-send_streaming() API-surface contract (deferred implementation-contract to the planned streaming proposal), and the three-bucket error → user-facing reply mapping (session_terminating / retryable_transient / user_correctable per harness §7's buckets, with a system-role reply on ChatTurnOutcome.errored). Sessioned-mode-only; cross-session memory deferred to a future memory capability per sessions §13's existing carve-out. Spec sections: §1 Purpose, §2 Concepts, §3 Message shape (mirroring llm-provider §3), §4 Conversation history convention, §5 The send() callable, §6 Inbound wiring, §7 Outbound wiring, §8 Composition with suspension (pending-message protocol + subscribed-listener resume), §9 Composition with streaming (forward-looking), §10 Error → user-facing reply mapping, §11 Determinism, §12 Cross-spec touchpoints, §13 Out of scope. New error category: chat_message_shape_invalid (routes through harness §7.3 user-correctable on surfacing; no change to harness §10's abstract error set). (proposal 0056)
harness §1 Specific harness types + §9 Per-harness-type implementations — concrete reference to the chat sub-spec. Both placeholder references (previously 0NNN-harness-chat) now point at proposal 0056 and spec/harness-chat/spec.md directly. Cross-spec edit; no behavior change at the harness layer.
Ten new conformance fixtures under spec/harness-chat/conformance/: 001-basic-send-and-reply (§5 + §6 + §7 canonical cycle; also carries the chat suite's per-directory contract documentation in its YAML header comment per conformance-adapter §3.2), 002-multi-turn-conversation (§4 append accumulation + §3.1 → §3.2 path-classification transition between turns), 003-multi-message-tool-loop (§3 + §7 — assistant with tool_calls → tool-role with tool_call_id + string content → final assistant text reply per llm-provider §3's canonical shape, all three on .replies in graph-execution order), 004-multimodal-user-message (§3 inbound with mixed text + image content blocks), 005-suspend-with-pending-message (§8.1 reducer-based pending message; no chat-specific engine hook), 006-suspend-resume-listener (§8.2 subscribed-listener fires exactly once with post-resume ChatTurnOutcome.completed), 007-error-session-terminating (§10.1 session_load_failed → harness §7.1 system-shaped reply), 008-error-retryable-transient (§10.2 provider_unavailable → harness §7.2 system-shaped reply), 009-error-user-correctable (§10.3 provider_invalid_request → harness §7.3 with diagnostic substring in reply), 010-error-chat-message-shape-invalid (§10.4 new category — malformed inbound message rejected at the send() API boundary before any session load).

Notes

MINOR bump (pre-1.0). Additive new capability layering on top of harness. Existing applications that do not use a chat harness see no behavioral change. Applications opting into the chat harness get a single ratified surface for the chat-loop deployment shape across cross-language implementations. Composes with sessions (session-bound history), suspension (pending-message + listener-resume), prompt-management + llm-provider (shared content-block model), and harness (abstract foundation). Six design decisions were resolved at draft (pending-message protocol → reuse existing graph primitives; multi-message reply ordering → graph-execution order, deterministic per harness §11; concurrent send() → chat-harness-level MUST-serialize; streaming forward-compat → API-surface contract / implementation contract split; capability naming → harness-chat parallels conformance-adapter for sub-spec naming; ChatTurnOutcome field shape → always-.replies: list[ChatMessage]) — captured in the proposal's Open questions section.

[0.49.0] — 2026-06-05¶

Added

New capability: harness. Defines the abstract behavioral contract that any harness implementation MUST follow when wrapping the OpenArmature workflow engine to serve a deployment runtime (HTTP server, event bus, queue worker, CLI repl, streaming connection, etc.). Specifies abstract turn semantics, dispatch path classification, composition rules with sessions and suspension, error categorization at the turn boundary, and the observability scope. Runtime-neutral — the contract does not assume request/response, event-driven, or any specific transport. Spec sections: §1 Purpose, §2 Concepts (Harness, Turn, Inbound dispatch path, Outbound surface, Session resolver, Signal coordinator, Harness mode — sessioned vs stateless), §3 Inbound dispatch paths (§3.0 stateless transmission path; §3.1 new-session; §3.2 existing-active-session; §3.3 signal-resume; §3.4 path classification), §4 Turn lifecycle, §5 Outcome handling (completed / errored / suspended; load-bearing rule that harness MUST NOT block on suspended turns), §6 Signal coordinator (suspend-time subscription + signal-arrival lookup; completeness requirement), §7 Error categorization at the turn boundary (session-terminating / retryable-transient / user-correctable three-bucket split), §8 Composition with capabilities (sessions: sessioned mode threads session_id, stateless mode omits; suspension; checkpointing — optional; observability), §9 Per-harness-type implementations (chat sub-spec committed to follow-on; others per-case), §10 Errors (harness_session_id_unresolved, harness_signal_subscription_failed, harness_signal_correlation_failed, harness_path_classification_ambiguous), §11 Determinism, §12 Cross-spec touchpoints, §13 Out of scope. Harness mode (sessioned vs stateless) is first-class: stateless mode is not a degenerate sessioned case with session_id = None — it has its own dispatch path (§3.0), its own sessions composition rule (§8.1), and skips all SessionStore interaction even when a SessionStore is registered on the compiled graph. (proposal 0022)
graph-engine §3 Execution model — Deployment-runtime wrapping paragraph. Cross-reference noting that invoke() is the per-call surface the harness capability wraps when an OpenArmature graph runs inside a deployment runtime. Notes the harness capability defines the abstract contract for inbound dispatch classification, turn-level outcome handling, signal coordination, error categorization, and the sessioned vs stateless mode distinction. The graph engine itself stays runtime-neutral.
observability §4 Span hierarchy — new §4.6 Turn-level wrapper span (harness capability). Optional wrapper span the harness MAY open around invoke() when running inside a deployment runtime. The invocation root span becomes a child of the turn span; the trace hierarchy gains an extra level (turn → invocation → nodes). Wrapper is OPTIONAL — harnesses MAY skip it if the runtime already provides a transport-level parent span (e.g., an OTel-instrumented FastAPI's request span). Span name + attributes are harness-implementation-defined; turn-level attributes follow §5.6 (openarmature.session_id in sessioned mode) and §5.8 (suspension descriptor attributes on signal-resume turns).
sessions §3 Identity scoping — Harness threading paragraph. Cross-reference noting that deployments wrapping the engine via a harness (per the harness capability spec) are responsible for resolving session_id from inbound traffic and threading it into every invoke() call — in sessioned mode. Stateless-mode harnesses never thread a session_id. Documentary tightening; no behavior change at the sessions layer (the omit-and-skip rule was already normative).
suspension §8.7 Deployment runtime / harness contract paragraph tightened. Was a forward-looking placeholder ("formalized by the harness capability when its spec lands"); now points concretely into the harness capability spec's §6 Signal coordinator, §3.3 Signal-resume inbound dispatch path, §5.3 Suspended outcome handling (harness MUST NOT block on suspended turns), and §7 Error categorization at the turn boundary. Suspension itself stays runtime-neutral.
Eleven new conformance fixtures under spec/harness/conformance/: 001-inbound-new-session (§3.1), 002-inbound-existing-session (§3.2 — verifies caller-supplied id threading + engine load on second invoke), 003-inbound-signal-resume (§3.3 — suspend then resume threading signal_payload), 004-outcome-completed (§5.1), 005-outcome-errored (§5.2 — categorized into a §7 bucket), 006-outcome-suspended (§5.3 — four-step requirement; load-bearing no-block rule), 007-signal-coordinator-roundtrip (§6 — full end-to-end suspend → subscribe → signal arrival → resume), 008-error-session-terminating (§7.1), 009-error-retryable-transient (§7.2), 010-error-user-correctable (§7.3), 011-stateless-mode-no-session (§3.0 + §8.1 — stateless harness invokes without session_id; SessionStore receives ZERO load + save calls even when registered).

Notes

MINOR bump (pre-1.0). Additive primitive in a new capability. Existing applications that do not use a harness see no behavioral change (the harness contract describes how deployment runtimes invoke the engine; the engine itself stays unchanged). Existing harness-shaped code (Inngest handlers, FastAPI wrappers, chat loops) becomes conformance-checkable against this spec — the contract ratifies what production harnesses already do and gives cross-language implementations a single behavioral floor. Composes with sessions (proposal 0020), suspension (proposal 0021), and checkpointing (proposal 0008) into the full stateless-worker shape per suspension §1's Stateless workers as the architectural consequence paragraph — harness is the fourth piece (dispatch logic that routes inbound signals to any available worker). The chat harness sub-spec is committed to a follow-on proposal (proposal 0056, drafted 2026-06-05) per the Open questions resolution in proposal 0022; FastAPI, Inngest, CLI, and other per-harness-type sub-specs land per-case when cross-impl divergence risk warrants.

[0.48.0] — 2026-06-04¶

Added

New capability: conformance-adapter. Ratifies the language-agnostic conformance fixture system that every OpenArmature implementation builds against — the YAML schema fixtures use, the directive vocabulary they draw from, the harness primitives implementations MUST provide, the assertion shapes adapters MUST honor, and the responsibility model for language-specific adapters that translate fixtures into host-runtime tests. v1 is descriptive of the system that already exists as of v0.47.0 — no fixture YAML files change, no directive shapes change, no existing implementation behavior changes. The capability documents what has accreted across 20+ prior proposals since proposal 0001 introduced the first fixtures, so future proposals that add new directives extend spec/conformance-adapter/spec.md §5 Directive vocabulary the same way new pipeline-utilities §6 middleware extend pipeline-utilities. Spec sections: §1 Purpose, §2 Concepts (Fixture, Adapter, Directive, Harness primitive, Assertion shape, Invariant, Case, Invocation), §3 Fixture file format (NNN-name.{yaml,md} pair convention, per-directory harness notes via fixture-header comments per the observability fixture pattern), §4 Fixture YAML schema (single-case, multi-case cases:, multi-invocation invocations: forms; optional conformance_version: pin), §5 Directive vocabulary (§5.1 node behavior, §5.2 state/schema, §5.3 edges, §5.4 composition, §5.5 observers, §5.6 persistence, §5.7 invocation shape, §5.8 expected outcomes, §5.9 invariants), §6 Harness primitives (in-memory observers, persistence backends, OTel + Langfuse mocks, suspend/resume + drain + middleware wiring — all real not simulated), §7 Nondeterminism handling (observer_event_invariants: for fan-out / parallel-branches / observer-dispatch interleaving cases), §8 Adapter responsibility, §9 Errors (fixture_directive_unknown, fixture_schema_invalid, fixture_version_unsupported, harness_primitive_missing — MUST raise rather than silently skip), §10 Determinism, §11 Cross-spec touchpoints, §12 Out of scope. (proposal 0055)
docs/conformance.md (new). Readable explainer / tutorial alongside docs/governance.md and docs/openarmature.md. Covers the shape of the system, a concrete worked example, the directive vocabulary's category breakdown, the harness primitive requirements, the nondeterminism handling pattern, the end-to-end flow of a fixture run, how a new language implementation ships a conformance-passing adapter, and how the vocabulary grows over time. Distinct from the normative capability spec — the spec is the contract; the docs page is the mental model.
docs/governance.md §"Conformance tests" — Normative reference paragraph. Cross-references the new capability spec as the authoritative source for the fixture YAML schema, directive vocabulary, harness primitives, and assertion shapes (plain-text path mentions, not markdown links — the symlinked docs/governance.md → GOVERNANCE.md resolves under both repo-root and docs-rendered views, but no single relative-link path works under both validators; readers find clickable cross-refs via the capability spec's appearance in the README capability table and the mkdocs Capabilities nav). The §"Conformance tests" overview prose stays; the new paragraph appended points readers at spec/conformance-adapter/spec.md for directive details and docs/conformance.md for the explainer.

Changed

spec/graph-engine/conformance/README.md slimmed. The original v0 informal YAML schema content moved into spec/conformance-adapter/spec.md §4. The README is now a one-paragraph breadcrumb pointing at the conformance-adapter capability spec as the authoritative reference. Per-capability conformance directory READMEs are now optional navigational aids, not the schema source of truth.

Notes

MINOR bump (pre-1.0). Pure documentary addition + slim of one existing README — no fixture YAML files modified, no directive shapes changed, no implementation behavior changed. Every existing fixture continues to pass unchanged under any conforming adapter. Implementations adopt the new capability by bumping their spec-version pin to v0.48.0; the adapter behavior they already ship is what the capability spec ratifies. Future proposals that add new directives now have a normative home for the additions — the capability spec's §5 Directive vocabulary gains entries the same way other capabilities gain attribute / hook / middleware additions. The "ratifying not redesigning" disposition is deliberate (per Alternatives considered §3 in the proposal): re-approving every existing fixture against a redesigned schema would be significant churn for marginal cleanliness gain. A follow-on cleanup proposal MAY consolidate overlapping directives once the v0.48.0 surface stabilizes.

[0.47.0] — 2026-06-03¶

Added

New capability: suspension. Introduces suspend(descriptor, mark_node_completed=True) as a node-side operation that intentionally pauses an invocation, persists its state to a durable store under a typed signal descriptor (carrying a caller-supplied signal_id plus optional application-typed metadata), and returns control to the caller with a structured suspended outcome (outcome="suspended", state, descriptor, node_name, plus invocation_id / correlation_id). Resume via invoke(resume_invocation=<id>, signal_payload=<payload>) loads the paused record, validates status, merges the payload via shallow field overlay (reducers NOT consulted — signal payload represents authoritative external data), and continues from the node after the suspending node (under default mark_node_completed=True) or re-invokes the suspending node body (under opt-in mark_node_completed=False; node body must be re-entrant). Generalizes human-in-the-loop, long-running async work, scheduled wakeups, and external-event-await as flavors of one primitive whose semantics are shared. The load-bearing architectural consequence is stateless workers — between a suspend on machine A and a resume on machine B, nothing about the invocation lives in any worker's memory; the paused-invocation record is in shared durable storage. Compositions covered: subgraphs (suspension propagates), fan-out (entire fan-out node suspends; siblings cancelled; descriptor carries fan_out_index; incompatible with error_policy="collect"), parallel-branches (entire dispatcher suspends; siblings cancelled; descriptor carries branch_name; incompatible with collect), middleware (MAY wrap a suspending node — post-next() block skipped on the suspending attempt; middleware itself MUST NOT call suspend()), checkpointing (shared persistence mechanism via the new pipeline-utilities §10.15 composition rule), sessions (atomic-suspend: session save MUST succeed alongside the paused-invocation record write or the invocation errors). Error categories: suspension_persistence_failed, suspension_record_invalid, suspension_resume_payload_invalid, suspension_in_unsupported_context. (proposal 0021)
graph-engine §3 Execution model — Invocation outcomes paragraph (new). Classifies invoke()'s three return categories — completed (the graph reached END), errored (a node raised per §4), and suspended (a node body called suspend() per suspension §3; engine persisted a paused-invocation record and returned the suspended outcome per suspension §5 distinct from completion or error). The Invocation entry surface paragraph's invocation_id clause split into two cases — fresh id on checkpoint-resume per pipeline-utilities §10.4 (existing rule); reused id on suspension-resume per suspension §7 (new rule). The reused-id contract is load-bearing for cross-process resume: the runtime routes inbound signals to a paused record by invocation_id, and the resuming invocation's events scope to that same id.
graph-engine §6 Observer hooks — suspended phase NodeEvent. phase field enum extended with "suspended"; new descriptor field populated only on suspended events carrying the signal descriptor per suspension §4; per-attempt terminal-shape paragraph extended to make completed and suspended mutually exclusive at the terminal slot (each attempt produces exactly one started event followed by exactly one terminal event — completed OR suspended).
observability §4.2 Status mapping — SUSPENDED row + Suspended status mapping paragraph. Logical SUSPENDED status distinct from OK and ERROR; applied to both the suspending node's span and the invocation root span when a node calls suspend(). OTel physical mapping defined in the paragraph: status OK plus openarmature.outcome = "suspended" span attribute (since OTel's native status code field has only UNSET / OK / ERROR). Other backends MAY use a native suspended status if their data model supports one.
observability §4.3 Parent-child rules — Suspended-resume invocation spans paragraph (new). Cross-invocation-span correlation invariant for suspension-resume: the resume invocation span carries the same openarmature.invocation_id as the suspended one; OTel observers SHOULD additionally link via span-link or parent-of mechanisms. Explicitly distinguishes from checkpoint-resume per pipeline-utilities §10.4 (fresh invocation_id; correlated only via shared correlation_id per §3.1).
observability §5.8 Suspension span attributes (new). openarmature.suspension.signal_id (string; always present on a suspended node span; carries the descriptor's signal_id per suspension §4) and openarmature.suspension.metadata.* (flattened descriptor metadata fields per the OTel-attribute-compatible value-type contract from §3.4). Composition rules for detached trace mode (§4.4) included. Applies to the suspending node's span specifically; the invocation root span carries only the logical SUSPENDED status per §4.2.
pipeline-utilities §10.15 Composition with suspension (new). Paused-invocation records and checkpoint records share the persistence mechanism (single backend store with discriminator OR separate stores; implementation choice) but are distinct record shapes. Resume operations load the correct record type per the resume API in use (invoke(resume_invocation=...) per §10.4 → checkpoint record; invoke(resume_invocation=..., signal_payload=...) per suspension §7 → paused-invocation record). Paused-record lifetime is NOT bound to invocation completion (persists until resume completes / cancellation / backend retention). Error categories are distinct: suspension_persistence_failed does not signal checkpoint failure and vice versa.
Fifteen new conformance fixtures under spec/suspension/conformance/: 001-basic-suspend-resume (canonical suspend/resume cycle with mark_node_completed=True continuation), 002-suspend-payload-merge (shallow field overlay; reducers not consulted), 003-suspend-payload-invalid-schema (suspension_resume_payload_invalid on bad payload), 004-resume-invalid-record (suspension_record_invalid on non-suspended resume targets — completed invocation AND never-existed cases), 005-suspend-in-subgraph (propagation from inner subgraph node up to outer invocation), 006-suspend-in-fan-out-fail-fast (sibling cancellation; descriptor carries fan_out_index), 007-suspend-in-fan-out-collect-rejected (suspension_in_unsupported_context for collect-mode fan-out), 008-suspend-in-parallel-branches-fail-fast (sibling cancellation; descriptor carries branch_name), 009-suspend-in-parallel-branches-collect-rejected (suspension_in_unsupported_context for collect-mode parallel-branches), 010-suspend-observability-event (suspended NodeEvent with descriptor; no completed event for suspending node), 011-suspend-span-status (OTel SUSPENDED mapping + §5.8 suspension attributes on the node span), 012-suspend-with-sessions (session save at suspend; resume sees consistent state), 013-suspend-with-checkpointer (paused-invocation record persists via configured checkpointer backend; distinct from checkpoint records via record-type discriminator), 014-suspend-wrapped-by-middleware (middleware pre-next() runs; post-next() skipped on suspending attempt), 015-suspend-in-middleware-rejected (suspension_in_unsupported_context when middleware itself calls suspend()).

Notes

MINOR bump (pre-1.0). Additive primitive in a new capability. Existing applications that do not call suspend() see no behavioral change. Pipelines opting into suspension gain the cross-process pause/resume primitive that makes stateless-worker deployment shapes possible. Composes with sessions (proposal 0020), checkpointing (proposal 0008), and the harness contract (proposal 0022, planned next) into the full stateless-worker shape — take any of those four away and the pattern degrades. Backwards-compatible at the spec level; implementations adopt by adding the suspend() operation to the engine surface, extending the NodeEvent phase enum with "suspended", and persisting paused-invocation records via the same machinery they already use for checkpoint records.

[0.46.0] — 2026-06-03¶

Added

graph-engine §6 Observer hooks — drain_events_for(invocation_id, *, timeout) per-invocation drain primitive (new). Sibling to the existing process-wide drain — scopes the wait to observer events tagged with a single invocation_id rather than blocking on every active invocation across the graph. Returns the same summary shape as drain (undelivered_count, timeout_reached; implementations MAY add richer detail). The snapshot semantic from the existing drain carries over verbatim: the set of events covered is fixed at the moment the call begins; events emitted with the matching invocation_id after the call do NOT block (otherwise a caller running inside an active invocation would spin indefinitely on its own node body's completed event). Events scope via the invocation_id defined in observability §5.1; implementations MUST tag every observer event with the invocation_id of the invocation that emitted it. Detached subgraphs and detached fan-outs (per observability §4.4) inherit the parent invocation's identifier and ARE covered by the parent's drain. Timeout discipline follows the existing drain — non-negative duration in seconds, MUST-reject on negative / NaN at the API boundary, idiomatic per-language error type. The load-bearing divergence from drain: workers MUST NOT be cancelled on per-invocation drain timeout (in contrast to drain's shutdown-cancel rule) — the deliver loop continues processing the queue after the timeout because the graph remains active and other invocations may still be in flight. Calling drain_events_for on an invocation whose events have all been delivered MUST return immediately with undelivered_count == 0 and timeout_reached == false (the common case in production). Composition with resume (per the resume-mints-fresh-id rule in graph-engine §3 Invocation entry surface): a resumed invocation mints a fresh invocation_id; the drain scopes to the resumed invocation's events only. The cross-reference paragraph appended to the existing §6 Drain block directs readers toward the right primitive for in-invocation synchronization (the new per-invocation drain) versus lifespan / shutdown coordination (the process-wide drain). Resolves the synchronization race the queryable observer pattern (proposal 0048) exposes when a terminal node reads accumulator state mid-invocation. (proposal 0054)
Six new conformance fixtures: graph-engine/conformance/028-drain-events-for-basic-synchronization (terminal node calls drain then reads accumulator; pre-drain events present, no race); 029-drain-events-for-snapshot-semantic (drain returns without blocking on the caller's own completed event — the load-bearing snapshot rule; pairs with 028 to pin the semantic from both sides); 030-drain-events-for-timeout (tight timeout against slow observer → timeout_reached: true, non-zero undelivered_count, graph remains usable for subsequent invocations — locks down the no-worker-cancellation divergence from drain); 031-drain-events-for-invocation-scope (two serial invocations; each terminal-node drain sees only its own invocation's events — pins the per-id scoping rule the resume composition depends on); 032-drain-events-for-fan-out-coverage (fan-out + downstream persist node calls drain; accumulator snapshot contains events from EVERY inner instance — locks down the rationale that rejected the per-node-scope alternative); 033-drain-events-for-parallel-branches-coverage (parallel-branches peer to 032; downstream persist node's drain covers events from EVERY branch via the shared parent invocation_id — locks down per-invocation drain coverage for the second concurrent-dispatch primitive, which exercises a different engine code path than fan-out).

Notes

MINOR bump (pre-1.0). Additive at the spec level — one new method on the compiled graph surface; the existing drain is unchanged. Existing applications that do not call drain_events_for see no behavioral change. Pipelines opting into the accumulator pattern (per the queryable observer pattern blessed in observability §9 / proposal 0048) gain a synchronization primitive that closes the race their terminal-node-reads-mid-invocation case had. Implementations satisfying the existing per-invocation event tagging contract (already required for the observability §3.4 contextvar propagation) can derive the per-invocation pending count from existing plumbing; no new event metadata required.

[0.45.0] — 2026-06-01¶

Changed

observability §3.4 Shared-parent boundary (MUST NOT) — paragraph rewritten with conditional invocation-span classification. The original prose framing (from proposal 0045) classified the invocation span as an unconditional shared parent "regardless of runtime cardinality" — but in the pure-serial case (no fan-out or parallel-branches dispatch on the augmenter's call-stack path), the invocation span has no sibling instances to leak to and is on the augmenter's call-stack ancestor path. The rewritten paragraph splits the classification into three bullets: fan-out node always a shared parent (degenerate single-instance cases included — structural classification governs); parallel-branches node always a shared parent (same rule); invocation span a shared parent only when at least one fan-out or parallel-branches dispatch is on the augmenter's call-stack path (predicate stated via the lineage chain having non-null fan_out_index or branch_name entries). Pure-serial augmentations reach the invocation span via rule 2 of the boundary decision tree; nested augmentations (inside any fan-out instance or parallel branch) do not reach the invocation span because at least one dispatcher is on the path. The decision tree's rule 3 gains a short parenthetical pointing readers at the conditional classification. (proposal 0053)

Notes

MINOR bump (pre-1.0). Documentary tightening only. The normative behavior is unchanged from what fixtures 034 (034-caller-metadata-open-span-update-serial — outermost-serial augmentation reaches the invocation span) and 039 (039-nested-lineage-augmentation — nested cases do not reach the invocation span) already exercise. The proposal closes the spec-text-vs-fixture ambiguity that previously made fixtures 034 and 039's behavior unreconcilable from §3.4's text alone — both pass under the predicate-derived reading; the tightened spec text retroactively records the predicate. No conformance fixture changes needed; implementations that pass 034 and 039 today already implement the predicate-derived behavior. Matches the Textual impl-tracking status precedent from proposals 0019 / 0026 / 0030 / 0051 — implementations adopt by bumping their spec-version pin without code changes.

[0.44.0] — 2026-06-01¶

Added

observability §5.1 — two new invocation-level attributes. openarmature.implementation.name (string; canonical values "openarmature-python" / "openarmature-typescript" / "openarmature-<language>" matching the language's package-registry shape) and openarmature.implementation.version (string; sourced from the implementation library's package metadata in the language-idiomatic way — openarmature.__version__ for Python, package.json version for TypeScript). Both attributes are implementation-emitted (never caller-supplied; reserved per §3.4) and emit on every invocation span. The values answer the operator triage question "which library version produced this trace" that spec_version alone doesn't answer — operators copying the implementation name directly into PyPI / npm search lands on the right package without transliteration. (proposal 0052)
observability §5.1 — Always-emit invariant paragraph (new). Frames openarmature.implementation.name + openarmature.implementation.version + the existing openarmature.graph.spec_version and openarmature.correlation_id (§3.1 / §5.6) as runtime-identity constants that MUST emit regardless of disable_state_payload / disable_llm_payload / any other observer-level privacy knob. Privacy knobs gate runtime data (caller state, LLM messages), not runtime identity. The §8.4.1 Langfuse-mapping rows derived from these attributes inherit the same invariant.
observability §8.4.1 — two new Trace metadata rows. openarmature.implementation.name → trace.metadata.implementation_name; openarmature.implementation.version → trace.metadata.implementation_version. The rows source from the §5.1 attributes (parallel to the existing spec_version mapping row); Langfuse-side projection emits on every Trace.
observability §3.4 — reserved-key set extends 24 → 26 names. implementation_name and implementation_version join the reserved set; a caller-supplied colliding key is rejected at the invoke() API boundary per the enforcement mechanism established by proposals 0041 / 0042.
Two new positive-control conformance fixtures: observability/conformance/058-implementation-attribution-otel (dual case — invocation span carries both attributes as non-empty strings matching the per-language canonical name with the inner node span NOT carrying them; PLUS a detached-subgraph case asserting both the parent invocation span AND the detached child trace's invocation span carry the attributes per §5.1's always-emit invariant applying to every invocation span) and 059-implementation-attribution-langfuse (Trace metadata carries the two rows; both default and disable_state_payload = False configurations emit the rows — confirms the always-emit invariant on the Langfuse side). Existing fixture 028-caller-metadata-namespace-rejection extended with two negative-control cases (rejects_reserved_oa_name_implementation_name, rejects_reserved_oa_name_implementation_version) asserting the §3.4 reservation rejects caller-supplied collisions.

Notes

MINOR bump (pre-1.0). Additive across observability: two new §5.1 attributes (each two-string emission on the invocation span), two new §8.4.1 Trace metadata rows (Langfuse-side projection), one §3.4 reserved-set extension (24 → 26 names; enforcement mechanism unchanged), one new informative Always-emit invariant paragraph in §5.1. Existing fixtures unchanged (OTel-side fixtures don't assert absence of new attributes; Langfuse-side fixtures don't assert absence of new metadata keys). Backwards-compatible at the spec level; OTel-consuming backends that ignore unknown attributes see no behavioral change. Implementations adopt by sourcing their own package metadata at runtime initialization (__version__ for Python, package.json version for TypeScript).

[0.43.0] — 2026-06-01¶

Added

observability §8.4.1 — Implementation surface caveat (new paragraph). Records that the vendor SDK method delivering the §8.4.1 Trace input/output sourcing contract's UI-visible projection is marked deprecated by the upstream vendor. As of Langfuse SDK v4 (empirically verified 2026-05-31), this is the set_current_trace_io / Span.set_trace_io family, with stated removal in a future major version; the non-deprecated propagate_attributes method does not currently project trace-level input / output values to the Langfuse UI's headline columns. The §8.4.1 normative contract (three-lever decision tree, hook contract, status enum, resume semantics) is independent of which SDK method populates the values and remains stable across SDK migrations. Cross-references docs/compatibility.md per the External-dependency adoption policy as the operational tracking record. (proposal 0051)

Notes

MINOR bump (pre-1.0). Pure documentary addition; no behavior change, no public-type / interface changes, no conformance fixture impact. The §8.4.1 contract is unchanged; this caveat records the SDK-surface state at a verified date so future readers see the deprecation context discoverable from spec text rather than buried in implementation-side release notes. Backwards-compatible at the spec level; implementations adopt by bumping their spec-version pin without code changes (matches the Textual impl-tracking status precedent from proposals 0019 / 0026 / 0030). When the vendor publishes a concrete v5 migration guide, a follow-on proposal MAY expand the caveat into a full §8.4.1 reframe specifying the new binding.

[0.42.0] — 2026-06-01¶

Added

pipeline-utilities §6.3 — Failure isolation (new third bundled middleware). Catches exceptions escaping the inner chain and returns a configured degraded partial update. Configuration record: degraded_update (required; static mapping OR callable (state) -> partial_update), event_name (required no default — naming decision at the construction site), predicate (optional single-argument (exception) -> bool; defaults to always-true), on_caught (optional async callback). Catches Exception by default; BaseException propagates uncaught (matches §6.1 retry's cancellation-propagation rule). On catch, the middleware dispatches a framework-emitted failure-isolation event onto the observer delivery queue (parallels proposal 0040's metadata-augmentation event mechanism — distinct from NodeEvent, NOT promoted to a typed variant on the observer event union for v1) carrying event_name, the wrapped node's lineage tuple (namespace, attempt_index, fan_out_index, branch_name), pre_state / post_state, and a caught_exception record (category + message). The engine continues edge resolution from the degraded return; it does NOT see the exception. (proposal 0050)
pipeline-utilities §6.3 — Three-piece composition pattern (with §6.1 retry). Composes outer FailureIsolationMiddleware + inner RetryMiddleware + transient-aware node body for "retry transients, give up gracefully on exhaustion or non-transient errors" workflows. Outer-to-inner ordering is load-bearing — retry MUST be inner (it sees raw transients first); failure isolation MUST be outer (it only sees what escapes retry). Reversing the order would let inner isolation catch transients before retry sees them, defeating retry's purpose entirely.
llm-provider §5 — complete() signature extended with optional retry kwarg. Accepts an instance of the pipeline-utilities §6.1 retry middleware configuration record (max_attempts / classifier / backoff / on_retry) or None / absent. Default None preserves the v0.4.0 no-retry behavior verbatim — transient errors per §7 raise to the caller without retry. The "does NOT retry" operation-semantics bullet is amended to note retry policy lives at the per-node layer (pipeline-utilities §6.1) OR the per-call layer (this kwarg per §7.1).
llm-provider §7.1 — Call-level retry (new sub-section under §7 Error semantics). In-call retry loop semantics: dispatch underlying request; on transient (per the §6.1 default classifier — provider_unavailable, provider_rate_limit, provider_model_not_loaded, plus carrier-spec-marked transients), wait backoff(attempt_index) and re-attempt; on max_attempts exhaustion, propagate the final error per the normal exception path; non-transient exceptions propagate immediately on first occurrence. Reuses the §6.1 framework-agnostic configuration record (bidirectional cross-spec dependency with §6.1 acceptable because the shared record is framework-agnostic). Cancellation signals MUST propagate uncaught (matches §6.1's rule). Per-attempt span emission: N attempts emit N LLM provider spans, all parented under the calling node's span, disambiguated by openarmature.llm.attempt_index per observability §5.5. Includes a Two-level retry lane separation table (per-call vs per-node) and a Common mistakes list — the multiplicative-budget pitfall (3 × 5 × 3 = 45 worst-case for stacked 3-attempt outer × 5-call chunked × 3-attempt per-call), inline try/except defeating per-attempt attribution, and classifier widening to mask real errors.
observability §5.5 — single-span framing amended to per-attempt. "MUST emit a span around each complete() call" → "one span per attempt under call-level retry per §7.1; one span per complete() call when retry is absent (the default — preserving the v0.16.0 single-span framing)". The amendment is required because call-level retry per llm-provider §7.1 produces N attempts inside a single complete() call; emitting one span per attempt is the observability shape backends expect for retry attribution.
observability §5.5 — openarmature.llm.attempt_index attribute (new). Int. The retry-attempt index for the LLM call, 0 is the first attempt and 0..N-1 covers the N spans produced by an N-attempt call-level retry. Emitted on every LLM provider span; defaults to 0 for a single-attempt call (preserving the single-span case verbatim). Paralleled with openarmature.node.attempt_index per §5.2 for node-level retry; the two attributes are independent. OA-namespace placement governed by the Stable-only upstream adoption policy — the OTel GenAI semconv does not currently expose a stable gen_ai.attempt_index equivalent; a follow-on proposal MAY mirror to gen_ai.* if upstream stabilizes such an attribute.
Ten new conformance fixtures: pipeline-utilities/conformance/058-failure-isolation-static-degraded (basic catch + static degraded); 059-failure-isolation-callable-degraded (callable form, pre-merge state passed); 060-failure-isolation-predicate-filtering (matching exception caught + non-matching propagates); 061-failure-isolation-retry-three-piece-composition (outer FailureIsolation + inner Retry, exhaustion catches after both retry attempts); 062-failure-isolation-on-caught-callback (optional on_caught async callback fires alongside the framework-emitted event); 063-failure-isolation-default-predicate-bare-exception (default always-true predicate catches a bare ValueError; caught_exception.category = null, message captures str(exc)); llm-provider/conformance/056-call-level-retry-transient (HTTP 503 on attempt 0 → success on attempt 1; two per-attempt LLM spans with distinct attempt_index); 057-call-level-retry-exhaustion (both attempts fail; final error propagates); 058-call-level-retry-non-transient-no-retry (non-transient provider_invalid_request propagates on attempt 0 with exactly one LLM span; retry loop does NOT iterate); observability/conformance/057-llm-attempt-index-single-attempt-default (single-attempt default — complete() without a retry kwarg emits exactly one LLM provider span carrying openarmature.llm.attempt_index = 0 alongside the baseline §5.5 attributes; locks down the single-span backwards-compat contract).

Notes

MINOR bump (pre-1.0). Additive across pipeline-utilities + llm-provider + observability. The two primitives ship together because their normative text shares a load-bearing two-level retry lane-separation framing as connective tissue (per the proposal's bundle justification). Existing pipelines that don't opt into either primitive see no behavioral change. The bidirectional cross-spec dependency between pipeline-utilities §6.1 and llm-provider §7.1 (each referencing the other) is acceptable per the proposal because the shared retry configuration record is framework-agnostic. Existing fixtures unchanged.

[0.41.0] — 2026-06-01¶

Added

graph-engine §6 — LlmCompletionEvent typed event variant on the observer event union. First spec-normatively-typed event variant alongside NodeEvent and the framework-emitted metadata-augmentation event from proposal 0040. Dispatched on every LLM call completion that produces a structured response (per llm-provider §6). Carries 13 typed fields: identity / scoping (invocation_id, correlation_id, node_name, namespace, attempt_index, fan_out_index, branch_name) and outcome data (provider, model, request_id, usage, latency_ms, finish_reason) plus an OPTIONAL caller_invocation_metadata opt-in snapshot field. Observers filter via type discrimination (isinstance(event, LlmCompletionEvent) or per-language equivalent) rather than via the impl-current sentinel-namespace string match. The class name LlmCompletionEvent is normative as an identifier shape; per-language casing / symbol conventions MAY differ. Not subject to the phases subscription filter (matches the metadata-augmentation event's no-phase treatment). Failure cases (provider exceptions, malformed responses) do NOT emit this event variant — failures surface through the llm-provider §7 exception path. (proposal 0049)
observability §5.5.7 — Typed LLM completion event (new sub-subsection). Frames the typed LlmCompletionEvent (defined on the graph-engine §6 observer event union) as the structured form of the §5.5 LLM provider span attribute surface — same identity / scoping / outcome data in structured-event shape rather than separate span attributes. Backwards compatibility for the impl-current sentinel-namespace convention (NodeEvent.node_name == "openarmature.llm.complete" — a common implementation convention NOT pinned as a spec NodeEvent shape; the same string is the OTel span name per §5 Span names but the NodeEvent shape using it is impl-current) is preserved via a SHOULD-emit-both transition: implementations that have historically emitted the sentinel NodeEvent SHOULD continue emitting it alongside the typed event for an implementation-defined transition window. Backends opting into the typed event SHOULD subscribe to one variant per LLM completion to avoid double-counting.
Seven new conformance fixtures under observability/conformance/: 050-llm-completion-event-dispatch (typed event fires with populated field set on a structured-response provider); 051-llm-completion-event-type-discrimination (type-discriminating observer receives the typed event regardless of whether the impl also emits the sentinel NodeEvent); 052-llm-completion-event-caller-metadata-opt-in (default caller_invocation_metadata = null; opt-in observer config populates with a snapshot); 053-llm-completion-event-no-event-on-failure (provider returns HTTP 503 → provider_unavailable per llm-provider §7; no typed event delivered, failure surfaces via the node's completed NodeEvent with error populated); 054-llm-completion-event-fan-out-index-population (fan-out over two instances; the two captured typed events carry distinct fan_out_index values covering {0, 1}, branch_name is null on both — locks down sibling-instance disambiguation); 055-llm-completion-event-branch-name-population (parallel-branches over two named branches; the two captured typed events carry distinct branch_name values covering {fast, slow}, fan_out_index is null on both — companion to 054 for the parallel-branches dispatch surface); 056-llm-completion-event-strict-serial-ordering (the typed event arrives between the LLM-calling node's started and completed NodeEvents in the observer's arrival sequence, per the strict-serial delivery guarantee and the spec's "after response received, before call returns" dispatch timing).

Notes

MINOR bump (pre-1.0). Additive across graph-engine + observability: a new typed event variant on the observer event union (alongside the existing NodeEvent shape and the framework-emitted metadata-augmentation event from proposal 0040), a new observability §5.5.7 sub-subsection framing the typed event as the structured form of the §5.5 attribute surface, plus seven new conformance fixtures. Existing fixtures unchanged. The change is backwards-compatible at the spec level — the typed event is purely additive; implementations that historically emit a sentinel-namespaced NodeEvent for LLM completions handle backwards compatibility internally per the §5.5.7 SHOULD-emit-both transition.

[0.40.0] — 2026-06-01¶

Added

observability §3.4 Read access paragraph block. New symmetric read primitive openarmature.observability.get_invocation_metadata(). Returns an immutable mapping snapshot of the metadata visible in the current async context, carrying the caller-supplied baseline plus any entries set via set_invocation_metadata in the current or ancestor contexts. Sibling-instance writes after fan-out are NOT visible (the contextvar's copy-on-write isolation applies symmetrically to reads); outermost-serial reads after a fan-out joins see only the pre-fan-out baseline. Under retry middleware, each attempt's reads see only that attempt's writes plus the pre-attempt baseline — prior failed attempts' writes do NOT carry over. Calls outside an active invocation return an empty immutable mapping (silent no-op). Reads do NOT emit a metadata-augmentation event. Return type is an immutable mapping shape (Python MappingProxyType / TypeScript Readonly<Record<string, AttributeValue>> or equivalent); typed wrappers deferred. (proposal 0048)
observability §9 — Queryable observer pattern (new section, renumbers existing §9 Determinism → §10 and §10 Out of scope → §11). Normative convention for concrete observer types exposing read methods on the instance attached to a graph, consumable by pipeline nodes holding a reference. The Observer protocol surface (graph-engine §6) is unchanged — the pattern is a convention for how concrete implementations expose read-augmenting state to the pipeline.
observability §9.1 — Read-method contract. Read methods MUST be query-only (no graph state mutation), MUST NOT influence routing or node dispatch, MUST NOT emit events to other observers, and SHOULD be non-blocking from the event-loop perspective. Queryable observers are a read-augmenting convenience, NOT a replacement for State.
observability §9.2 — Async-safety contract. Read methods MAY race with concurrent event emission to the same observer. Implementations MUST ensure read-consistency (no torn views) but MUST NOT guarantee event-count completeness up to a wall-clock instant. Consumers needing post-completion stability gate on the invocation's completion signal (the strictly-serial delivery queue guarantees prior events arrive before the terminal event reaches the observer).
observability §9.3 — Three-channel data-access guidance (table). Compares State (typed schema with declared reducers; canonical mutable data plane), invocation-metadata (untyped per-invocation cross-cutting key/value; per-async-context scoped), and queryable observer accumulator (derived summary state on a concrete observer instance) — three distinct read surfaces with different lifetimes and use cases. Default: prefer State; invocation metadata and queryable observer accumulators are narrow carve-outs.
observability §9.4 — Lifecycle. Accumulating queryable observers MUST NOT auto-drop accumulated state on the invocation's completion signal (would race against end-of-invocation reads). Concrete accumulating observers MUST provide an explicit drop / cleanup mechanism; the consuming node calls drop after reading. Long-lived accumulators across invocations are permitted but require manual cleanup discipline; the spec does NOT mandate a maximum retention policy.
Seven new conformance fixtures under observability/conformance/: 043-get-invocation-metadata-roundtrip (basic write-then-read in a single context returns baseline + in-node write); 044-get-invocation-metadata-fan-out-scoping (per-instance read isolation + outermost-serial sees pre-fan-out baseline); 045-get-invocation-metadata-retry-scoping (prior failed attempt's writes discarded); 046-get-invocation-metadata-outside-invocation (top-level call returns empty mapping, no exception); 047-queryable-observer-pattern (end-to-end attach / emit / consume cycle; downstream node reads observer's get_count() mid-invocation, asserting the count is bounded by events-emitted-before-read with no strict-equality guarantee per §9.2's no-wall-clock-completeness rule); 048-queryable-observer-async-safety (informative; concurrent reads return internally-consistent snapshots without enforcing event-count completeness); 049-queryable-observer-lifecycle-drop (long-lived accumulator across two sequential invocations: per-invocation_id bucket isolation + no auto-drop on completion signal + explicit drop(invocation_id) removes the bucket).

Notes

MINOR bump (pre-1.0). Additive across observability: new §3.4 read paragraph is symmetric to the existing write API and reuses the existing contextvar / COW machinery from proposal 0034. New §9 Queryable observer pattern blesses an existing widely-used pattern without changing the Observer protocol surface — concrete observers that already follow the pattern see no breaking change. The §9 / §10 renumber to §10 / §11 is internal to observability and does not affect cross-spec references (no external observability §9 / observability §10 references exist in the rest of the spec). Existing fixtures unchanged. Backwards-compatible.
Verification correction during Accept-phase. The proposal's draft text referred to MetadataAugmentationEvent and InvocationCompletedEvent as named event variants; verification against the spec found these are not named typed events in the current spec (the spec uses prose "metadata-augmentation event" and node-level "completed event" conventions). Proposal text and spec text both use the spec's prose form.

[0.39.0] — 2026-06-01¶

Added

llm-provider §8 framing — Intra-impl wire-byte stability paragraph. New normative rule: a §8.X wire-format mapping implementation MUST produce byte-identical wire output for OA inputs that are equivalent up to construction-side insertion order. The rule applies to JSON object keys (sorted lexicographically at every nesting level), recursive JSON Schema canonicalization (tool parameters schemas), undeclared RuntimeConfig extras, and content-block source dicts. Cross-implementation byte equality (e.g., Python and TypeScript producing identical bytes for the same input) remains non-normative per the existing §5.5.1 caveat; the new rule is intra-impl only. Per-mapping Wire-byte stability sub-paragraphs added to §8.1.1 / §8.2.1 / §8.3.1 anchoring the rule to that mapping's specifics. (proposal 0047)
llm-provider §6 Response.usage — two new optional fields. cached_tokens? (count of input tokens that hit a prefix cache, reported by the provider) and cache_creation_tokens? (count of input tokens written to the cache during the call, populated primarily by providers with explicit cache-control surfaces). Both fields use the absent / null semantics for "provider did not report" and 0 for "provider reported zero" — the distinction is observable per spec. Each §8.X wire-format mapping documents the source field per the per-mapping rows below.
llm-provider §8.1.2 (OpenAI-compatible) — cache-stat source rows. usage.cached_tokens ← usage.prompt_tokens_details.cached_tokens (OpenAI Chat Completions wire shape, also followed by vLLM and other OpenAI-compatible servers). The newer OpenAI Responses API surfaces the same value at usage.input_tokens_details.cached_tokens; implementations targeting that endpoint source from the input_tokens_details path with identical semantics. vLLM caveat: vLLM servers require both --enable-prefix-caching and --enable-prompt-tokens-details for the cache field to populate. usage.cache_creation_tokens is left absent — OpenAI's prompt-cache surface does not report a discrete cache-creation count.
llm-provider §8.2.2 (Anthropic) — implicit-not-supported caveat. Anthropic does NOT support implicit prefix caching; the response fields cache_read_input_tokens / cache_creation_input_tokens only fire under explicit cache_control blocks, which is an explicit-cache surface out of scope for the §6 implicit-cache fields. The §8.2 mapping leaves both implicit-cache fields absent. Anthropic's explicit-cache values remain visible via Response.raw.usage.cache_read_input_tokens / cache_creation_input_tokens for callers that need them; a future proposal could add spec-level explicit-cache primitives to surface these on a dedicated explicit-cache surface.
llm-provider §8.3.2 (Google Gemini) — cache-stat source row. usage.cached_tokens ← Gemini's usageMetadata.cachedContentTokenCount (Gemini 2.5+ surfaces this for both implicit cache hits and explicit-cache reads under the same field; the implicit semantics fit the §6 contract). usage.cache_creation_tokens is left absent — Gemini does not report a discrete cache-creation count under its implicit-cache surface.
prompt-management §13 — Cross-variable substring stability paragraph. New normative rule: variable substitution MUST be in-place and MUST NOT introduce position-dependent transformations (variable-index numbering, per-variable salts, whole-template-state-dependent normalization) that would shift bytes earlier in the rendered output based on later content. Two renders sharing a common prefix of variable values MUST produce rendered output whose corresponding region is byte-identical even when later variables differ. This is the substrate downstream automatic prefix caching relies on.
prompt-management §14 — APC-friendly authoring guidance (new subsection, informative). Non-normative authoring patterns for maximizing automatic prefix cache hit rates: pin the high-cardinality stable region at the start of the prompt; avoid front-loading per-call variables in shared regions; keep variable substitution in-place; stabilize multi-value formatting (sorted-key JSON, source-order iteration); prefer the multi-message chat_template shape's natural turn boundaries. Existing §14 Out of scope renumbers to §15.
observability §5.5.3.1 — OA-namespaced cache attributes (stable-only mirror) (new sub-subsection). Two new attributes on the LLM provider span: openarmature.llm.cache_read.input_tokens (sourced from §6 Response.usage.cached_tokens; emitted only when the §6 field is populated) and optional openarmature.llm.cache_creation.input_tokens (sourced from §6 Response.usage.cache_creation_tokens; emitted only when populated). The OA-namespace placement reflects the Stable-only upstream adoption policy (GOVERNANCE.md, docs/compatibility.md): the upstream OTel attributes gen_ai.usage.cache_read.input_tokens / cache_creation.input_tokens are at Development status as of OTel semconv v1.41.1, so OA mirrors to its own namespace until upstream stabilization. Emission honors the existing disable_genai_semconv opt-out (§5.5.4).
Six new conformance fixtures: llm-provider/conformance/054-openai-wire-byte-stability (two structurally-equivalent calls with different insertion orders produce byte-identical OpenAI wire bodies); llm-provider/conformance/055-anthropic-wire-byte-stability (same shape against §8.2); prompt-management/conformance/032-cross-variable-substring-stability (two renders with different unrelated variables share byte-identical prefix); observability/conformance/040-llm-cache-attribute-emission (provider response carries prompt_tokens_details.cached_tokens → span carries openarmature.llm.cache_read.input_tokens); observability/conformance/041-llm-cache-attribute-absence (provider response without cache field → cache attributes absent); observability/conformance/042-llm-cache-attribute-reported-zero (provider response carries cache field with value 0 → §6 Response.usage.cached_tokens is 0 not null, span emits attribute with value 0, locks down the absent-vs-zero distinction).

Notes

MINOR bump (pre-1.0). Additive across all three capabilities: a new normative wire-byte stability rule, two new optional usage fields, per-mapping rows that augment existing tables without renaming, a new prompt-management §13 paragraph that tightens an existing contract without breaking conformant implementations, a new informative §14, and two new conditionally-emitted observability attributes. Existing fixtures unchanged. The wire-byte stability rule is implementation-natural for any adapter constructing wire output deterministically; the cross-variable substring stability rule is implementation-natural for any pure-substitution template engine. Implementations adopt by sourcing the new fields and emitting the new attributes when the provider response supplies the data — no breaking change for existing callers.

[0.38.0] — 2026-05-30¶

Added

prompt-management §3.1 Chat-prompt variant — new subsection. A Prompt is now one of two variants: the existing Text-prompt (template: <template representation>, renders to a single text Message — unchanged at the data-model level) or the new Chat-prompt (chat_template: list[ChatSegment] in place of template, renders to a multi-message PromptResult). ChatSegments are either content segments ({role: "system"|"user"|"assistant", content: <text-template OR content-blocks-template>}) or placeholder segments ({placeholder: str}). Content segments support either a single text template (the common case) or a non-empty content-blocks template mirroring llm-provider §3.1 ContentBlock shapes (text block, image-URL block, image-inline block) for authoring multimodal user messages. Image blocks are user-only per llm-provider §3.1.2 — a content-blocks segment containing any image block MUST have role: "user". Placeholder segments inject a caller-supplied list[Message] at the slot position; placeholder names MUST match the regex [A-Za-z_][A-Za-z0-9_]* (ASCII-identifier shape, pinned for cross-impl portability) and MUST be unique within a chat_template. Chat-prompt template_hash is computed over a canonical serialization of chat_template including segment order, kind, role + content (and for content-blocks segments the full block sequence), and placeholder names. (proposal 0046)
prompt-management §6.render — placeholders parameter + Chat-prompt render contract. render(prompt, variables=None, placeholders=None). For Chat prompts: text-template content segments render to a Message with text content; content-blocks segments render to a Message with a rendered ContentBlock sequence (per-block variable substitution into text-block text, image-block url, image-inline base64_data / media_type); placeholder segments inject the caller-supplied list at the slot position as standalone Messages. Empty injected lists are valid (zero messages contributed — the chat-history "no prior turns" case). For Text-prompts, placeholders MUST be ignored — implementations MUST NOT raise on a non-empty placeholders mapping passed alongside a Text prompt (pinned for cross-impl portability, enables generic wrappers passing placeholders unconditionally across variants).
prompt-management §8 — per-segment + per-block strict-undefined paragraph. Clarifies that for Chat prompts, strict-undefined applies independently per segment and (within a content-blocks segment) per block. A missing variable in any segment / block raises prompt_render_error and aborts the render before producing a partial PromptResult.
prompt-management §11 — Chat-prompt-specific prompt_render_error triggers. Under the same prompt_render_error category, the spec now enumerates: empty rendered text — pinned to literally zero characters after variable substitution (no whitespace stripping permitted; cross-impl portability) for both text-template content segments and {type: "text"} blocks within content-blocks segments; content-blocks segment with empty block list; unfilled placeholder slot (distinct from placeholders[<name>] = [] which is valid empty injection); duplicate placeholder names within a chat_template OR a placeholder name not matching the §3.1 identifier regex; role-block compatibility violation (image block in a non-user content-blocks segment, surfacing llm-provider §3.1.2's user-only constraint at the prompt boundary; render-time enforcement is spec-normative, construction-time detection MAY supplement); empty final rendered messages sequence (e.g., a chat_template containing only placeholder segments that all inject empty lists — preserves the §4 non-empty PromptResult.messages invariant; the per-placeholder empty-list rule remains valid for partial cases).
prompt-management §5 — variant note on PromptBackend.fetch. Signature unchanged; the returned Prompt MAY be either variant; backends SHOULD document which variants they emit.
prompt-management §12 — observability §8.4.4 unaffected-by-variant confirmation. The Langfuse Prompt-entity linkage is keyed on prompt identity (name + version + label), not on Prompt variant or rendered message count; Chat-prompts with observability_entities['langfuse_prompt'] flow through §8.4.4 exactly as Text-prompts do.
Fifteen new conformance fixtures (prompt-management/conformance/017-chat-prompt-per-segment-render through 031-text-prompt-placeholders-ignored) covering per-segment render, placeholder injection (non-empty + empty-list valid), per-segment strict-undefined, empty-segment error, unfilled-placeholder error, content-blocks render (text + image-URL; inline image), role-block compatibility rejection, observability linkage on a chat-shape prompt, empty-rendered-messages rejection (single + multi-placeholder cases), duplicate-placeholder-name rejection, content-blocks empty-cases (empty rendered text block + empty block list), placeholder-name-regex validation (leading-digit / disallowed-character / positive-control), and Text-prompt-ignores-placeholders.

Changed

prompt-management §6.render — Text-prompt render contract narrowed. The previously-vague clause "templates MAY produce multiple messages — e.g., a system + user split — when the template language supports it" is REPLACED by: "A Text-prompt renders to exactly one Message with role: 'user' and content equal to the rendered template text; multi-message and multimodal prompts MUST use the Chat-prompt variant (chat_template)." No current backend or implementation produces multi-message Text-prompt output (the prior clause was never operationalized by a normative mechanism); existing Text-prompt fixtures all exercise the single-Message behavior. The narrowing makes the Text-prompt vs Chat-prompt lanes explicit and removes ambiguity about where multi-message / multimodal rendering belongs.

Notes

MINOR bump. Additive at the data model (new Chat-prompt variant; new content-blocks alternative on chat content segments; new placeholders parameter on render). The §6.render Text-prompt narrowing is technically a narrowing of a previously-vague contract, but practical breakage risk is zero (no current backend or implementation uses the multi-message Text-prompt path). Backwards-compatible for callers using the Text-prompt path. Chat-prompt callers opt in by checking the returned variant and (if applicable) supplying a placeholders mapping at render time or authoring content-blocks segments for multimodal user messages.

[0.37.0] — 2026-05-30¶

Changed

observability §3.4 — Mid-invocation augmentation ancestor / sibling boundary rewritten as a lineage-aware three-rule structure.* The previous single-rule "ancestor / sibling boundary (MUST NOT)" — which conflated dispatch ancestors with shared parents — is replaced with three lineage-aware rules: Augmenter's call-stack ancestor chain (MUST) (every strict dispatch ancestor on the augmenter's specific call-stack path — outer fan-out instance, outer parallel-branches branch, outer serial-subgraph wrapper — gets the update); Sibling boundary (MUST NOT) (siblings at any dispatch depth do not); Shared-parent boundary (MUST NOT)*** (the fan-out node, parallel-branches node, invocation span — visible to multiple sibling instances / branches — do not). Adds a three-step boundary decision tree applied per open span at augmentation time. (proposal 0045)
observability §3.4 — Per-async-context scoping gains a follow-up Per-depth lineage tracking paragraph. Implementations MUST preserve the dispatch-context lineage as a list (one entry per dispatch depth: outer fan-out instances, outer parallel-branches branches, outer serial-subgraph wrappers on the augmenter's path), not a single scalar identifier that gets clobbered at each nested descent. When an augmentation fires at a leaf, the observer uses the lineage to locate the open ancestor dispatch spans on the augmenter's path.

Added

Conformance fixture observability/conformance/039-nested-lineage-augmentation exercising three nested-dispatch cases: inner fan-out inside outer fan-out instance, parallel-branches inside fan-out instance, and fan-out inside serial subgraph. Each case asserts the augmenter's full call-stack ancestor chain receives the augmentation, sibling instances / branches do not cross-pollinate, and shared parents (fan-out NODE, parallel-branches NODE, invocation) are not updated.

Notes

MINOR bump. The single-level behavior (one fan-out instance OR one parallel branch on the augmenter's path) is unchanged: the existing fixtures 029 / 030 / 034 exercise the call-stack-ancestor-chain-of-length-one case and their assertions remain correct under the lineage-aware rule. The loosening expands the set of spans that carry augmented metadata in nested cases — backwards-compatible for observers / backends that already handle the metadata (more spans now carry it; none get less).

[0.36.0] — 2026-05-29¶

Added

graph-engine §6 NodeEvent — parallel_branches_config field. Optional structured value populated on every started / completed event for a parallel-branches node, mirroring the existing fan_out_config field from proposal 0013. Carries branch_names (ordered branch identifiers), branch_count, error_policy ("fail_fast" or "collect" per pipeline-utilities §11.5), and parent_node_name. Surfaces the resolved parallel-branches configuration to the observability §5.7 attribute surface. (proposal 0044)
observability §5.7 — Parallel-branches span attributes (new subsection). openarmature.node.branch_name (a new OTel span attribute, paralleling openarmature.node.fan_out_index; appears on per-branch dispatch spans and on every inner-node span within a branch), openarmature.parallel_branches.parent_node_name (on per-branch dispatch spans), openarmature.parallel_branches.branch_count and openarmature.parallel_branches.error_policy (on the parallel-branches NODE span).
observability §4.3 + §6 — OTel parallel-branches dispatch span synthesis. §4.3 Parent-child rules gains a new bullet for the per-branch dispatch span (inner-branch spans parent under the synthesized dispatch span, not directly under the parallel-branches NODE span). §6 Driving span lifecycle widens the span-stack key from (namespace, attempt_index, fan_out_index) to (namespace, attempt_index, fan_out_index, branch_name) to disambiguate concurrent same-named inner spans across branches, and gains a Parallel-branches dispatch span synthesis sub-paragraph defining lazy per-branch dispatch span creation on the first inner event of each branch (keyed by the parallel-branches NODE's full event-source identity + branch) and close on the parent's completed in declaration order, children-before-parents.
Conformance fixture observability/conformance/038-otel-parallel-branches-dispatch-span asserts the OTel trace tree shape matches the Langfuse fixture 030 shape, plus the §5.7 attributes and the close-order invariants.

Notes

MINOR bump. Additive NodeEvent field; new §5.7 subsection; new OTel span attribute. The §4.3 + §6 updates change the OTel trace tree shape for invocations using parallel-branches (inner-branch spans previously parented directly under the parallel-branches NODE span now parent under a per-branch dispatch span). Downstream consumers hard-coding the previous nesting need to update. The span-stack-key widening to include branch_name aligns the spec's observer-driven example with the implementation-internal _StackKey widening that had been a workaround for the inner-span collision; combined with the dispatch-span synthesis, the workaround is no longer needed.

[0.35.0] — 2026-05-29¶

Added

observability §8.2 — Trace entity gains input / output payload fields. Documents existing Langfuse Trace fields surfaced as headline columns in the Langfuse Traces list view. (proposal 0043)
observability §8.4.1 — trace.input / trace.output mapping rows + Trace input/output sourcing paragraph. The paragraph defines a Langfuse-observer-level disable_state_payload privacy knob (default ON, symmetric to §5.5.4's disable_llm_payload), a three-lever source decision tree (caller hook → raw state when the knob is OFF → privacy-safe minimal stub by default), a closed {completed, failed} status enum on the minimal stub's trace.output, the caller-hook contract (trace_input_from_state / trace_output_from_state, raw-state input, null fallthrough), and resume semantics (each resume_invocation mints a fresh Langfuse trace per §8.4.1; hooks re-fire on the resumed trace; original trace not mutated).
Conformance fixture observability/conformance/037-langfuse-trace-input-output exercising the three-lever decision tree across four cases (including the lever-1 null-fallthrough case) + the resume case.

Notes

MINOR bump. Additive: §8.2 documents pre-existing Langfuse fields; §8.4.1's trace.input / trace.output fields were previously always-blank for OA-emitted traces. Callers using the workaround of calling Langfuse SDK's update_trace(input=..., output=...) directly will see OA-observer-supplied values appearing on the Trace fields after this lands (last-writer-wins on the same attribute, per the OTel span-attribute overwrite semantics the Langfuse SDK uses to emit trace input/output); migration path is to replace direct update_trace calls with the new caller hooks (trace_input_from_state / trace_output_from_state). The breaking-change surface is narrow — only callers actively bypassing the observer for these specific fields are affected.

[0.34.0] — 2026-05-29¶

Changed

observability §3.4 — reserved-key enumeration extends from 21 to 24 names. Adds branch_name, detached, detached_from_invocation_id to the §3.4 reserved set the §8.4 Langfuse mapping writes to top-level trace.metadata / observation.metadata but proposal 0041 had not enumerated. Mechanism unchanged from 0041 (invoke()-boundary rejection + same enforcement at the set_invocation_metadata helper); breaking for callers that previously supplied one of the three names as caller metadata. (proposal 0042)

Added

observability §8.4.1 — trace.metadata.detached_from_invocation_id row. Emitted on the detached child trace produced by §4.4 detached-mode dispatch; points back to the parent invocation for inverse lookup (the forward direction is correlation_id, preserved across detached and parent traces).
observability §8.4.2 — branch_name and detached Observation-metadata rows. branch_name is sourced from the graph-engine §6 NodeEvent field (parallel branches, proposal 0011), emitted on per-branch Span observations as the parallel-branches disambiguator (analogous to fan_out_index for fan-out). detached is a boolean flag on the parent-side dispatching observation that fires a detached subgraph or fan-out instance.
Conformance fixture observability/conformance/028-caller-metadata-namespace-rejection extended with three new cases (rejects_reserved_oa_name_branch_name, rejects_reserved_oa_name_detached, rejects_reserved_oa_name_detached_from_invocation_id).
Conformance fixture observability/conformance/030-caller-metadata-parallel-branches-per-branch extended with observation.metadata.branch_name assertions on every per-branch observation (dispatch span, inner ask span, generation) in both branches, plus a per-branch isolation invariant for the OA-emitted key.
Conformance fixture observability/conformance/033-langfuse-detached-trace-mode extended with observation.metadata.detached: true on the parent dispatch observation (case 1) / parent fan-out node observation (case 2), trace.metadata.detached_from_invocation_id on the detached child trace (case 1), and an invariant asserting the same field on every per-instance detached trace (case 2).

Notes

MINOR bump. Additive to the §8.4.x mapping tables; rejects previously-accepted caller code using one of the three names as caller metadata — the same disposition 0041 took for its 20 names, taken deliberately to prevent silent shadowing of OA-emitted Langfuse metadata fields. Callers using non-reserved keys are unaffected; no Langfuse-metadata-layout change.

[0.33.0] — 2026-05-29¶

Added

sessions capability — new top-level spec. A SessionStore protocol (load / save / delete / list), full-state and projected-SessionState modes, auto-save-on-completion lifecycle with explicit mid-invoke save and an opt-out, schema migration reusing the chain-resolution semantics from pipeline-utilities §10.12, last-write-wins concurrency with optional optimistic-concurrency and pessimistic-locking extension points, and six canonical error categories (session_load_failed, session_save_failed, session_state_migration_missing, session_state_migration_chain_ambiguous, session_state_migration_failed, session_write_conflict). session_id is caller-supplied at invoke() and propagates through the ambient invocation context — readable from anywhere in the invocation's async call tree (nodes, middleware, observers), the same channel used for correlation_id. (proposal 0020)
observability §5.6 — openarmature.session_id cross-cutting span attribute. When the invocation is session-bound, the attribute is emitted on every span (invocation root, node, subgraph, fan-out instance, LLM provider, retry), the same scope as openarmature.correlation_id. Absent when the invocation is not session-bound.
observability §7 — openarmature.session_id log-record field. Emitted on every log record during a session-bound invocation via the same OTel Logs Bridge mechanism as correlation_id. The §7 detached-trace-mode paragraph is extended to note session_id is invocation-scoped and unchanged across detached / parent traces.
pipeline-utilities §10.14 Composition with sessions. Notes that checkpointing and sessions are orthogonal cross-invoke persistence layers; they register independently, MAY share a backend, and surface their respective resume / session-load flows and error categories independently.
Conformance fixtures sessions/conformance/001–013 (basic resume; no-store-registered; no-id; projected state; auto-save off; mid-invoke save; migration basic, missing, chain-ambiguous, and function-raises; subgraph and fan-out composition; observability propagation).

Notes

MINOR bump. Additive: a new capability spec, two new observability cross-cutting surfaces (absent when invocations are not session-bound), and a new pipeline-utilities subsection. No existing behavior tightens or changes shape.

[0.32.0] — 2026-05-29¶

Added

llm-provider §8.3 — Google Gemini generateContent wire-format mapping. A new §8 catalog entry mapping OA's provider abstraction onto Gemini's contents / parts protocol: system extraction to systemInstruction, the assistant↔model role rename, tool role bidirectional translation via functionResponse parts, §4 tools → functionDeclarations, tool_choice → toolConfig.functionCallingConfig (including the "required"→ANY rename), all seven declared RuntimeConfig fields → generationConfig, native structured output via responseJsonSchema, and the finishReason → finish_reason mapping. (proposal 0038)
llm-provider §3 — optional reasoning-continuity signature on TextBlock and ToolCall (mirroring ThinkingBlock.signature), for providers whose signatures attach to non-thinking parts (Gemini's thoughtSignature). New §3.1.7 generalizes the strip-on-send rule: reasoning-continuity signatures are provider-bound and stripped when a message list is routed to a different provider's mapping.
Conformance fixtures llm-provider/conformance/044–053 (Gemini message round-trip, function-call flow, image blocks, tool-choice modes, RuntimeConfig mapping, error mapping, structured output native + fallback, thought-signature round-trip, cross-provider signature strip).

Changed

llm-provider §3.1.4 — ThinkingBlock.signature relaxed from required to optional. A provider may emit a thought summary that carries no own signature (Gemini, where the signature rides on sibling TextBlock / ToolCall parts). The field is preserved verbatim when present.

Notes

MINOR bump. Additive: §8.3 is a new mapping; the new §3 signature fields are optional; §3.1.7 generalizes an existing strip rule. The one relaxation (ThinkingBlock.signature required→optional) widens, not tightens, the contract.

[0.31.0] — 2026-05-28¶

Changed

observability §5.1 / §3.2 — invocation_id may be caller-supplied. openarmature.invocation_id is reframed from an unconditional framework-minted UUIDv4 to caller-supplied or framework-generated (mirroring §3.1's correlation_id): when the caller supplies an id at invoke() it is used verbatim and MAY be any non-empty URL-safe string; when absent the framework MUST mint a UUIDv4 (the UUIDv4 mandate applies to the framework-generated case only). §3.2's distinction-table "Generated by" cell updates accordingly. (proposal 0039)
observability §8.4.1 — Langfuse trace.id derivation for non-UUID ids. Langfuse requires trace.id to be a 128-bit value (32 lowercase hex). A UUID invocation_id maps to its dashes-stripped hex (as before); a non-UUID value maps to a deterministic derivation — the first 16 bytes of SHA-256(invocation_id) as 32 hex — with the raw id also written to trace.metadata.invocation_id for lookup. This derivation is exactly Langfuse's own create_trace_id(seed) helper, so the derived trace.id equals create_trace_id(seed=invocation_id). Replaces the prior pass-non-UUID-through-unchanged behavior (which produced an invalid trace).

Added

graph-engine §3 — invoke() accepts a caller-supplied invocation_id (per-language idiomatic, alongside correlation_id and the metadata mapping); on a resume call the framework mints a fresh id and ignores any caller-supplied invocation_id.
observability §3.4 — invocation_id reserved. Per proposal 0041's maintenance rule, invocation_id (now written to trace.metadata.invocation_id) is added to the reserved caller-metadata key-name set.
Conformance fixtures: observability/conformance/035-caller-invocation-id-uuid, 036-caller-invocation-id-non-uuid, and pipeline-utilities/conformance/057-resume-mints-fresh-invocation-id.

Notes

MINOR bump. Additive + opt-in: callers that don't supply an invocation_id see unchanged UUIDv4-minting behavior and the unchanged UUID→hex trace.id path. The §5.1 reframe relaxes (does not tighten) the format constraint; the §8.4.1 non-UUID derivation handles a value shape that could not previously occur. The one reserved-key addition (invocation_id) rejects a caller key that would otherwise collide with the raw-id metadata field.

[0.30.0] — 2026-05-28¶

Changed

observability §3.4 — reserve OA-emitted metadata key names against caller collision. The §3.4 caller-metadata key constraints extend the reserved set: a caller-supplied key MUST NOT exactly match any OA-emitted top-level metadata key name a §8 backend mapping writes alongside caller keys (the §8.4 Langfuse set: correlation_id, entry_node, spec_version, detached_child_trace_ids, namespace, step, attempt_index, fan_out_index, subgraph_name, fan_out_item_count, fan_out_concurrency, fan_out_error_policy, fan_out_parent_node_name, prompt_group_name, request_extras, finish_reason, system, response_model, response_id, prompt). Such a key is rejected at the invoke() boundary (exact whole-key match, backend-set-independent), the same mechanism as the existing openarmature.* / gen_ai.* prefix reservation. This prevents a caller key from silently overwriting an OA-emitted field in Langfuse's flat top-level metadata. (proposal 0041)

Added

observability §8.4 — shared-namespace note. Documents that OA-emitted Langfuse metadata keys and §3.4 caller keys share the top level of the metadata object (both placed there because Langfuse filters reliably only on top-level keys), and that §3.4's reservation keeps both filterable without collision; OA keys are not nested under a sub-object.
Conformance: fixture observability/conformance/028 extended with reserved-exact-name rejection cases (step, correlation_id, system) alongside the existing reserved-prefix cases.

Notes

MINOR bump. A caller that previously supplied one of the now-reserved bare names (e.g. metadata={"step": …}) is rejected at invoke() after this lands — a breaking change for that caller, taken to stop silent overwrite of OA-emitted Langfuse metadata. No Langfuse-metadata-layout change; callers using non-reserved keys, and existing dashboards / filters, are unaffected.

[0.29.0] — 2026-05-28¶

Changed

observability §3.4 — mid-invocation augmentation open-span update tightened SHOULD → MUST. Entries added mid-invocation via set_invocation_metadata MUST be applied in place to the spans still open in the augmenting async context — for an outermost-serial-context call, the invocation span and the calling node's span; for a fan-out instance / parallel branch, that instance's / branch's dispatch span and any open inner-node spans — where the backend SDK supports in-place attribute / metadata update. An explicit boundary is added: spans in ancestor or sibling async contexts MUST NOT be updated, preserving the per-async-context copy-on-write isolation. (proposal 0040)

Added

observability §6 — augmentation-event mechanism. New §6 guidance for how an observer-driven lifecycle reflects mid-invocation augmentation onto already-open spans: a framework-emitted metadata-augmentation event delivered in serial order on the observer queue, carrying the added entries plus the originating lineage identity (namespace / attempt_index / fan_out_index / branch_name). The open-span-update behavior is the MUST; the event is the recommended mechanism (alternatives that produce the same spans are permitted).
graph-engine §6 — observer delivery queue carries augmentation events. Clarifying note that the queue MAY carry a framework-emitted metadata-augmentation event (a distinct event kind from node-boundary started / completed, carrying no pre_state / post_state / error, not subject to the phases filter) alongside node-boundary events; the closed phase enumeration continues to apply to node-boundary events only.
Conformance fixtures: observability/conformance/029 and 030 corrected to add the inner-node span level (which carries the augmented per-instance / per-branch key per the open-span MUST); new fixture 034-caller-metadata-open-span-update-serial covering the outermost-context case (invocation span + calling node span updated in place).

Notes

MINOR bump. Tightens a previously-SHOULD behavior to a MUST — an implementation that declined the open-span update under the SHOULD must now perform it for backends whose SDK supports in-place update — and adds an observer-queue event kind. Callers that do not call set_invocation_metadata see no behavior change.

[0.28.0] — 2026-05-27¶

Added

llm-provider §8.2 — Anthropic Messages wire-format mapping. New §8.2 subsection (following the §8.X template) mapping the abstract §3/§4/§5/§6/§7 contract onto the Anthropic Messages API (POST /v1/messages): system extraction to the top-level system field; user/assistant-only messages; tool role bidirectional translation to/from tool_result content blocks (§8.2.1.2); tool_use content-block tool calls and {name, description, input_schema} tool definitions (§8.2.1.1); tool_choice mapping with the required→any rename; max_tokens required (pre-send provider_invalid_request when absent); frequency_penalty/presence_penalty rejected as unsupported; stop_reason → finish_reason mapping (incl. pause_turn); usage mapping with a cached-token note; the §8.2.3 error table (incl. 402 billing_error, 504 timeout_error); native structured output via output_config.format (§8.2.5) with tool-call-coercion and prompt-augmentation fallbacks for pre-native models (§8.2.5.1). (proposal 0037)
llm-provider §3.1 — ThinkingBlock and RedactedThinkingBlock content block types. Two new assistant-message-only block types surfacing provider-emitted reasoning content as first-class spec records. ThinkingBlock {text, signature} carries reasoning text plus an opaque provider round-trip token; RedactedThinkingBlock {data} carries an opaque redacted slot. Both are preserved verbatim on round-trip and are provider-bound (routing thinking-bearing history to a different provider strips them). §3 assistant per-role constraint relaxed so assistant content may be a content-block sequence (text + thinking/redacted-thinking; image stays user-only). §3.1 renumbered (Mixing blocks → §3.1.6). §6 Response.message note added.
llm-provider §8.1.1 — strip-on-send rule. The OpenAI mapping strips ThinkingBlock/RedactedThinkingBlock from outbound assistant messages (OpenAI has no wire representation for reasoning content), enabling cross-provider conversation routing without manual filtering. Generalizes to any mapping that does not surface reasoning content; reasoning signatures are provider-bound.
Conformance fixtures llm-provider/conformance/033-043 (eleven): basic round-trip, tool-call flow, image blocks, tool_choice modes, RuntimeConfig mapping, max_tokens-required, error mapping, native structured output, structured-output fallback, thinking-block round-trip, and OpenAI thinking-block strip. The Anthropic fixtures carry a mapping: anthropic discriminator (a harness extension; fixtures without it target the §8.1 OpenAI mapping).

Notes

MINOR bump. Additive: a new §8.X wire-format mapping, two new optional content-block types (assistant-only, absent unless a reasoning-surfacing provider emits them), and one strip-on-send rule on §8.1 (affects outbound wire only when thinking blocks are present, which prior to this proposal could not occur). No breaking changes — existing callers and the §8.1 mapping are unaffected.
Anthropic provides native structured output (GA on current Claude models) via output_config.format; the mapping uses the native path (mirroring §8.1.5), with tool-call coercion and prompt-augmentation demoted to fallbacks for pre-native models.

[0.27.1] — 2026-05-27¶

Fixed

observability/conformance/031-langfuse-subgraph-span-hierarchy — corrected metadata.step values to match graph-engine §6's "subgraph-internal node executions increment the same counter" rule. Previously asserted outer_out: step 2, which contradicts §6 (the global counter increments through inner subgraph nodes). Corrected: outer_in: 0, inner_x: 1, inner_y: 2, outer_out: 3. The outer_sub wrapper observation's synthesized step (=1, matching the first inner event) was already correct. Added explicit step assertions on inner_x and inner_y (previously unasserted) so the global-counter behavior is documented in the fixture rather than implicit.
observability/conformance/033-langfuse-detached-trace-mode — corrected metadata.namespace on the detached trace's inner observation (step node in case A) from ["long_running_workflow", "step"] to ["dispatch", "step"]. The earlier value used the subgraph identity at the wrapper position, conflating subgraph_name (identity, per §5.3 / §8.4.2) with namespace (wrapper node name, per graph-engine §6 convention). Across detached and non-detached modes alike, namespace is wrapper-node-name-scoped — only subgraph_name carries the identity. The two attributes are complementary, not redundant.

Notes

Patch bump. Pure fixture-YAML corrections. No spec-text changes; no behavior changes for any compliant implementation. The fixtures were inconsistent with the spec prose (§6 step semantics; §5.3 / §8.4.2 + graph-engine §6 namespace convention) at proposal 0035's acceptance time; this patch aligns the YAML with the prose that was always normative. Implementations correctly tracking §6 step semantics and graph-engine §6 namespace conventions pass the corrected fixtures.
Proposal 0035 (0035-observability-langfuse-graph-topology-fixtures) is unchanged in its accepted prose; only the YAML deliverables it introduced are corrected here. No proposal-text immutability concern — the corrections are to fixture artifacts that contradicted the spec text, not to the proposal's claims about what was asserted.

[0.27.0] — 2026-05-27¶

Added

graph-engine §2 — concat_flatten and merge_all required built-in reducers. §2's required-built-in reducer set expands from three (last_write_wins, append, merge) to five with two new members for the fan-out collection case. concat_flatten(prior, update) concatenates prior with the one-level flattening of update; both arguments MUST be lists and every element of update MUST itself be a list. merge_all(prior, update) folds the sequence of mappings in update into prior with shallow last-write-wins per key (consistent with merge's single-dict semantics across the N dicts); prior MUST be a mapping, update MUST be a list, every element of update MUST itself be a mapping. Both reducers are strict — non-matching shapes raise ReducerError per §4; auto-detection between list-of-lists vs. flat list (and analogously between list-of-mappings vs. single mapping) is explicitly rejected by §2. Both are duals of append / merge for the fan-out target field case where the per-instance value collected by pipeline-utilities §9's fan-out is itself a collection (list or mapping). (proposal 0036)
pipeline-utilities §9.3 — target_field reducer contract broadened. The previous wording mandated a list-extending reducer (append or user-defined equivalent that concatenates list values), which excluded merge_all since it returns a mapping rather than a list. The updated wording permits any reducer compatible with the engine-produced list of per-instance values as its update argument, explicitly enumerating the three §2 built-ins valid for target_field: append, concat_flatten, merge_all. User-defined reducers are still permitted under the same broader contract. Required for cross-spec consistency with the new §2 built-ins from this proposal — merge_all is otherwise blocked from its motivating fan-out → merged-dict use case. No change to the §9.3 fan-in mechanic itself (instance-index ordering, contribution timing, error policy), only to the reducer-type contract.
Conformance fixtures graph-engine/conformance/026-reducer-concat-flatten and 027-reducer-merge-all, each covering success path, empty-update no-op, empty-inner-collection no-op, and non-element-shape reducer_error raise. The non-list-update and non-list-prior error contracts are spec-normative but caught at the typed-state validation layer in strict-typed implementations before reaching the reducer; the fixture-covered non-element error is the case the reducer is guaranteed to be the gatekeeper for.

Notes

MINOR bump. Additive normative change to the conformance surface — the required-built-in reducer set expands from three to five. No breaking changes for caller code (existing reducer declarations continue to work unchanged); implementations that pass the v0.26.x graph-engine fixtures without concat_flatten and merge_all will no longer pass v0.27.0 conformance, which is the intended behavior.
No changes to §3 (Execution model), §4 (Error categories — both new reducers route failures through the existing ReducerError / reducer_error machinery), §5 (Determinism), §6 (Observer hooks), or any other §-section. No changes to the pipeline-utilities §9 fan-out collection contract — that contract stays "collect one value per successful per-instance subgraph"; the new reducers consume the resulting list-of-collections at the parent state layer.

[0.26.1] — 2026-05-27¶

Added

observability §8.3 / §8.5 — three new conformance fixtures hardening cross-impl parity for the Langfuse mapping graph-topology rows v0.23.0 (proposal 0031) shipped normatively but only partially covered by fixtures. observability/conformance/031-langfuse-subgraph-span-hierarchy exercises §8.3 row 3 (Subgraph span → Span observation) and §8.4.2 subgraph-related metadata (namespace, subgraph_name). 032-langfuse-fan-out-per-instance-spans exercises §8.3 rows 4-5 (Fan-out node → dispatch Span observation; Fan-out instance → child Span observation under dispatch) and §8.4.2 fan-out-node-specific keys (fan_out_item_count / concurrency / error_policy on the dispatch only) plus fan-out-instance-specific keys (fan_out_index, fan_out_parent_node_name on each per-instance observation). 033-langfuse-detached-trace-mode exercises §8.5 detached-trace-mode rules for both detachment levels (subgraph and fan-out) — each detached child mints a separate Langfuse Trace, the parent's dispatch observation carries metadata.detached_child_trace_ids (string array, one entry per detached child), and correlation_id is invocation-scoped across all Traces in the invocation. (proposal 0035)

Notes

Patch bump. Pure conformance-coverage extension; no normative spec-text changes, no public-type changes, no behavior changes for any compliant implementation of the §8 Langfuse mapping as already specified by v0.23.0. The new fixtures harden the contract that was always there.
Each new fixture mirrors an existing OTel-side fixture one-to-one in graph topology: 031 mirrors 002-otel-subgraph-hierarchy; 032 mirrors 006-otel-fan-out-instance-attribution; 033 mirrors 008-otel-detached-trace-mode (including its two-cases shape).

[0.26.0] — 2026-05-26¶

Added

prompt-management §3 — Prompt.sampling typed sub-record (optional). Mirrors llm-provider §6 RuntimeConfig's declared-fields-plus-extras shape: seven optional declared fields (temperature, max_tokens, top_p, seed, frequency_penalty, presence_penalty, stop_sequences) plus an extras mapping for vendor-specific keys. Per-language implementations SHOULD use the SAME type as RuntimeConfig (or a structurally-compatible subtype) so callers can splat prompt.sampling directly into provider.complete(config=...) without per-field translation. The model identifier is NOT part of SamplingConfig; per-prompt model selection is out of scope. (proposal 0033)
prompt-management §3 — Prompt.observability_entities typed mapping (optional). Backend-keyed mapping (dict[str, Any] | None) carrying references to first-class entities the prompt has been registered as in observability backends. Spec-normative key: langfuse_prompt for the Langfuse SDK Prompt entity. Future observability backend mappings (Phoenix, Honeycomb LLM lens, etc.) define their own keys under the <backend>_<entity> naming convention. Replaces the v0.23.0 implementation-defined Prompt.metadata['langfuse_prompt'] placeholder with a spec-defined location for observability §8.4.4's lookup.
prompt-management §4 — PromptResult propagation of both new fields (sampling, observability_entities) from the source Prompt unchanged through render.
prompt-management §5 — informative filesystem sidecar conventions for sourcing Prompt.sampling. Two recommended shapes documented: per-prompt sidecar (<root>/<name>.config.json with top-level SamplingConfig) and unified config (<root>/prompt_configs.json with top-level mapping from prompt name to SamplingConfig). Conventions are informative — the spec mandates the field on Prompt, not the file layout — but documenting the two recurring patterns removes per-adopter re-derivation.
prompt-management §6 — LabelResolver integration on PromptManager.fetch(). The default value for the label parameter shifts from "production" to None / sentinel; resolution now follows a three-step chain: explicit label > resolver lookup > spec-fallback "production". Manager constructed without a LabelResolver falls back directly to "production", so existing v0.15.0 callers continue to work without modification.
prompt-management §7 — LabelResolver primitive (new section). Optional helper that maps prompt names to labels for deployment-time A/B testing. resolve(name) -> str operation with a fallback chain: per-name override > default override > spec-fallback "production". Implementations MAY back resolvers with static mappings, JSON files, environment-variable lookups, remote config services. Renumbers existing §7-§13 → §8-§14.
prompt-management §12 — cross-spec touchpoints (was §11) extended with two new touchpoints: Prompt.sampling → llm-provider §6 RuntimeConfig wiring at the LLM call site (shape-compatibility for direct splat); Prompt.observability_entities['langfuse_prompt'] → observability §8.4.4 Langfuse Generation linkage lookup (spec-defined target for the case-1 / case-2 trigger introduced by proposal 0031). (proposal 0033)
observability §8.4.4 — Langfuse Prompt-entity reference lookup location updated. The case-1 / case-2 trigger semantic (whether to establish the Generation → Prompt link) is unchanged from v0.23.0; the LOOKUP location moves from Prompt.metadata (impl-defined key) to spec-defined Prompt.observability_entities['langfuse_prompt']. Implementations of the v0.23.0 Langfuse mapping update their lookup; visible behavior is unchanged.
Conformance fixtures prompt-management/conformance/013-prompt-sampling-from-backend (sub-record propagation through fetch → render), 014-prompt-sampling-absent (opt-in semantic; no defaulting), 015-label-resolver-fallback-chain (three-step precedence + explicit-bypass + no-resolver-configured cases in one fixture), 016-prompt-observability-entities-propagation (two cases — populated + absent).

Notes

MINOR bump. Two new optional fields, new primitive (LabelResolver), informative filesystem-sidecar conventions, two new cross-spec touchpoints, observability §8.4.4 textual update, four new conformance fixtures. No breaking changes. Callers using fetch(name) without an explicit label continue to get "production" when no resolver is configured (spec-fallback path). Implementations of the v0.23.0 Langfuse mapping update their lookup location for the Langfuse Prompt reference from impl-defined metadata to spec-defined observability_entities['langfuse_prompt']; the visible behavior is unchanged.
Section renumbering: prompt-management §7-§13 → §8-§14 to make room for new §7 LabelResolver. Internal cross-references updated accordingly.
Out-of-order acceptance. This proposal (0033) was originally drafted as the v0.25.0 candidate but ships as v0.26.0 because proposal 0034 (caller-supplied invocation metadata propagation) completed its acceptance lifecycle first. The proposal numbers are not in version order in the CHANGELOG; each entry's link to the driving proposal makes the connection unambiguous.

[0.25.0] — 2026-05-26¶

Added

observability §3.4 — caller-supplied invocation metadata (new subsection sibling to §3.1 / §3.2 / §3.3). Callers attach an optional dict[str, AttributeValue] mapping at invoke() time as the baseline, augmentable mid-invocation via a per-language framework helper (see the mid-invocation augmentation bullet below), propagated via the language's context primitive, invocation-scoped (flows through detached subgraphs and fan-outs per §4.4). Value types are OTel-attribute-compatible scalars or homogeneous arrays. Keys MUST NOT collide with reserved namespaces (openarmature.*, gen_ai.*); implementations MUST reject collisions at the API boundary (and at the augmentation-helper call site) before any work begins. The OTel mapping is the primary cross-vendor propagation (see §5.6); backends whose data model carries trace-level metadata as a typed field separate from OTel attributes need their own per-backend propagation rule. (proposal 0034)
observability §5.6 — openarmature.user.* cross-cutting attribute family. For each entry in §3.4's caller-supplied metadata, every span emitted during the invocation MUST carry an attribute named openarmature.user.<key> with the supplied value (cross-cutting, same pattern as openarmature.correlation_id). Invocation span, every node / subgraph / fan-out / LLM provider / retry attempt span. Detached children inherit. The openarmature.user. prefix is reserved for caller-supplied metadata.
observability §7 — log-record propagation of caller metadata. Log records emitted during an invocation MUST carry the full openarmature.user.* attribute set, alongside the existing openarmature.correlation_id. Same OTel Logs Bridge mechanism.
observability §8.4.1 + §8.4.2 — Langfuse propagation of caller metadata. Each entry in §3.4's caller metadata merges into the Langfuse trace.metadata map AND into every observation.metadata map as top-level keys (sibling to correlation_id). Top-level placement so Langfuse UI filtering on metadata.<key> matches what callers supplied. Per-observation propagation enables filtering across detached subgraphs and fan-out instances. Caller entries are NOT promoted to trace.userId / trace.sessionId; those Session surfaces are deferred to a future sessions capability.
observability §8.4 — Langfuse-Sessions distinction note + Langfuse-specific constraint note. Distinguishes the §3.4 caller-supplied metadata path from Langfuse Sessions (cross-trace grouping under sessionId, deferred). Documents Langfuse's constraints on propagated metadata (alphanumeric keys, 200-character value strings) and clarifies that the §3.4 API-boundary validation does not enforce backend-specific constraints by default; implementations MAY expand the rejected-key set to also catch them early.
graph-engine §3 — invocation-entry-surface clarification. New paragraph noting invoke() accepts an optional caller-supplied metadata mapping (per observability §3.4) alongside the existing correlation_id argument. Per-language mechanism (keyword argument; field on invocation-config record). Contracts live in observability §3.1 + §3.4.
Mid-invocation augmentation of caller-supplied metadata via a per-language framework helper (e.g., openarmature.observability.set_invocation_metadata(**entries) in Python). Code inside a node body, middleware, or observer can add entries to the in-scope metadata mapping mid-flight; the same reserved-namespace and value-type rules apply at the call site. Per-async-context scoping (Python ContextVar, TypeScript AsyncLocalStorage) keeps fan-out instances and parallel-branches instances isolated — each instance's augmentations affect only its own subtree's spans / observations, with no leakage to siblings. Solves the canonical fan-out-with-per-item-id pattern: each instance attaches its product / document / record identifier, and Langfuse / OTel filtering by that identifier surfaces that instance's specific subtree.
Conformance fixtures observability/conformance/026-otel-caller-supplied-metadata (cross-cutting attribute family on every span), 027-langfuse-caller-supplied-metadata (top-level merge on Trace + every Observation), 028-caller-metadata-namespace-rejection (openarmature.* and gen_ai.* rejection at invoke() API boundary), 029-caller-metadata-fan-out-per-instance (mid-invocation augmentation + per-async-context scoping verified against three fan-out instances with per-product augmentation), 030-caller-metadata-parallel-branches-per-branch (same contract verified against parallel-branches dispatching two heterogeneous subgraphs with per-branch augmentation; covers the separate pipeline-utilities §11 code path).

Notes

MINOR bump. New normative caller-side surface, new cross-cutting attribute family, new Langfuse propagation rules, new graph-engine §3 clarification, new mid-invocation augmentation helper with per-async-context scoping, five new conformance fixtures. No breaking changes. Existing callers that don't supply metadata see no behavior change. Existing observability backends pick up the new propagation rules at their next version bump.
Cross-vendor reach. The §5.6 cross-cutting attributes flow through any OTel-attribute-based backend (Phoenix / Arize, Honeycomb, Datadog APM, HyperDX, Grafana Tempo, etc.) automatically; only backends with their own typed metadata-on-trace field (Langfuse first) need explicit propagation rules in their respective §-section.
Cross-backend key portability. Callers wiring OA to multiple backends SHOULD use alphanumeric / camelCase keys (e.g., tenantId, userId, featureFlag) since some backends (Langfuse) impose key-name constraints. The OA spec only enforces the reserved-namespace rule at the API boundary; backend-specific constraints surface at the backend's emission layer unless implementations choose to expand the rejected-key set per §3.4.

[0.24.0] — 2026-05-26¶

Added

llm-provider §6 — three new declared RuntimeConfig fields: frequency_penalty, presence_penalty, and stop_sequences. The first two are cross-vendor standard sampling parameters (every major provider supports equivalents). The third matches the cross-vendor OpenTelemetry GenAI semconv naming (stop_sequences) and the wire-key convention used by Anthropic / Gemini / Cohere; the OpenAI-compatible wire mapping (§8.1) translates this field to OpenAI's request-body key stop. (proposal 0032)
llm-provider §6 — explicit extras-pass-through contract. Replaces the prior vague "implementations MAY accept additional provider-specific fields" line with a normative contract: undeclared RuntimeConfig fields MUST reach the wire request body untouched, subject to the §8 wire-format mapping. The pass-through MUST NOT translate, rename, or transform undeclared fields. Codifies the behavior every existing adopter already relies on for passing vendor-specific knobs (e.g., repetition_penalty, top_k, min_p through OpenAI-compatible providers to vLLM).
llm-provider §6 — null-skip semantics on declared fields. A declared RuntimeConfig field with value None / undefined (the language's "unset" sentinel) MUST be omitted from the wire request body. Distinct from "field supplied with an explicit null value." Implementations MUST NOT serialize None-valued declared fields as JSON null. Lets callers construct partial configs by leaving unset fields as default-null and rely on the framework to omit them at the wire layer — no defensive null-filter shim needed.
llm-provider §8.1 — extended declared-field mapping table. The pre-0032 four (temperature, max_tokens, top_p, seed) and the two new same-named declared fields (frequency_penalty, presence_penalty) map directly to OpenAI request-body keys. The third new declared field (stop_sequences) renames to OpenAI body field stop — the OA name follows the cross-vendor semconv; OpenAI is the outlier with the shorter wire-key name; the wire mapping handles the translation.
llm-provider §8.1 — formal undeclared-field placement contract. Undeclared RuntimeConfig fields appear at the OpenAI request-body root, as siblings to temperature, model, etc. Codifies the behavior every existing OpenAI-compatible adopter relies on (OpenAI SDK extra_body, LangChain kwarg-splat, gateway pass-through). The §8.1 mapping does NOT validate, rename, or transform undeclared keys; key names and value types are preserved verbatim per §6's extras-pass-through.
observability §5.5.2 — three new GenAI semconv attributes: gen_ai.request.frequency_penalty, gen_ai.request.presence_penalty, gen_ai.request.stop_sequences. Mapped from the three new declared RuntimeConfig fields. The §8.4.3 Langfuse-mapping reference to §5.5.2 picks them up by inclusion: the three new attributes flow into generation.modelParameters.{frequency_penalty, presence_penalty, stop_sequences} automatically, no §8 edit required.
Conformance fixtures llm-provider/conformance/032-runtime-config-declared-fields-and-null-skip (two cases — full declared-field set + one extras key landing at the body root; partial config exercising the null-skip rule including the stop_sequences → stop rename), and observability/conformance/025-otel-llm-request-params-extended (one case — all seven declared RuntimeConfig fields emit the corresponding seven gen_ai.request.* attributes).

Changed

observability/conformance/018-otel-llm-request-extras — example extras key switched from frequency_penalty (which is now a declared field as of this release) to repetition_penalty (a vLLM / HuggingFace-style vendor-specific extra with no path to becoming a declared field). No behavior contract change to §5.5.1; only the fixture's example key changes so the extras-bag demonstration continues to depict a genuinely undeclared field. (proposal 0032)

Notes

MINOR bump. Adds three declared fields, two new normative clauses to llm-provider §6, mapping extensions to §8.1, three new observability attributes, and two new conformance fixtures. No breaking changes. Existing callers passing frequency_penalty / presence_penalty / stop via the extras path continue to work via the §6 extras-pass-through contract; the new declared fields take precedence over a same-named extras key when both are supplied.
Naming convention precedent. Where the OpenTelemetry GenAI semconv has settled on a cross-vendor name, OA's declared field uses that name (even when one specific provider — OpenAI in this case — uses a shorter wire-key form). The wire-format-mapping layer (§8) is the right place to translate to vendor body keys. This convention applies prospectively to future §8.2 (Anthropic) / §8.3 (Gemini) mappings and to any later declared-field additions.

[0.23.0] — 2026-05-26¶

Added

observability §8 — Langfuse backend mapping (sibling section to the OpenTelemetry mapping in §3–§7). Specifies how OA's §6 observer event stream maps to Langfuse's native data model — Traces, Observations (Generation, Span, Event), and the Prompt entity — so a Langfuse observer writes Langfuse-shaped data directly instead of mirroring OTel attributes through Langfuse's OTLP ingest. Coverage: (proposal 0031)
§8.3 Observation-type mapping — invocation → Trace (container); node/subgraph/fan-out → Span observation; LLM provider → Generation observation; retry attempts → sibling observations under the same parent.
§8.4 Attribute mapping table — three sub-tables (Trace-level, Observation-level, Generation-specific) translating openarmature.* and gen_ai.* to Langfuse native fields. The Generation table references §5.5.2 by inclusion so future request-parameter additions flow into generation.modelParameters without further §8.4.3 edits. §8.4.4 covers prompt linkage: a Langfuse Prompt-entity link is established when the prompt's source exposes a Langfuse Prompt reference (capability-based trigger, not tied to any specific PromptBackend implementation); otherwise identity surfaces via the nested generation.metadata.prompt map only.
§8.5 Correlation ID realization — metadata.correlation_id on both Trace and Observation levels; cross-trace reference (metadata.detached_child_trace_ids) on the parent's dispatch observation for detached subgraphs / fan-outs.
§8.6 Trace name — MUST-support caller-supplied invocation label; SHOULD-default to the entry-node name when no caller label is supplied.
§8.7 Generation rendering — input/output emission gated by the Langfuse observer's own disable_llm_payload flag (independent of the OTel observer's flag); truncation marker passes through as a raw string when the underlying §5.5.1 payload was truncated.
§8.8 Prompt linkage — references §8.4.4.
§8.9 Composition with OTel — both observers consume the §6 event stream independently; each disable_llm_* flag is per-observer. The cross-backend correlation ID (§3) joins the two views.
§8.10 Out of scope — Langfuse Sessions, Scoring, Cost emission, and PromptBackend caching policy.
Conformance fixtures 022-langfuse-basic-trace, 023-langfuse-generation-rendering (two cases — normal rendering and the §8.7 truncation-fallthrough path), 024-langfuse-prompt-linkage (two cases — source exposes a Langfuse Prompt reference vs. does not). Introduces new harness primitives: langfuse_observer.{disable_llm_payload, disable_llm_spans, payload_byte_cap} config block; expected.langfuse_trace recorder shape (Trace + nested Observation tree with type, name, metadata, Generation fields, children); prompt_backend.type selector with two recognized values (mock_with_langfuse_reference, filesystem); input_parses_as_messages and input_is_raw_string_with_marker assertions for the §8.7 rendering paths; prompt_entity_link / prompt_entity_link_absent assertions for §8.4.4 cases.

Changed

observability §1 closing paragraph updated to acknowledge the Langfuse mapping as a sibling section alongside the OTel mapping (no longer a future deferral); OTel remains the reference shape for cross-backend equivalence. (proposal 0031)
observability §2 Concepts Correlation ID example now cross-references §8.5 (Langfuse realization) alongside §5.6 (OTel realization).
observability §3.3 Backend-mapping contract now names §8.5 as the Langfuse realization of the correlation-ID surface.
observability §8 (was Determinism) renumbered to §9; one sentence added affirming that Langfuse observation content is similarly a function of (a) the §6 event stream and (b) implementation-specific data (timestamps, observation IDs, trace IDs).
observability §9 (was Out of scope) renumbered to §10; the "Langfuse mapping — separate proposal" bullet removed since the mapping now lives in §8.

Notes

MINOR bump. Adds a new spec section with normative behavior. No breaking changes to §1–§7; OTel-only implementations continue to conform at v0.23.0 without modification.
Cross-language status. No TypeScript implementation exists yet; the §8 conformance fixtures will be exercised against TS once the TS implementation enters its harness phase. Python is the first language implementation target.
Prompt-management touchpoint. §8.4.4's "Langfuse Prompt reference" mechanism is left implementation-defined under prompt-management §3's metadata mapping. A follow-on proposal MAY normatively define how backends expose the reference; until then, implementations are free to choose the surface (metadata field, interface marker, SDK-side accessor).

[0.22.1] — 2026-05-25¶

Changed

graph-engine §6 Drain gained two clarifications of implicit rules surfaced during 0010's implementation pass. Both are textual sharpenings of contracts existing implementations already follow; the clarifications close cross-implementation drift before the TypeScript implementation lands. (proposal 0030)
Snapshot semantic. The set of invocations covered by a drain call is the set whose worker(s) were active at the time drain is invoked. Invocations started after drain is called are NOT covered; callers needing delivery guarantees for a later invocation MUST call drain again. Composes cleanly with the optional timeout: the deadline applies to a known finite worker set captured at call time, not an open-ended set that new invocations could extend past the deadline.
Timeout-input validation. Implementations MUST reject negative or NaN timeout inputs by raising an API-boundary error before any drain work begins. The error surface is per-language idiomatic (Python ValueError, TypeScript RangeError, Go error return value); the spec mandates the rejection, not the error type. Non-numeric input is rejected per the language's type-error idiom.

Notes

Pre-1.0 PATCH bump. Textual clarification of implicit rules; no new conformance fixtures (matches the v0.16.1 / v0.17.1 / v0.21.1 precedent). Both rules are awkward to test cross-language — the snapshot rule is timing-sensitive and the timeout-validation error-surface is per-language. The normative rules plus per-language documentation are sufficient.
No backward-compat carve-out. Pre-1.0, no shipping consumers of either rule that wasn't already following the natural reading. Implementations whose drain already snapshots and rejects invalid timeout inputs see no behavior change; those that don't update to comply.
Skip-ahead implementation. Per the Skip-ahead governance principle, implementations that have not yet shipped against v0.22.0 may target v0.22.1 directly.

[0.22.0] — 2026-05-25¶

Added

pipeline-utilities §10.11 — "Count drift on resume" rule. New normative paragraph mandating that the engine MUST raise checkpoint_record_invalid (per §10.10) when a saved fan_out_progress entry's instance_count differs from the resumed run's resolved count for the same fan-out node. Silent pad/truncate of the saved instances list is not permitted — per-instance accumulator contributions written under one instance_count cannot be reconciled with a different count without risking dropped or duplicated entries at the fan-in step, breaking §10.11.1's exactly-once reducer guarantee. The check MUST happen before any fan-out instance work runs on the resumed path; a saved record with multiple fan-out entries raises on the first mismatch encountered. Users who intentionally change a fan-out's input set between runs MUST start a fresh invocation rather than resume. (proposal 0029)
Conformance fixture 056-checkpoint-fan-out-count-drift (pipeline-utilities). Two cases exercise both directions of drift (shrunk count 5 → 3; grown count 5 → 7); both assert checkpoint_record_invalid surfaces before fan-out instance work runs. Introduces one new harness primitive: resume_with_modified_items: {<field>: <new-value>} (re-resolves the named field on the resumed graph's initial state to simulate "user changed the input set between runs").

Changed

pipeline-utilities §10.10 checkpoint_record_invalid description extended with one sentence enumerating fan_out_progress[*].instance_count drift between save and resume as a structural-incompatibility failure mode covered by the existing category. No new category minted; the existing checkpoint_record_invalid surface absorbs the new failure mode (consistent with the v0.20.0 provider_invalid_request extension pattern). (proposal 0029)

Notes

Pre-1.0 MINOR bump. New normative rule + new conformance fixture. Implementations need actual work (a count-equality check on the resume path before fan-out dispatch, plus the new harness primitive in the conformance adapter). Existing v0.21.1 fixtures pass unchanged — no fixture exercises count drift before this proposal.
No backward-compat carve-out. Pre-1.0, no shipping consumers; the strict rule applies normatively from acceptance forward. Implementations that previously absorbed count drift via permissive pad/truncate update to raise.
Both directions treated symmetrically. Padding with not_started on a grown count is rejected for the same structural-incompatibility reason as truncating on a shrunk count. Implementations MAY surface the category with an impl-defined error payload identifying which fan_out_node_name / namespace triggered the raise; the spec mandates the category, not the payload shape. Per the "Skip-ahead implementation" governance principle, implementations that have not yet shipped against v0.21.1 may target v0.22.0 directly.

[0.21.1] — 2026-05-25¶

Changed

pipeline-utilities §10.2 schema_version paragraph clarified to name the outermost declared graph state class as the canonical source for the value written onto saved records. The framework reads schema_version from the state class passed to the graph constructor (e.g., GraphBuilder(MyState) in Python), not from type(state).schema_version at save time. Implementations MUST NOT source schema_version from the runtime instance's class when the user passes a State subclass instance whose schema_version shadows the declared class's value — the declared class is canonical for all save sites in the engine (outermost-graph, subgraph-internal, fan-out instance internal, fan-out node completion), so resume sees a single consistent schema_version and §10.12 migration registry lookups resolve unambiguously. (proposal 0028)
Conformance fixture 055-checkpoint-schema-version-declared-class (pipeline-utilities) added. Exercises the canonical-source rule via a graph declared against a state class with schema_version: "v1", invoked against a subclass instance whose schema_version is "v2", driven through a fan-out completion so multiple save sites fire. Asserts every captured save reports schema_version: "v1". Introduces two new harness primitives: runtime_state_subclass: {schema_version: "<v>"} (per-language subclass-with-override construct) and every_save_assertions.schema_version: "<v>" (assert against every captured save, not just the latest).

Notes

Pre-1.0 PATCH bump. Textual clarification of an implicit rule (the §10.12 migration system already implicitly assumed the declared class was canonical; this proposal makes the assumption explicit on the save side). Matches the v0.16.1 precedent for spec-text clarifications that force some implementations to align their reads. No new types, no new error categories, no new behavior — only an explicit normative rule against a previously implicit one.
No backward-compat carve-out. Pre-1.0, no shipping consumers; the declared-class rule applies normatively from acceptance forward.
Cross-implementation consistency. Locking in the rule before TypeScript implementation work means all implementations land on the same canonical source. Implementations whose save sites already read the declared class consistently see no behavior change; implementations with the declared/instance inconsistency (the reference Python implementation, after the proposal 0009 impl-review pass) update by threading the declared outermost state class through their invocation context to all save sites. Per the "Skip-ahead implementation" governance principle, implementations that have not yet shipped against v0.21.0 may target v0.21.1 directly.

[0.21.0] — 2026-05-25¶

Added

pipeline-utilities §10.11 per-instance entry shape gained result_is_error: bool. Boolean discriminator on each entry of CheckpointRecord.fan_out_progress[*].instances[*]: true when the entry's result is a collect-mode error contribution that rolls forward into errors_field on resume, false when it's a success contribution that rolls forward into target_field. MUST be false for state in {"in_flight", "not_started"} (the value of result is also unused in those states). The field is normatively required on every entry; implementations MUST populate it on save and consult it on resume. Inferring routing from result shape is not permitted. (proposal 0027)

Changed

pipeline-utilities §10.11.2 collect bullet amended to name result_is_error as the routing discriminator for completed-entry contributions on resume. The previous shape-inspection workaround (heuristic match against the engine's canonical error-record dict shape) is explicitly forbidden in favor of consulting the boolean field. (proposal 0027)
Conformance fixtures 048-checkpoint-fan-out-per-instance-resume-skips-completed through 054-checkpoint-fan-out-batching-buffered-saves-lost-on-crash (pipeline-utilities) updated to assert result_is_error on every per-instance entry in their saved_record_assertions.fan_out_progress[*].instances lists, enforcing the new requiredness across all four state combinations (success-completed → false, collect-mode-error-completed → true, in_flight → false, not_started → false). Fixture 052 is the only fixture exercising the result_is_error: true case (it's the only fixture with a collect-mode failure that gets recorded as a completed contribution); fixtures 048–054 other than 052 all carry result_is_error: false uniformly. The fixture-only result_kind: error harness primitive that was a workaround for the missing normative discriminator is retired; fixture 052 was its sole consumer. Fixture 052 also adds a result_present: true matcher on the collect-mode error entry to assert §10.11's "the contribution is reflected in result" rule without constraining the impl-defined error-record shape (per §9.5). (proposal 0027)

Notes

Pre-1.0 MINOR bump. New required field on a saved-record data structure — implementations that passed the v0.20.1 fixtures need actual work to pass v0.21.0 (populate the field on save, consult it on resume, remove any heuristic shape-inspection fallback). Existing v0.20.1 fixtures (other than 052) pass unchanged; fixture 052's saved-record assertion now exercises the new field. Per the "Skip-ahead implementation" governance principle, implementations that have not yet shipped against v0.20.1 may target v0.21.0 directly.
No backward-compat carve-out. Pre-1.0, no shipping consumers between the v0.18.0 introduction of fan_out_progress and this proposal: the discrimination contract is normatively required from acceptance forward, with no transitional "MAY fall back to a heuristic for older records" allowance. The cleaner contract is cheap to specify while exactly one implementation exists.
Cross-language consistency. Locking in the discrimination mechanism before TypeScript implementation work avoids reconciling diverged heuristics later. The boolean field round-trips cleanly through any Checkpointer backend without requiring per-language agreement on the engine's internal error-record shape (which remains implementation-defined per §9.5).

[0.20.1] — 2026-05-24¶

Changed

llm-provider §8 framing gained a Per-mapping subsection structure paragraph recommending the canonical §8.X subsection template (Request mapping / Response mapping / Error mapping / Concurrency / Structured output, in that order) used by §8.1. Provider-specific sub-subsections (e.g., §8.X.1.1 for content-block wire mapping, §8.X.5.1 for fallback) are permitted and expected; providers MAY add additional top-level subsections at the end of the canonical five for features without §8.1 analogues (e.g., §8.X.6 Caching). SHOULD-level rather than MUST-level — when a §8.X proposal diverges, the proposal text SHOULD explain the divergence in its Detailed design so reviewers can confirm it's structural rather than ergonomic. Resolves 0019's open-question #2 (per-mapping section structure). (proposal 0026)

Notes

Pre-1.0 PATCH bump. Purely textual structural recommendation. No new types, no new error categories, no behavioral change. All v0.20.0 conformance fixtures pass unchanged. §8.1 already follows the template by construction (it IS the template source). Matches the v0.16.1 / v0.17.1 precedent for spec-text clarifications.
Cross-language consistency story. The template lock-in is sequenced so §8.2 Anthropic and §8.3 Gemini follow-ons land against the same canonical structure — readers who know §8.1's organization can navigate §8.X by reflex, fixture sidecars reference subsection numbers predictably across mappings, and cross-language consistency (Python ↔ TypeScript siblings) extends to the spec-text structure as well as the wire shapes.
Per the "Skip-ahead implementation" governance principle, implementations that have not yet shipped against v0.20.0 may target v0.20.1 directly.

[0.20.0] — 2026-05-24¶

Added

llm-provider §5 complete() gained an optional tool_choice parameter. Four modes: "auto" (model decides), "required" (model MUST call at least one tool), "none" (model MUST NOT call tools), and {type: "tool", name: <string>} (model MUST call the named tool). When omitted (None / absent), the engine omits the wire-level tool_choice field and the provider's own default applies — preserving v0.4.0 behavior exactly. Pre-send validation routes three new failure modes through provider_invalid_request (§7): (1) required with empty / absent tools; (2) force-specific with empty / absent tools; (3) force-specific with name not in supplied tools. The framework does NOT enforce the constraint post-hoc — whether the model honored it is observable from Response.finish_reason / Response.message.tool_calls but is not framework-policed (per §6's transparency principle). (proposal 0025)
llm-provider §8.1.1 OpenAI request mapping gains a tool_choice row covering the four modes plus the None-omitted-from-wire case. The spec {type: "tool", name: X} discriminator renames to OpenAI's {type: "function", function: {name: X}} wire shape (implementation performs the rename when constructing the wire body). (proposal 0025)
Conformance fixtures 029-tool-choice-modes, 030-tool-choice-force-specific, 031-tool-choice-validation (llm-provider). New harness primitive: expected_wire_request_checks.tool_choice_absent: true (sibling-to-expected_wire_request block asserting a key is absent from the wire body, distinct from present-with-null; follows fixture 027's expected_wire_request_checks.response_format_absent precedent). Fixture 029 establishes the precedent that the mock provider returns constraint-compliant responses for the required and none cases; assertions verify end-to-end response mapping, not framework enforcement.

Changed

llm-provider §7 provider_invalid_request description extended to enumerate the three new validation failure modes for tool_choice (required-with-empty-tools, force-with-empty-tools, force-name-not-in-list). No new category — the existing surface absorbs the new failure modes. (proposal 0025)

Notes

Pre-1.0 MINOR bump. Implementations passing the v0.19.0 fixtures need actual work to pass the new fixtures (extend complete() with the new parameter, add pre-send validation, add the §8.1.1 wire mapping row). The no-tool_choice path is backward-compatible: existing callers passing no tool_choice continue to see the same wire shape they did in v0.4.0. Per the "Skip-ahead implementation" governance principle, implementations that have not yet shipped against v0.19.0 may target v0.20.0 directly.
Sequenced ahead of §8.2 Anthropic + §8.3 Gemini follow-ons. Adding tool_choice to complete() BEFORE the per-provider mappings ship avoids retrofitting three §8.X subsections in lockstep. The §8.1.1 mapping row lands here; future §8.X follow-ons (per the §8.X subsection template proposed in 0026) add their own per-provider tool_choice mapping rows.
Framework does NOT enforce "none" post-hoc. Per §5's clarifying paragraph, if a provider returns tool calls despite tool_choice="none", the implementation MUST surface what the provider returned without re-validating. Provider compliance is observable from finish_reason / tool_calls but is not framework-policed.

[0.19.0] — 2026-05-24¶

Changed

graph-engine §6 drain operation gained an optional timeout parameter and now MUST return a summary. Drain returns once all observer events deliver OR once the caller-supplied timeout elapses, whichever happens first. When the timeout fires, workers MUST be cancelled or otherwise terminated such that the compiled graph remains usable for subsequent invocations — partial delivery state from one drain MUST NOT leak into the next invocation. The summary MUST include at minimum undelivered_count (the count of events still queued or in-flight when the timeout fired) and timeout_reached (a boolean flag). Implementations MAY provide richer detail (per-observer counts, sampled event metadata). When called without a timeout, drain still waits indefinitely (the existing v0.3.0 behavior) and the summary's undelivered_count is 0, timeout_reached is false — callers receive a consistent shape regardless of whether they supplied a timeout. (proposal 0010)
Conformance fixtures 022-drain-timeout-elapses-with-undelivered, 023-drain-timeout-not-reached-fast-observers, 024-drain-timeout-clean-state-for-next-invocation, 025-drain-no-timeout-waits-for-all (graph-engine). New harness primitives: observers[].sleep_ms_per_event (uniform or {first_invocation, subsequent_invocations} form), invoke.drain.timeout_seconds, expected.drain_summary.{timeout_reached, undelivered_count, undelivered_count_min}, multi-invocation invocations: block for cross-drain state-isolation testing, and invariants drain_returned_within_timeout / graph_state_intact_after_timeout / second_invocation_drain_independent_of_first / drain_waited_for_all_events.

Notes

Pre-1.0 MINOR bump. Implementations passing the v0.18.0 fixtures need actual work to pass the new drain fixtures (return a summary, accept a timeout, cancel cleanly under timeout, preserve graph state across cross-drain boundaries). The no-timeout drain path is backward-compatible — existing callers passing no timeout continue to get "wait until everything delivers" — but the return type now carries a summary where v0.3.0 returned nothing. Per the "Skip-ahead implementation" governance principle, implementations that have not yet shipped against v0.18.0 may target v0.19.0 directly.
Cancellation mechanism is implementation-defined. When the timeout elapses while an observer is mid-call, the implementation MUST terminate the call in time to honor the deadline. How it does so — task.cancel() in Python, an AbortSignal in TypeScript, refusing to hand the worker the next event once the deadline is within an observer's expected latency budget — is implementation-defined and SHOULD be documented per-impl. The hard deadline itself is not negotiable. Observers SHOULD be written to be cancellation-safe (idempotent writes, try/finally cleanup).
Summary shape is language-idiomatic. The two required fields (undelivered_count, timeout_reached) are mandated; the shape that carries them (a Python dict/dataclass, a TypeScript object, etc.) is per-language ergonomics. Implementations MAY add additional fields (per-observer counts, sampled event metadata) as long as the minimum two are present.
Downstream interactions (informative — no normative changes to other capabilities): under a timeout, late observer events may be lost. The OTel observer (observability §6) may have openarmature spans that never reach the exporter; downstream OTel exporters' own buffer/retry settings cover this. Checkpoint save events (pipeline-utilities §10.8) may not surface as observer-stream spans under timeout, but the underlying checkpoint save was synchronous and durable per §10.3 / §10.1.1 — resume correctness is unaffected. Event production remains deterministic (graph-engine §5); only event delivery is bounded by the timeout.

[0.18.1] — 2026-05-25¶

Fixed

pipeline-utilities conformance fixture 052-checkpoint-fan-out-collect-errors-resume — expected.final_state.results literal corrected from [10, 20, 30, 40] to [10, 20, 40, 50]. The fixture exercises a 5-instance collect-mode fan-out (items [10, 20, 30, 40, 50]) with instance 2 (item value 30) configured to always fail; failures under collect route to errors_field and never to the success target_field. The original literal listed the first four input items rather than the four values from the four successful instances (0, 1, 3, 4), contradicting the fixture's own description ("4 success contributions in results") and its other assertions (errors_list_length: 1, instances_executed_during_resume: [3, 4], instances_skipped_during_resume: [0, 1, 2]). Surfaced by the openarmature-python proposal 0009 implementation pass.

Notes

Pre-1.0 PATCH bump. Fixture-data correction only — no spec text changes, no behavioral changes, no new types, no new error categories. The spec contract defined in v0.18.0 is unchanged; the fixture literal now matches what the fixture's own data trace produces. An implementation that passed the fixture's documented intent under v0.18.0 (i.e., produced [10, 20, 40, 50] per the description and other assertions) passes the corrected fixture unchanged; an implementation that had matched the wrong literal would have had to produce output inconsistent with the fixture's data shape. No proposal required per GOVERNANCE.md (typo fix).
Released as a v0.18.x maintenance tag. Tagged on a maintenance branch off v0.18.0 (rather than on main, which had since stacked v0.19.0 / v0.20.0 / v0.20.1) so the python implementation can bump its spec submodule pin from v0.18.0 to v0.18.1 cleanly without absorbing the unrelated v0.19.0+ fixture sets it has not yet implemented. The same fixture correction is also reflected on main for forward consistency.
Skip-ahead implementation. Per the Skip-ahead implementation governance principle, implementations that have not yet shipped against v0.18.0 may target v0.18.1 directly.

[0.18.0] — 2026-05-24¶

Added

pipeline-utilities §10.11 — per-instance fan-out resume contract. Defines fan_out_progress field semantics (per-fan-out-node mapping with per-instance status, result field carrying the durable accumulator contribution, completed_inner_positions for in_flight capture). The completed state is a correctness guarantee that exactly one accumulator entry per instance heads into the fan-in step. Sub-sections cover reducer interaction (§10.11.1, with append being the load-bearing correctness case), error_policy composition (§10.11.2 fail_fast and collect modes), instance_middleware composition (§10.11.3, retry budget resets on resume), and configurable Checkpointer-level batching for fan-out internal saves (§10.11.4, with explicit cost trade-off — buffered-but-unflushed saves lost on crash are acceptable because re-execution under §10.11.1's rules contributes for the first time, not as a double-merge). (proposal 0009)
Conformance fixtures 048-checkpoint-fan-out-per-instance-resume-skips-completed through 054-checkpoint-fan-out-batching-buffered-saves-lost-on-crash (pipeline-utilities). New harness primitives: fan_out_progress matchers under saved_record_assertions (with state, result, completed_inner_positions, state_one_of for execution-mode variation), instances_executed_during_resume / instances_skipped_during_resume resume assertions, instance_N_attempt_index_on_resume per-instance attempt assertions, abort_after_instance fan-out abort directive, and a batched-Checkpointer primitive (kind: in_memory_batched with fan_out_internal_save_batching.flush_every).

Changed

pipeline-utilities §10.7 — fan-out resume contract replaced from atomic-restart with per-instance. When a fan-out is in flight at crash time, resume re-runs only the instances that did not complete-and-record their contribution. Completed instances are skipped; their accumulator entries (fan_out_progress[].instances[].result) roll forward to the fan-in step (per §9.3) unchanged. The atomic-restart behavior from v1 (a crash mid-fan-out re-running the entire fan-out) is superseded. (proposal 0009)
pipeline-utilities §10.3 — save granularity extended to fan-out instance internal nodes. The engine now fires Checkpointer.save at every completed event from inside a fan-out instance (in addition to outermost-graph nodes, subgraph-internal nodes, and the fan-out node itself). The v1 "engine does NOT save during fan-out instance execution" elision is removed. Fan-out node's own completion save now also finalizes fan_out_progress to mark all instances complete. Volume concerns for high-instance-count fan-outs are addressed via the configurable batching knob in §10.11.4 (opt-in, off by default). (proposal 0009)
pipeline-utilities §10.2 fan_out_progress field — promoted from reserved to populated. The v1 placeholder language ("reserved field for the v2 per-instance fan-out resume follow-on proposal") is replaced; the field now carries per-fan-out-node entries when one or more fan-outs are in flight at save time, per §10.11. The field shape is fully specified in §10.11. (proposal 0009)
pipeline-utilities — existing §10.11 "Reference implementations and backend layering" renumbered to §10.13 to accommodate the new §10.11. Cross-reference in §10.12.1 (the SQLiteCheckpointer reference implementation (per §10.11) mention) updated to §10.13. (proposal 0009)

Removed

Conformance fixture 028-checkpoint-fan-out-atomic-restart (pipeline-utilities). The v1 atomic-restart contract it verified no longer applies under the per-instance resume model. Replaced by fixtures 048–054. (proposal 0009)

Notes

Pre-1.0 MINOR bump. The fan-out resume contract changes (atomic → per-instance) and the engine's save granularity changes (now saves inside fan-out instances) are implementation-visible: a v1-compliant implementation that does atomic restart fails the new per-instance fixtures. Matches the v0.16.0 precedent for behavioral category changes being MINOR pre-1.0. Per the "Skip-ahead implementation" governance principle, implementations that have not yet shipped against v0.17.1 may target v0.18.0 directly.
Batching default is off. §10.11.4 configurable batching for fan-out internal saves is opt-in per Checkpointer instance. The default behavior is "every fan-out internal save is synchronously durable" — the simpler correctness story. Backends document their batching defaults and configuration shape; users opt in with eyes open.
completed_inner_positions is observational, not state-restore. §10.11's per-instance completed_inner_positions field captures how far an in_flight instance had progressed within its inner subgraph at save time. On resume, the instance re-enters its inner subgraph at the declared entry node; the completed_inner_positions field does NOT serve as a per-inner-node resume point. This is the deliberate scope cut: per-instance resume treats the instance as an atomic unit, not as a re-entry point for inner nodes. Per-inner-node resume inside a fan-out instance would require a different contract and significantly complicate §10.11.1's reducer-interaction story.
Parallel branches (§11) atomic-restart unchanged. The §11.9 composition-with-checkpointing note has been tightened to remove the "deferred alongside per-instance fan-out resume" framing — per-branch resume is its own follow-on and inherits whatever lessons fall out of the per-instance fan-out work.

[0.17.1] — 2026-05-24¶

Changed

llm-provider §8 reframed from "OpenAI-compatible wire format" to "Wire-format mappings". The existing OpenAI-compatible body is now nested under §8.1 "OpenAI-compatible mapping"; its subsections renumber §8.1 (Request mapping) → §8.1.1, §8.2 (Response mapping) → §8.1.2, §8.3 (Error mapping) → §8.1.3, §8.4 (Concurrency) → §8.1.4, §8.5 (Structured output) → §8.1.5, with the deeper §8.1.1 (Content-block wire mapping) → §8.1.1.1, §8.5.1 (Fallback) → §8.1.5.1, §8.5.2 (Response mapping) → §8.1.5.2. A new §8 framing paragraph catalogs the wire-format mapping section as the home for cross-language provider mappings, establishes the default placement rule (any mapping intended for implementation across multiple OA language implementations MUST land in §8.X), reserves out-of-tree for genuinely single-language / opt-out / experimental cases, and carries over the "compliance label" opt-in. (proposal 0019)
Conformance-fixture sidecars under spec/llm-provider/conformance/ updated to reference the new section numbers (§8.1.1, §8.1.2, §8.1.3, §8.1.5, §8.1.5.1, §8.1.1.1). Fixture YAML and behavior are unchanged.

Notes

Pre-1.0 PATCH bump. Purely textual reframing — no new types, no new error categories, no behavioral change. All v0.17.0 conformance fixtures pass under the renumbered structure without modification. Matches the v0.16.1 precedent (spec-text clarification with no fixture changes). The §3 / §4 / §5 / §6 / §7 contract remains the normative cross-provider surface; §8 is reorganized as a catalog of concrete mappings.
Per-mapping subsection structure is not normatively prescribed. §8.1 (the OpenAI-compatible mapping) uses Request / Response / Error / Concurrency / Structured-output subsections; follow-on proposals adding §8.2+ (Anthropic Messages, Google Gemini, Mistral, …) MAY mirror this structure or diverge per provider. The first follow-on may establish a recommended template if reviewer signal warrants.
Per the "Skip-ahead implementation" governance principle, implementations that have not yet shipped against v0.17.0 may target v0.17.1 directly.

[0.17.0] — 2026-05-22¶

Added

observability §5.5 expanded with LLM input/output payload attributes (default-off). New openarmature.llm.input.messages (JSON-encoded §3 message list), openarmature.llm.output.content (assistant content verbatim), and openarmature.llm.request.extras (RuntimeConfig extras JSON-encoded). Gated by a new observer-level disable_llm_payload: bool = True flag — default-off for privacy and storage-cost safety; users wanting LLM-aware backend (Langfuse, Phoenix, Honeycomb LLM lens) message rendering flip the flag once at integration. (proposal 0024)
observability §5.5.2 — RuntimeConfig request parameters emitted under the OpenTelemetry GenAI semantic conventions (gen_ai.request.temperature, gen_ai.request.max_tokens, gen_ai.request.top_p, gen_ai.request.seed). Direct emission under the GenAI namespace (no OA-prefixed parallels) because these cross-vendor LLM parameters have no OpenArmature-specific semantics. Establishes a precedent for future spec touchpoints: OA-prefix for OA-specific state; GenAI semconv for cross-vendor LLM parameters and response metadata when the semconv name is stable. Absence of an attribute means "the field was not supplied," distinct from "supplied with a zero value." (proposal 0024)
observability §5.5.3 — GenAI semconv response attributes (gen_ai.system, gen_ai.request.model, gen_ai.response.model, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.response.finish_reasons as one-element array, gen_ai.response.id). Emitted by default so LLM-aware OTel backends render generations correctly out of the box without per-user attribute-mapping shims. The OpenAI-compatible provider defaults gen_ai.system to "openai"; callers using the provider with a non-OpenAI endpoint (vLLM, LM Studio, llama.cpp) MUST be able to override per provider instance. Suppressible via a new disable_genai_semconv: bool = False flag. (proposal 0024)
observability §5.5.5 — truncation contract for the §5.5.1 payload attributes. Default 64 KiB per-attribute cap, configurable per observer with a 256-byte minimum. Five-step truncation algorithm: compute the marker, compute target prefix size N = cap - L_marker, backtrack from N to the nearest UTF-8 code-point boundary (preventing split multi-byte sequences for CJK / emoji / combining marks), emit prefix + marker. Marker is the literal suffix …[truncated, M bytes total] appended outside any JSON encoding so backends get a clean truncation signal without a flag attribute. Image content blocks with inline base64 sources MUST be replaced with a redacted placeholder ({type: "image", source: {type: "inline_redacted", byte_count}, media_type, detail?}) before JSON encoding — media_type and detail stay at the image-block level per llm-provider §3.1.2; inline image bytes MUST NOT appear on the span under any configuration. (proposal 0024)
observability §5.5.6 — cross-implementation consistency rules for §5.5.1 through §5.5.5. Implementations MUST agree on attribute names, value types, JSON serialization shape (sorted keys, UTF-8, no insignificant whitespace, within-implementation determinism), truncation marker string, inline-image placeholder shape, and the three opt-out flag defaults. Cross-implementation bytewise stability is NOT mandated — JSON encoding rules vary across language standard libraries; conformance fixtures assert parse-shape equivalence rather than bytewise equality. A follow-on MAY adopt a canonical JSON scheme (e.g., RFC 8785 JCS) if cross-impl bytewise stability becomes load-bearing. (proposal 0024)
Conformance fixtures 012-otel-llm-payload-default-off through 021-otel-llm-disable-genai-semconv (observability), covering the default-off payload behavior, payload-enabled emission, truncation, image redaction, request-parameter emission (full and partial), RuntimeConfig extras, the GenAI semconv minimum set, gen_ai.system caller-set override, and the disable_genai_semconv opt-out. New harness primitives: disable_llm_payload, disable_genai_semconv, attributes_absent, attribute_parses_as_messages, attribute_parses_as_object, attribute_truncation, attribute_does_not_contain, content_repeat, base64_data_synthetic, provider.genai_system, and a config block under calls_llm (with temperature, max_tokens, top_p, seed, and an extras sub-block for the §6 extra="allow" pass-through fields).

Notes

Pre-1.0 MINOR bump. Additions only — no existing attribute is renamed and no v0.7.0 behavior is removed. Implementations currently passing the v0.16.1 fixtures continue to pass; the new fixtures (012–021) extend the suite with cases for the additions. Per the "Skip-ahead implementation" governance principle, implementations that have not yet shipped against v0.16.1 may target v0.17.0 directly.
Default-on by feature, default-off by privacy. §5.5.2 request parameters and §5.5.3 response attributes emit by default (disable_genai_semconv = False) — this is the value that makes LLM-aware backend rendering work without per-user shims. §5.5.1 payload attributes (messages, response content, extras) emit only when explicitly opted in (disable_llm_payload = True by default) — protecting users from inadvertent PII leakage and surprise storage costs. A deliberate divergence from the industry-default-on convention (OpenInference, LangSmith, Phoenix all default-on for content emission); the proposal's Alternatives considered records the rationale.

[0.16.1] — 2026-05-16¶

Changed

graph-engine §6 attempt_index description clarified. The original text ("For nodes not wrapped by retry middleware … attempt_index MUST be 0. For nodes wrapped by retry middleware that re-attempts execution, attempt_index increments per attempt…") was ambiguous on whether "wrapped" included transitive wrapping via middleware on a containing subgraph. Tightened to make explicit that attempt_index increments per attempt for nodes wrapped by retry middleware EITHER directly (the node's own per-node middleware chain) OR transitively (via §9.7 instance middleware or §11.7 branch middleware). Fixture 036 (pipeline-utilities/036-parallel-branches-with-branch-middleware-retry) already encoded the transitive-wrapping reading via its alpha_inner_attempt_indices_seen: [0, 1] invariant and its companion .md prose; the spec text now matches what the fixture has required since v0.11.0.
pipeline-utilities §5 attempt-index paragraph clarified. Parallel tightening to the graph-engine §6 change. Also notes that the propagation mechanism is implementation-defined (Python contextvars.ContextVar set by the retry middleware before each next call, TypeScript AsyncLocalStorage or equivalent) so the retry middleware can publish its current attempt counter to events emitted from inner nodes of any subgraph the retry re-invokes. A cross-reference to graph-engine §6's nested-retry precedence rule (innermost-wins) is added at the end of the paragraph.

Notes

Pre-1.0 PATCH bump. Spec-text clarification to match existing fixture behavior. Implementations that already passed fixture 036 (alpha_inner_attempt_indices_seen: [0, 1]) under v0.11.0 see no behavior change. Implementations that read the §6 text as direct-wrapping-only — and therefore would have failed fixture 036 — need to add transitive propagation of the retry's attempt counter through the wrapping chain. The spec text now explicitly mandates what the fixture already required.
Per the "Skip-ahead implementation" governance principle, implementations that have not yet shipped against v0.16.0 may target v0.16.1 directly without implementing v0.16.0 first.

[0.16.0] — 2026-05-15¶

Added

pipeline-utilities §10.10 — new canonical configuration-time category checkpoint_state_migration_chain_ambiguous. Raised when the registered migration set contains an ambiguity that prevents the engine from picking a unique chain. Two cases trigger the category: a duplicate (from_version, to_version) pair at registration (per §10.12.1) and multiple distinct shortest paths between a source / target version pair at chain resolution (per §10.12.2). Non-transient. Mutually exclusive with the other three migration-related categories (checkpoint_record_invalid, checkpoint_state_migration_missing, checkpoint_state_migration_failed) on any given resume; chain-ambiguous routes first because it fires at build or load time before any migration runs or deserialization is attempted. (proposal 0018)
Conformance fixture 047-state-migration-chain-ambiguous (pipeline-utilities), covering both the duplicate-pair-at-registration case and the ambiguous-shortest-paths-at-resolution case via the new expected_chain_ambiguity_error harness primitive. The primitive accepts the named category surfacing at either build time or during resume, preserving §10.12.2's compile-time-SHOULD / load-time-acceptable carve-out so implementations detecting ambiguity at either point pass the same fixture.

Changed

pipeline-utilities §10.12.1 — duplicate-pair sentence names the category. "MUST raise a configuration-time error (the chain is ambiguous)" → "MUST raise checkpoint_state_migration_chain_ambiguous (per §10.10) at registration or compile time, before any resume attempt." (proposal 0018)
pipeline-utilities §10.12.2 step 2 — multi-shortest-path clause names the category. "MUST raise a configuration-time error — the same category §10.12.1 raises for duplicate (from_version, to_version) pairs" → "MUST raise checkpoint_state_migration_chain_ambiguous (per §10.10)." The "Implementations SHOULD detect ambiguity at compile time when feasible" guidance immediately following remains unchanged. (proposal 0018)
pipeline-utilities §10.10 — mutual-exclusion paragraph rewritten to list all four migration-related categories with the new routing precedence (registry well-formedness → version compatibility → chain application → deserialization). (proposal 0018)

Notes

Pre-1.0 MINOR bump. Although v0.15.0 already mandated "a configuration-time error" for both ambiguity cases, naming a canonical category that didn't exist before is implementation-visible: implementations that previously raised an arbitrary configuration error (a language-native ValueError, a generic Error, etc.) must now surface checkpoint_state_migration_chain_ambiguous to pass fixture 047. Matches the precedent set by proposal 0014's category additions (checkpoint_state_migration_missing / _failed), which shipped as the v0.12.0 MINOR bump. The change is small in scope (rename the category surfaced for one specific case) but is correctly classified MINOR per pre-1.0 SemVer.
Per the "Skip-ahead implementation" governance principle, implementations that have not yet shipped against v0.15.0 may target v0.16.0 directly without implementing v0.15.0 first.

[0.15.0] — 2026-05-14¶

Added

New capability: prompt-management. Creates spec/prompt-management/spec.md. Defines the contract by which named, versioned templates are fetched from one or more backends, rendered with caller-supplied variables, and turned into LLM-ready message sequences. Core abstractions: Prompt (unrendered template + identity metadata), PromptResult (rendered output + identity + content hashes), PromptManager (user-facing API; composes backends, fetches, renders), PromptBackend (fetch-only protocol; backends plug in), PromptGroup (tracing-grouping primitive for related prompts, N≥2 members). Specifies fetch/render separability with a convenience get(), strict-undefined-by-default variable handling (§7), composite-backend fallback semantics (§8 — fall back only on infrastructure failure, not on logical absence), three canonical error categories (prompt_not_found, prompt_render_error, prompt_store_unavailable), cross-spec touchpoints to llm-provider §3 (message shape) and observability §5.5 (prompt-identity span attributes including openarmature.prompt.name/version/label/template_hash/rendered_hash/group_name), and a deterministic-render contract (§12). (proposal 0017)
Conformance fixtures 001-fetch-success through 012-prompt-result-rendered-hash-stability (prompt-management), covering local-backend fetch success, prompt-not-found, prompt-store-unavailable, render success, render-undefined-variable, render determinism, composite-manager fallback on infrastructure unavailability, composite-manager NO-fallback on prompt_not_found, composite-manager all-unavailable, the get() convenience equivalence, PromptGroup shape, and within-implementation rendered_hash stability (cross-implementation stability deferred pending a follow-on tightening of the hash algorithm and canonical serialization).

Notes

New capability — no existing-behavior implications. The prompt-management capability is wholly new; no existing capability changes. Implementations MAY adopt it incrementally.
The capability composes with llm-provider and observability via cross-spec touchpoints in §11; it does not modify either of those specs in this version. A follow-on observability proposal MAY tighten the MAY propagation guidance in §11 once cross-implementation propagation mechanisms settle.
Per the "Skip-ahead implementation" governance principle, implementations that have not yet shipped against v0.14.0 may target v0.15.0 directly without implementing v0.14.0 first.

[0.14.0] — 2026-05-14¶

Added

llm-provider §5 — response_schema parameter on complete(). Optional JSON Schema describing the expected output shape. When None/absent, the call behaves as in v0.4.0 (free-form text content; no parsed value). When present, the top-level schema MUST be an object schema (type: "object" at the root), matching §4 Tool.parameters and OpenAI's strict-mode wire format. Single-method design — same complete() operation handles both free-form and structured-output calls; the response carries a new parsed field when applicable. (proposal 0016)
llm-provider §6 — parsed field on Response. Holds the parsed-and-validated structured value when the call supplied a response_schema and the model returned structured content. Absent on free-form calls and on finish_reason: "tool_calls" responses (regardless of whether message.content is also populated, per the §3 assistant-message contract). message.content carries the provider's content string preserved verbatim — implementations MUST NOT re-serialize parsed back into message.content. (proposal 0016)
llm-provider §7 — new error category structured_output_invalid. Raised when complete() was called with a response_schema and the provider returned content that could not be parsed as JSON OR did not validate against the schema. The error MUST expose the requested schema, the raw response content, and a description of the parse/validation failure. Non-transient by default — a model that fails schema compliance on a given prompt usually fails the same way on retry; users wanting retry semantics MAY include the category in a RetryMiddleware classifier's transient set. Distinct from provider_invalid_response (which covers wire-shape malformation, not content validation against the caller's schema). (proposal 0016)
llm-provider §8.5 Structured output wire mapping. OpenAI request body includes a response_format: { type: "json_schema", json_schema: { name, schema, strict } } field when response_schema is supplied. strict: true enables OpenAI's schema-constrained decoding when the schema satisfies strict-mode constraints; implementations SHOULD fall back to strict: false otherwise. §8.5.1 specifies a prompt-augmentation fallback for providers without native response_format support (construct a modified copy of the message list with a JSON-only directive — caller's messages MUST NOT be mutated). §8.5.2 documents the response mapping (message.content verbatim; parsed is its deserialization against response_schema). (proposal 0016)
Conformance fixtures 021-structured-output-success through 028-structured-output-no-schema-regression (llm-provider), covering happy-path success, JSON-parse failure routing, schema-validation failure routing, non-transient retry classification, tool-calls path with schema set (parsed absent), native wire-format mapping, prompt-augmentation fallback path, and the no-schema regression (v0.4.0 behavior preserved when response_schema is absent).

Changed

llm-provider §10 Out of scope — structured output deferral removed. The single "Structured output — JSON mode, schema-constrained decoding, response_format" entry is removed; §5/§6/§7/§8.5 collectively cover the capability. Other §10 entries (streaming, audio/video, token counting, provider-native wire formats, agent loop, retry/rate-limit, prompt template rendering, embeddings) unchanged. (proposal 0016)

Notes

Additive change to complete() signature and Response shape (pre-1.0 MINOR). Existing callers that don't supply response_schema see no behavior change — the parsed field is absent on free-form responses, and the wire body MUST NOT include response_format. The new structured-output path is fully opt-in.
Per the "Skip-ahead implementation" governance principle, implementations that have not yet shipped against v0.13.0 may target v0.14.0 directly without implementing v0.13.0 first.

[0.13.0] — 2026-05-14¶

Added

llm-provider §3.1 Content blocks. New subsection defining text and image blocks for use in user-message content. Text blocks carry a single text string; image blocks carry a source (url or inline base64), a conditional media_type (required for inline sources, ignored for URL sources; required to be one of image/png, image/jpeg, image/webp at minimum), and an optional detail hint ("auto" / "low" / "high"). A user message MAY mix text and image blocks freely; block order is preserved through the wire. v1 scope: image input on user messages only — assistant-output images, audio, and video remain deferred. (proposal 0015)
llm-provider §7 — new error category provider_unsupported_content_block. Raised when the bound model does not support a content block type used in the request (e.g., text-only model received an image block, or media_type/source variant unsupported). Pre-send validation or post-receive mapping; non-transient. (proposal 0015)
llm-provider §8.1.1 Content-block wire mapping. Each spec content block maps to one OpenAI content-array entry: TextBlock → { "type": "text", ... }; ImageBlock with URL source → { "type": "image_url", "image_url": { "url": ... } }; ImageBlock with inline source → { "type": "image_url", "image_url": { "url": "data:<media_type>;base64,<base64_data>" } } per RFC 2397. The detail hint maps to image_url.detail. Empty blocks rejected pre-send via provider_invalid_request. (proposal 0015)
Conformance fixtures 009-content-blocks-text-only-equivalence through 020-content-blocks-inline-image-missing-media-type (llm-provider), covering text-only equivalence with the string form, URL-image and inline-base64 image mapping, the detail hint, mixed-order preservation, empty-sequence and empty-text-block validation, image-block-missing-source structural rejection, invalid detail-value enum rejection, inline-image-missing-media-type rejection, unsupported-by-model error routing, and the user-only restriction.

Changed

llm-provider §3 Message shape — user-role content constraint. content on user messages MAY be either a non-empty string (the v1 form) OR a non-empty ordered sequence of content blocks per §3.1. All other roles remain text-string-only in this version. (proposal 0015)
llm-provider §8.1 Request mapping — user row. Updated to reflect the dual-shape input: string content maps directly to OpenAI's content string; content-block sequence maps to OpenAI's content-array form per §8.1.1. (proposal 0015)
llm-provider §10 Out of scope — multi-modal entry split. The single "multi-modal content (image, audio, video inputs and outputs)" entry split into two: "Multi-modal audio and video" (audio and video each warrant their own proposal — formats, codecs, wire mappings differ enough) and "Image outputs" (assistant-message-borne images; v1 image support is user-input-only). Image inputs are now covered by §3.1. (proposal 0015)

Notes

Additive change to §3 user-message content shape (pre-1.0 MINOR). Existing callers that pass content as a string continue to work unchanged; the new content-block sequence form is opt-in. Implementations that previously rejected non-string content via provider_invalid_request now accept the content-block sequence form when the message is a user message — an observable behavior change for that specific case, classified pre-1.0 MINOR.
Per the "Skip-ahead implementation" governance principle, implementations that have not yet shipped against v0.12.0 may target v0.13.0 directly without implementing v0.12.0 first.

[0.12.0] — 2026-05-14¶

Added

pipeline-utilities §10.12 State migrations. Activates the schema_version field that proposal 0008 reserved on CheckpointRecord and adds a registration surface for user-supplied transformations that run on checkpoint load when the stored record's schema_version does not match the current state schema's version. Specifies migration registration (§10.12.1, including backend-constraint requirements for class-bound serialization formats and the configuration-time-error rejection of duplicate (from_version, to_version) pairs), chain resolution (§10.12.2, including migration-function-failure handling), no-op fast path on matching versions (§10.12.3), and composition with checkpoint_record_invalid (§10.12.4). (proposal 0014)
pipeline-utilities §10.10 — two new error categories. checkpoint_state_migration_missing (raised on version mismatch when no migration chain connects stored to current; non-transient; carries the registered migration set in the error description) and checkpoint_state_migration_failed (raised when a registered migration function itself raises; non-transient; preserves the underlying exception as cause). The three migration-related categories (checkpoint_record_invalid, ..._missing, ..._failed) are mutually exclusive on any given resume per the §10.10 ordering. (proposal 0014)
Conformance fixtures 039-state-migration-additive-field through 046-state-migration-function-raises (pipeline-utilities), covering additive-field migration, chain application, missing/no-path registry, no-op when versions match, parent-state migration, post-migration deserialization failure routing to checkpoint_record_invalid, and migration-function-raise routing to checkpoint_state_migration_failed.

Changed

pipeline-utilities §10.2 schema_version description. Reframed as a user-facing identifier carried on the user's state schema, not an implementation-internal backend version. State classes that do not declare a schema_version carry an implementation-defined sentinel and are not migration-eligible. Users intending to evolve their schema across deploys MUST declare an explicit identifier so migrations can register against it. (proposal 0014)
pipeline-utilities §10.10 checkpoint_record_invalid description. Removed "incompatible schema_version" from the list of structural-failure reasons; raw schema_version mismatches now route through the migration system per §10.12. Added "post-migration state that fails to deserialize against the current state class per §10.12.4" as a covered case. The category remains non-transient. (proposal 0014)

Notes

Additive change to §10.10's category list (pre-1.0 MINOR). Resumes where the stored and current schema_version match see no behavior change. Resumes with a version mismatch observe the new routing: implementations that previously raised checkpoint_record_invalid on raw schema_version mismatch now route through checkpoint_state_migration_missing (when no migration chain connects), checkpoint_state_migration_failed (when a registered migration raises), or checkpoint_record_invalid (when the backend cannot support migration per §10.12.1). An observable behavior change for the version-mismatch case, classified pre-1.0 MINOR.
Per the "Skip-ahead implementation" governance principle, implementations that have not yet shipped against v0.11.0 may target v0.12.0 directly without implementing v0.11.0 first.

[0.11.0] — 2026-05-13¶

Added

pipeline-utilities §11 Parallel branches. A topology-driven concurrency primitive: a parallel-branches node dispatches M heterogeneous compiled subgraphs concurrently within a single parent invocation. Each branch is a separately compiled subgraph with potentially different state schema, different middleware, and different topology; per-branch projection in (inputs) and out (outputs) lets each branch read and write parent-state fields. Complements the §9 fan-out primitive (data-driven, N instances of one subgraph). Specifies configuration (§11.1, §11.1.1), per-branch projection (§11.2, §11.4), concurrent execution (§11.3), error policy (§11.5), composition with parent and per-branch middleware (§11.6, §11.7), determinism (§11.8), and the new error categories parallel_branches_no_branches (compile-time) and parallel_branches_branch_failed (runtime, non-transient) (§11.9). (proposal 0011)
graph-engine §3 Execution model — concurrency exception extended to parallel branches. The single-threaded execution rule now carves out two bounded exceptions: fan-out (§9) and parallel-branches (§11). Both may execute multiple subgraphs concurrently; single-threaded execution resumes for the parent run after the concurrent node completes. (proposal 0011)
graph-engine §6 Observer hooks — branch_name field on NodeEvent. Optional non-empty string, populated only on events from nodes inside a parallel-branches branch. Carries the branch's name as declared in the parallel-branches node's branches mapping. The event-source uniqueness invariant is extended to include branch_name: the combination of namespace, branch_name, fan_out_index, attempt_index, and phase uniquely identifies an event source. branch_name and fan_out_index are independent and MAY both be present simultaneously when a fan-out node executes inside a parallel-branches branch (or vice versa). (proposal 0011)
Conformance fixtures 032-parallel-branches-basic through 038-parallel-branches-compose-with-fan-out (pipeline-utilities) and 021-observer-branch-name (graph-engine).

Notes

Additive change to the §6 NodeEvent shape (pre-1.0 MINOR). Existing observers that ignore the new branch_name field continue to function unchanged; the field is absent on events from nodes not inside any parallel-branches branch. The change is backwards-compatible at the struct level.
Per the "Skip-ahead implementation" governance principle, implementations that have not yet shipped against v0.10.0 may target v0.11.0 directly without implementing v0.10.0 first.

[0.10.0] — 2026-05-09¶

Added

graph-engine §6 Observer hooks — fan_out_config field on NodeEvent. Optional structured value populated only on a fan-out node's own started and completed events. Carries the resolved values for the four observability §5.4 fan-out attributes: item_count (non-negative int), concurrency (positive int or null; null = unbounded, matching pipeline-utilities §9.2's resolved type), error_policy ("fail_fast" or "collect"), parent_node_name (string, equal to the event's node_name). Absent on all other events. When fan_out_config is populated, all four keys are always present (observers can rely on key presence); only concurrency is nullable, with the other three keys always non-null. The field is the canonical surfacing mechanism — observers source the §5.4 attributes from event.fan_out_config rather than from any implementation-private mechanism. The 0 sentinel in observability §5.4's openarmature.fan_out.concurrency OTel attribute is an attribute-mapping pragmatism (OTel primitives can't carry null) and does not appear on the canonical field. (proposal 0013)
observability §5.4 Fan-out span attributes — editorial cross-reference paragraph. Specifies how the existing §5.4 attributes are sourced from the new graph-engine §6 fan_out_config field, preserving §5.4's two-span-category distinction: item_count/concurrency/error_policy go on the fan-out node span and source from fan_out_config on the fan-out node's events; parent_node_name goes on per-instance instance spans (also surfaced via fan_out_config on the fan-out node's started event but cached by the observer and applied when synthesizing per-instance spans, since per-instance events don't carry fan_out_config); fan_out_index continues to source from event.fan_out_index on inner-node events. The paragraph also notes that §4's per-instance fan-out instance span layout applies regardless of detached mode (already true in §4's prose; the cross-reference makes it explicit for §5.4 readers). No new normative behavior in §5.4.

Notes

Additive change to the §6 NodeEvent shape (pre-1.0 MINOR). Existing observers that ignore the new field continue to function unchanged; the field is null on non-fan-out events. The change is backwards-compatible at the struct level.
Per the "Skip-ahead implementation" governance principle, implementations that have not yet shipped against v0.9.0 may target v0.10.0 directly without implementing v0.9.0 first.
No new conformance fixtures. Conformance fixture observability/006-otel-fan-out-instance-attribution already exercises both pieces of the change — the four fan-out node-span attributes (now sourced from fan_out_config) and the per-instance subgraph span layout (already required by §4).
Cross-spec impact verified: pipeline-utilities §9 fan-out node configuration unchanged (the new field is sourced from the existing config; no shape change at the configuration boundary). observability §4 per-instance layout requirement unchanged (this proposal cross-references it without altering it). llm-provider §1-§9 untouched.
Surfaced during Phase 6.1 PR-C.2 scoping in openarmature-python — the initially recommended implementation-private ContextVar pattern (per coordination thread phase-6-1-pr-c-conformance-fixtures round 06) does not survive the observer's worker-task boundary because async-runtime context-copy semantics freeze the worker's context at task creation. ContextVar mutations on the engine side after worker creation are invisible to the worker. The data must flow through the canonical event payload to cross the queue. Three alternatives considered (ContextVar, typed pre_state subclass, sidecar extra mapping); fan_out_config field on canonical NodeEvent chosen for typed, language-portable surfacing.

[0.9.0] — 2026-05-09¶

Changed

graph-engine §3 Execution model — completed event fires after edge evaluation (BREAKING, but pre-1.0). Step 3 of the execution loop is amended: the completed observer event MUST be dispatched after the merge in step 2 AND the edge evaluation in step 4 both complete, rather than between them. The dispatched event captures the node's complete transition: body execution, reducer merge, and outgoing edge resolution. The failure list in step 3 extends to include routing_error (no matching edge) and edge_exception (edge function raised) — both now populate the error field of the preceding node's completed event rather than propagating without an event. (proposal 0012)
graph-engine §6 Observer hooks — routing_error and edge_exception share the preceding node's event pair (BREAKING, but pre-1.0). Replaces the v0.6.0 wording "routing_error does NOT produce its own node event pair" with a uniform "edge-resolution failures land on the preceding node's completed event with error populated; observer applies its standard §4.2 status-mapping path." All five §4 runtime error categories now land via the same mechanism. No new event flow; no implementation-side post-end span mutation; no observer code path additions for edge-resolution errors.

Added

Conformance fixture 020-observer-edge-error-events (graph-engine). Two sub-cases — routing_error_lands_on_preceding_node_completed, edge_exception_lands_on_preceding_node_completed — verify that edge-resolution failures share the preceding node's started/completed pair with error populated, the downstream node never runs, and the error category on the completed event matches the §4 category propagated to the invoke() caller.

Notes

Breaking change to v0.6.0+ §6 event-shape contract permitted by pre-1.0 SemVer (per GOVERNANCE.md). Same shape as v0.6.0's pair-model breaking bump (also pre-1.0 MINOR).
Per the "Skip-ahead implementation" governance principle, implementations that have not yet shipped against v0.8.2 may target v0.9.0 directly without implementing the v0.8.2 ordering first. openarmature-python's Phase 6.1 PR-C.1 is the canonical first implementation of this contract.
Cross-spec impact verified: observability §4.2 status mapping picks up routing_error and edge_exception automatically (the existing error-populated completed-event handler covers them). No changes required in observability §4.2/§5/§6, pipeline-utilities §6/§9/§10, or llm-provider §1-§9.
Surfaced during Phase 6.1 PR-C scoping in openarmature-python — conformance fixture observability/004-otel-routing-error-attribution could not drive cleanly under the v0.8.2 §3/§6 ordering. Two paths considered (sentinel routing_error event vs. ordering swap); swap was chosen for uniform §4 category treatment and to avoid implementation-defined post-end span mutation.

[0.8.2] — 2026-05-06¶

Fixed

Conformance fixture 029-checkpoint-subgraph-resume (pipeline-utilities) used namespace: ["inner"] (the subgraph's name) in its expected completed_positions entry where it should have used namespace: ["dispatch"] (the wrapper node's name in the parent graph). Per graph-engine §6 and the convention established by fixture 013-observer-subgraph-namespacing-and-ordering, namespace is the chain of containing-graph node names, not subgraph names. NodePosition.namespace excludes the node's own name, so for step_one inside subgraph "inner" dispatched by outer node "dispatch", the saved position carries namespace: ["dispatch"]. Bug introduced when fixture 029 was first written; caught during Phase 5 (checkpointing) implementation in openarmature-python — the engine implementation correctly follows §6's convention; the fixture was inconsistent. Without this fix, fixture 029 would reject any conformant implementation. Fixture-only correction; no spec text or contract changes. (PR #30)

[0.8.1] — 2026-05-05¶

Added

Conformance fixture 019-subgraph-two-level-nesting (graph-engine). Regression coverage at depth 3 — existing subgraph fixtures (006, 011, 013) only exercised depth 1, leaving the §6 len(parent_states) == len(namespace) - 1 invariant and the §2 default-projection chain untested at namespace length 3 / parent_states length 2. First graph-engine fixture using the plural subgraphs: form (already in use in observability and pipeline-utilities). No spec text or contract changes. (PR #28)

[0.8.0] — 2026-05-04¶

Added

pipeline-utilities §10 Checkpointing (created). A normative Checkpointer protocol — save / load / list / delete keyed by invocation_id — that lets a graph invocation persist state at well-defined save points and resume from a prior invocation_id without restarting from scratch. The protocol is backend-agnostic: §10 defines the contract; reference implementations (InMemoryCheckpointer, SQLiteCheckpointer) ship in core; durable-execution adapters (Temporal, DBOS, Restate, Redis) plug in as sibling packages. The engine fires a save at every graph-engine §6 completed event for outermost-graph nodes, subgraph-internal nodes, and the fan-out node itself (when the fan-out has fully completed). Fan-out instance internals do NOT save in v1, since v1 fan-out resume is atomic-restart and saving inner-instance state the engine cannot resume from would be dead weight. (proposal 0008)
§10.1.1 Registration and default behavior. Checkpointing is opt-in via Checkpointer registration at graph build time. Without a registered Checkpointer the engine never calls save() and invoke(resume_invocation=...) raises checkpoint_not_found. Mirrors the §6 observer-registration pattern; matches OA's broader "contract is normative; activation is an explicit choice" pattern.
§10.4 Resume model. invoke(resume_invocation=invocation_id) loads the prior record, restores state, mints a new invocation_id for the resumed run, preserves the original correlation_id as the cross-attempt join key, and resumes from the first node in graph topological order whose position is not in completed_positions. Subgraph re-entry uses parent_states. State-restore (not event-replay) — sufficient because graph-engine §5's determinism contract makes state at any boundary equivalent to "all prior nodes' merged contributions."
§10.5 Idempotency contract. Nodes MUST be idempotent under re-execution; mid-node crashes restart the node from its entry on resume. Three explicit escape hatches for nodes that cannot be made idempotent: application-level idempotency (idempotency keys, conditional writes — recommended); a sentinel-based skip middleware on top of pipeline-utilities §6; or skip checkpoint registration entirely.
§10.6 Retry on resume. attempt_index resets to 0 on resume; retry budgets restart fresh. Consistent with "resume is a new execution attempt" framing (§10.4 step 4).
§10.7 Fan-out resume — atomic in v1. A crash mid-fan-out causes the entire fan-out to re-run on resume. Couples directly to §10.3's "no fan-out internal saves" rule. A follow-on proposal will add per-instance fan-out resume with configurable backend batching for fan-out internal saves.
§10.8 Composition with §6 observer hooks. Checkpointer.save calls SHOULD emit a §6-style observer event so the observability mapping can surface saves as spans (openarmature.checkpoint.save recommended). SHOULD-level to allow high-throughput backends to suppress event emission.
§10.9 Composition with detached trace mode. Detached trace mode (observability §4.4) and checkpoint scope are independent. Detached trace mode is purely about trace UI organization; checkpoint scope is about execution recovery. One invoke() call produces one Checkpointer record set keyed by one invocation_id, regardless of how many detached traces it produced.
§10.10 New canonical runtime error categories. checkpoint_not_found (non-transient — raised when Checkpointer.load returns None); checkpoint_save_failed (engine behavior implementation-defined — transient via middleware OR raise to caller; implementation MUST document its choice); checkpoint_record_invalid (non-transient — raised when a loaded record's schema is incompatible with the current graph).
§10.11 Reference implementations and backend layering. Core ships InMemoryCheckpointer (not durable; tests, short-lived runs) and SQLiteCheckpointer (durable on a single host, WAL-mode, accepts pickleable or JSON-native state). Sibling-package adapters for Temporal, DBOS, Restate, and Redis are informative — not specified normatively.
8 conformance fixtures 024-031: save-on-every-completed-event, resume-from-completed-position, record-shape, attempt-index-resets-on-resume, fan-out-atomic-restart, subgraph-resume, checkpoint-not-found, correlation-id-preserved-across-resume.

[0.7.0] — 2026-04-29¶

Added

observability capability (created). Establishes the observability surface; the first backend mapping is OpenTelemetry. Defines a span hierarchy rooted at an openarmature.invocation span with node, subgraph, fan-out instance, retry attempt, and LLM-provider child spans (§4); span status mapping (§4.2) where engine-raised errors per graph-engine §4 produce ERROR status with exception_recorded; the openarmature.* attribute namespace covering invocation, node, subgraph, fan-out, LLM-provider, and cross-cutting attributes (§5); opt-in detached trace mode per subgraph or per fan-out node (§4.4) for very large fan-outs and long-running subgraphs, where the dispatch span carries an OTel Link to a new trace_id; canonical span-name table (§4.5); a normative §6 TracerProvider isolation rule — openarmature MUST emit through its own private TracerProvider, never the OTel global one, preventing duplicate signals when callers run their own auto-instrumentation; a §5.5 LLM-provider span MUST emit rule with a disable_llm_spans opt-out for callers who prefer external instrumentation; OTel Logs Bridge integration so log records emitted during an invocation carry the active trace_id/span_id (§7); and a §8 determinism contract that asserts deterministic span content (hierarchy, names, attributes minus timing, status) while carving out IDs and timestamps. (proposal 0007)
§3 Cross-backend correlation ID — first-class architectural concept. A per-invocation correlation_id propagated across every backend the implementation emits to: caller-supplied verbatim or auto-generated UUIDv4 when absent; propagated via the language's idiomatic context primitive (Python ContextVar, TypeScript AsyncLocalStorage); reset between invocations; flows unchanged across detached subgraphs/fan-outs (invocation-scoped, not trace-scoped). For the OTel mapping it surfaces as openarmature.correlation_id on every span (§5.6) and every log record (§7); future backend mappings (Langfuse, etc.) follow the same per-backend "correlation ID realization" pattern.
§5.1 openarmature.invocation_id MUST UUIDv4. Framework-generated, canonical 36-character UUIDv4. Distinct from correlation_id: invocation_id ties spans of one invocation together within one backend; correlation_id is the cross-backend join key. Backends MUST NOT conflate them.
Conformance fixture suite 001-011 for observability: basic trace shape, subgraph hierarchy, error status, routing-error attribution to the preceding node span, LLM-provider span nested under the calling node (with disable_llm_spans and external-auto-instrumentation isolation sub-cases), fan-out instance attribution via fan_out_index, retry attempt spans (sibling-level), detached trace mode for both subgraph and fan-out, correlation_id cross-cutting + UUIDv4 + context-reset, log correlation including the detached-trace interaction, and determinism over the deterministic portion of span content.

[0.6.0] — 2026-04-28¶

Added

pipeline-utilities §9 Parallel fan-out (created). A fan_out node type that executes a compiled subgraph (or async callable) once per item in a parent state field, with bounded concurrency, and collects per-instance results back into a parent collection field. Two modes: items_field (data-driven; instance count = len(items_field_value), items projected per-instance via item_field) and count (count-driven; literal int OR callable (state) -> int; no per-item data). Mutually exclusive. Default concurrency: 10 (also int-or-callable). Default error_policy: "fail_fast" (cancel siblings on first failure); alternative "collect" (run all, omit failed slots, record errors in errors_field). New instance_middleware config wraps each instance's invocation as a unit (the seam for whole-instance retry vs. per-inner-node retry). Empty fan-out (items_field == [] or count == 0) raises fan_out_empty by default (on_empty: "raise"); user opts in to silent no-op via on_empty: "noop". Optional count_field writes the resolved instance count to a parent state field for programmatic inspection. New compile error categories fan_out_field_not_list, fan_out_count_mode_ambiguous. New runtime error categories fan_out_invalid_count, fan_out_invalid_concurrency, fan_out_empty (non-transient — does not auto-resolve via retry). (proposal 0005)
graph-engine §3 Execution model — fan-out concurrency exception. Single-threaded execution rule carved out so a fan-out node may execute multiple subgraph instances concurrently. Single-threaded execution resumes for the parent run after the fan-out completes.
graph-engine §6 — fan_out_index field on the node event shape. Optional non-negative integer; populated only on events from nodes inside a fan-out instance. The combination of namespace, fan_out_index, attempt_index, and phase uniquely identifies an event source.
graph-engine §6 — per-observer phase subscription. Optional phases parameter on observer registration. Accepted values: {"started", "completed"} (default), {"completed"} (v0.5.0-style; useful for metrics/log aggregators), {"started"} (useful for stuck-node alerting). Empty phase sets raise at registration. Engine filters delivery; phase filter applies at delivery, not dispatch.
Conformance fixtures for pipeline-utilities 017-023 (fan-out basic, fail-fast, collect, retry-middleware, instance-middleware-retry, count-and-concurrency-modes, empty-input) and for graph-engine 017-018 (fan-out index, phase subscription).

Changed

graph-engine §6 Event dispatch — replaced single-event-per-attempt with started/completed pairs (BREAKING, but pre-1.0). Each node attempt now produces TWO events: a started event before the node executes, and a completed event after the reducer merge (or after a failure is captured). Both events share node_name, namespace, step, attempt_index, fan_out_index, pre_state, parent_states. started events have post_state and error absent; completed events have exactly one of post_state or error populated. Required new phase field on the event shape. The pair model makes span boundaries cleaner for OpenTelemetry mapping and other observability backends; doubled event volume is mitigated by per-observer phase subscription.
graph-engine §6 — removed the v0.5.0 "Middleware-dispatched events" subsection. Under the pair model, the engine instruments at the inner-node-call level: each invocation of the wrapped node function produces a started/completed pair from the engine. Retry middleware no longer dispatches its own events — engine handles per-attempt events naturally. The "Middleware-dispatched events" mechanism added in v0.5.0 is no longer needed and is removed.
pipeline-utilities §6.1 Retry middleware — manual dispatch removed. Pseudocode simplified: no more dispatch_failed_attempt_event(...) calls. Each call to next(state) triggers a fresh started/completed pair from the engine. The "Per-attempt observer events" subsection rewritten to reflect engine-handled events.
pipeline-utilities §8 Out of scope — removed "Parallel fan-out / fan-in" (now in §9).
Existing v0.5.0 conformance fixtures updated for the pair model: graph-engine/conformance/012-016 (5 fixtures) and pipeline-utilities/conformance/011, 015 — every event in expected.observer_events split into a started/completed pair; delivery_order updated to include phase field.

Notes

Breaking change to v0.5.0 §6 contract permitted by pre-1.0 SemVer (per GOVERNANCE.md). Per the new "Skip-ahead implementation" governance principle, implementations that have not yet shipped against v0.5.0 may target v0.6.0 directly without implementing the v0.5.0 contract first.

[0.5.0] — 2026-04-28¶

Added

pipeline-utilities capability (created). Establishes the foundational pipeline-utilities surface. §2 specifies the middleware primitive: an async wrapper around node execution with the shape (state, next) -> partial_update, supporting pre-node and post-node phases, short-circuit, exception recovery, and reentrant next calls. §3 mandates per-node and per-graph registration with per-graph-outside-per-node composition. §4 mandates strict bidirectional subgraph-boundary locality (parent middleware sees the subgraph as a single dispatch; subgraph middleware never sees parent state). §6 specifies two canonical middleware implementations MUST ship: retry (§6.1) with default classifier aligned to llm-provider §7 transient categories, exponential-with-full-jitter backoff, explicit cancellation propagation, and per-attempt observer event dispatch; timing (§6.2) with monotonic-clock duration record, on_complete callback, and per-node node_name capture. (proposal 0004)
New RetryMiddleware.classifier signature (exception, state) -> bool. Default classifier ignores state and matches purely on §7 transient categories; user-supplied classifiers MAY consult pre-merge state for context-dependent retry policies.
Conformance fixture suite 001-016 for pipeline-utilities, exercising basic firing, composition ordering, per-graph-vs-per-node nesting, short-circuit, error propagation, error recovery, retry success/exhaustion/passthrough/determinism, subgraph isolation, timing basic firing/failure path, timing+retry composition, retry per-attempt observer events, and retry state-aware classifier.

Changed

graph-engine §6 Observer hooks — attempt_index field added to node event shape. Non-negative integer, default 0. For nodes wrapped by retry middleware (pipeline-utilities §6.1) that re-attempts execution, attempt_index increments per attempt; combined with node_name and namespace it uniquely identifies events from a retried node. The len(parent_states) == len(namespace) - 1 invariant is unaffected. (proposal 0004)
graph-engine §6 Event dispatch — events fire per attempt, not per node execution. For nodes not wrapped by re-attempting middleware, this is exactly once per node execution (unchanged from v0.4.0). For nodes wrapped by retry middleware, one event fires per attempt: the engine dispatches the final attempt's event; the retry middleware dispatches events for any preceding failed attempts via the new "Middleware-dispatched events" subsection.
graph-engine §6 — new "Middleware-dispatched events" subsection. Middleware MAY dispatch additional node events through the engine's delivery queue. Pipeline-utilities canonical retry middleware MUST do so for non-final attempts. Implementation-defined dispatch mechanism; same delivery-queue rules and observer-error isolation as engine-dispatched events; same §5 determinism contract.
Graph-engine conformance fixture 016-observer-attempt-index-default — verifies the new attempt_index field defaults correctly to 0 for non-retry workflows.

Notes

Open question deferred from proposal 0004: per-conditional-branch middleware. Documented as an Out-of-scope item in pipeline-utilities §8 with workarounds (state markers + per-node middleware).

[0.4.0] — 2026-04-28¶

Added

llm-provider capability (created). Establishes the foundational LLM provider abstraction: typed Message (system/user/assistant/tool), Tool, ToolCall, and Response shapes; stateless async complete() operation; pre-flight ready() check with a strong "next call expected to succeed" contract; seven canonical error categories (provider_authentication, provider_unavailable, provider_invalid_model, provider_model_not_loaded, provider_rate_limit, provider_invalid_response, provider_invalid_request); a normative OpenAI-compatible wire format mapping (§8) covering vLLM, LM Studio, llama.cpp, and the OpenAI hosted API. Charter §3.1 principle 8 ("Transparency over abstraction") is realized by Response.raw (verbatim provider response, always populated) and by surfacing partial/malformed tool calls under finish_reason: "error" for application-level repair. (proposal 0006)
New canonical runtime category provider_model_not_loaded — distinct from provider_invalid_model. The model is configured but not currently serving (local-server warmup pattern); marked transient (retry MAY succeed once loading completes).
Response.raw field — the parsed provider response verbatim, MUST be populated on every successful complete() return. Provider-specific extensions (logprobs, vendor stats) surface here unchanged.
Tool-call id verbatim preservation rule — implementations MUST NOT rewrite or normalize provider-supplied ids. Documents cross-provider id round-tripping behavior for applications behind LLM gateways or routers.
Conformance fixture suite 001-008 for llm-provider, exercising basic completion, tool-call roundtrip with verbatim id preservation, pre-send message validation, error category mapping, OpenAI wire-format mapping with raw passthrough, usage accounting, the strengthened ready() contract, and partial/malformed tool calls under finish_reason: "error".

[0.3.1] — 2026-04-28¶

Fixed

Conformance fixture 013-observer-subgraph-namespacing-and-ordering was syntactically invalid YAML and could not be parsed by spec-conforming loaders (PyYAML, libyaml). The four parent_states: values inside the flow-style event mappings used block-style sub-sequences (- {...}), which YAML 1.2 §8.1.2 forbids inside a flow context. Converted those four sub-sequences to flow style ([{...}]); the parsed semantic content is unchanged. No spec text or fixture expectations changed.

[0.3.0] — 2026-04-27¶

Added

graph-engine §6 Observer hooks (promoted from informative to normative). Compiled graphs MUST expose a way to register observers (graph-attached and invocation-scoped, at minimum). Observers are async, fire-and-forget, and receive node events with node_name, namespace (ordered sequence), step (monotonic across the invocation including subgraph-internal nodes), pre_state, exactly one of post_state or error, and parent_states (ordered sequence of containing-graph state snapshots, outermost first; empty for outermost-graph events; len(parent_states) == len(namespace) - 1). pre_state/post_state carry the node-level state shape — outer state for outermost-graph nodes, subgraph state for inner nodes. Per-invocation delivery is strictly serial across all observers and all events; per-event order is graph-attached outermost→innermost, then invocation-scoped. Observer errors MUST NOT interrupt the graph run, prevent other observers from receiving the same event, or prevent subsequent events from being delivered. Compiled graphs MUST expose a drain operation. (proposal 0003)
graph-engine §3 Execution model — observer dispatch step. Between the reducer merge and the outgoing-edge evaluation, the engine MUST dispatch the node event onto the observer delivery queue. On a failed merge step, the event is dispatched (with error populated) before the failure propagates to the caller.
Conformance fixture 012-observer-basic-firing — linear graph with one graph-attached and one invocation-scoped observer; verifies per-node event firing, monotonic step, single-element namespace, and graph-attached-before-invocation-scoped delivery order.
Conformance fixture 013-observer-subgraph-namespacing-and-ordering — outer + subgraph each with an attached observer; verifies chained namespace, step monotonicity across the subgraph boundary, and outermost-first delivery for subgraph-internal events.
Conformance fixture 014-observer-error-event — failing-node event has error populated and post_state absent; engine still propagates the §4 node_exception to the caller after dispatch.
Conformance fixture 015-observer-error-isolation — first-registered observer raises on every event; verifies the second observer still receives every event, the graph run completes, and the raised exceptions do not propagate to invoke().

[0.2.0] — 2026-04-27¶

Added

graph-engine §2 Subgraph — explicit input/output mapping. A subgraph-as-node MAY declare optional inputs (subgraph field name → parent field name) and/or outputs (parent field name → subgraph field name) mappings. inputs is additive over the §2 default of no projection in; outputs replaces (does not extend) the §2 default of field-name matching for projection out. (proposal 0002)
New canonical compile-error category mapping_references_undeclared_field — added to the §2 Compiled graph mandated identifier list. Compilation MUST fail with this category when an inputs or outputs mapping names a field that is not declared in the relevant state schema.
Conformance fixture 011-subgraph-explicit-mapping — composes the same subgraph at three sites with different mapping configurations (both / inputs-only / outputs-only) and verifies projection-in copies, projection-out replacement vs. fallback, and per-site mapping independence.
Conformance fixture 007-compile-errors adds case mapping_references_undeclared_field.

[0.1.1] — 2026-04-18¶

Changed

graph-engine §2 Subgraph (clarification, non-behavioral). Rewrote the Subgraph section to align with conformance fixture 006-subgraph-composition, which already encoded the intended behavior. The corrected defaults: projection in is off (a subgraph runs from its own schema's field defaults, independent of the parent), and projection out uses field-name matching (subgraph fields whose names match parent fields merge back via the parent's reducers; non-matching subgraph fields are discarded). The previous wording said parent fields were copied into the subgraph's initial state by field-name matching at entry, which contradicted fixture 006. No fixtures change.
proposal 0002 (Draft) — Summary, Motivation, and Detailed design. Reworded so inputs is additive over the clarified "no projection in" default, while outputs continues to replace the default field-name matching for projection out. Added an asymmetry note explaining the design choice; tightened the Precedence rationale to outputs-only.

[0.1.0] — 2026-04-16¶

Added

Initial graph-engine capability: typed state, async nodes, static and conditional edges, reducers (last_write_wins, append, merge), subgraph composition, and the baseline execution model. (proposal 0001)
Conformance fixtures for graph-engine under spec/graph-engine/conformance/ (10 fixture pairs covering linear flow, conditional routing, each reducer, subgraph composition, compile-time errors, routing errors, node exception propagation, and determinism).

Notes

Mandated error-category identifiers (proposal 0001 supplement). §2 fixes the canonical compile-time categories (no_declared_entry, unreachable_node, dangling_edge, multiple_outgoing_edges, conflicting_reducers), and §4 fixes the canonical runtime categories (node_exception, edge_exception, reducer_error, routing_error, state_validation_error). Proposal 0001 described these cases but did not mandate identifier strings. Applied pragmatically during the initial implementation PR since no spec version had been released; from 0.1.0 onward, comparable changes require a follow-on proposal.
Routing error recoverable state (proposal 0001 supplement). §4 now requires that routing errors carry recoverable state, matching the node-exception contract. Proposal 0001 required recoverable state for node exceptions only. Same pragmatic-pre-release rationale as above.
Subgraph projection. Defaults to field-name matching for projection out, as clarified in §2. Alternative projection strategies (e.g., explicit input/output mapping) are deferred to proposal 0002 (Draft).