Observability¶
Canonical behavioral specification for the OpenArmature observability capability.
- Capability: observability
- Introduced: spec version 0.7.0
- History:
- created by proposal 0007
- §5.5 extended with LLM input/output payload attributes (default-off),
RuntimeConfigrequest parameters under the OpenTelemetry GenAI semantic conventions, a minimum set of GenAI semconv response attributes, two new opt-out flags (disable_llm_payload,disable_genai_semconv), and a per-attribute truncation contract (64 KiB default cap, UTF-8-boundary-safe algorithm, 256-byte minimum, inline-image redaction) by proposal 0024
This specification is language-agnostic. Each implementation (Python, TypeScript, …) maps its own idioms
onto the behavioral contract described here. Conformance is verified by the fixtures under conformance/.
Normative keywords (MUST, MUST NOT, SHOULD, MAY) are used per RFC 2119.
This first version of the observability capability defines two foundational concepts (cross-backend correlation ID, OpenTelemetry span and log mapping) and one concrete backend mapping (OTel). Future proposals add other backend mappings (e.g., Langfuse) as additional sections of this same spec.
1. Purpose¶
The observability capability defines normative mappings from OpenArmature's runtime event surface (graph-engine §6 observer events, specifically the v0.6.0 started/completed event pairs) into well-known external observability backends. The substrate is provider-neutral; the capability is where each concrete backend's translation lives.
This first version specifies the OpenTelemetry mapping. Future proposals add other backends (Langfuse, etc.) as sibling sections of this same spec; the OTel mapping serves as the reference shape for cross-backend equivalence.
The capability does NOT introduce new graph-engine primitives. It consumes the existing observer
event stream — started events open spans, completed events close them. An implementation that
emits OTel spans is built on top of §6, not into the engine.
2. Concepts¶
Span. A unit of work in OTel — a logically distinct interval with a name, start/end timestamps, status, attributes, and parent-child relationships. The mapping translates each user-meaningful unit of work in a graph invocation (the invocation itself, each subgraph, each node execution, each fan- out instance) into a span.
Span attributes. Key/value pairs attached to a span. OTel attribute values are restricted to
scalar types (string, int, float, bool) and arrays thereof. The mapping uses dotted-key namespaces
under the prefix openarmature..
Span status. OTel spans carry a status of OK, ERROR, or UNSET. The mapping translates
graph-engine §4 error categories into status ERROR with a category-bearing description.
Trace. OTel's term for a complete tree of spans rooted at a single trace ID. By default, one
outermost graph invocation produces one trace; subgraphs (whether composed via
add_subgraph_node or instantiated by a fan-out per pipeline-utilities §9) participate in the
parent invocation's trace as nested spans. Implementations MUST also support an opt-in
detached mode for specific subgraphs or fan-outs (§4.4), where the subgraph or fan-out gets
its own trace and the parent's dispatch span carries an OTel Link to that new trace.
Correlation ID. A per-invocation identifier that flows across observability backends.
Distinct from invocation_id — the framework-generated invocation_id correlates spans within
a single backend, while correlation_id is application-supplied (or auto-generated when absent)
and is intended to be visible in every backend the implementation emits to. A user running an
LLM workflow with both an OTel backend (system traces, logs) and a Langfuse backend
(LLM-specific traces) uses the correlation_id as a join key between them: find a slow request
in Langfuse, search for its correlation_id in OTel logs, and see the surrounding
infrastructure activity. See §3 (architectural contract) and §5.6 (OTel attribute realization).
3. Cross-backend correlation ID¶
The correlation ID is a per-invocation identifier the framework propagates across every observability backend the implementation emits to. It is the join key for cross-backend pivots: when a user has both an OTel backend (system traces, logs) and an LLM-specific backend (e.g., Langfuse) configured, the correlation ID lets them follow a single request across both.
This section defines the architectural contract for the correlation ID. The OTel-specific realization — how it appears on spans and log records — is in §5.6 (cross-cutting attributes) and §7 (log correlation).
3.1 Lifecycle and propagation¶
The correlation ID is per-invocation and lives for the duration of one outermost invoke()
call. Implementations MUST:
- Accept a caller-supplied ID at invoke time (e.g., a keyword argument
correlation_idoninvoke(), an opt-in field on the invocation config record, or equivalent per-language convention). When the caller supplies an ID, the framework uses it verbatim. - Auto-generate an ID when absent. When the caller does not supply one, the framework MUST
generate a UUIDv4 (canonical 36-character form) at the start of the invocation. Caller-
supplied correlation IDs MAY be any non-empty URL-safe string (the caller might already use
request IDs from an upstream system, e.g., HTTP
X-Request-Idheaders); the format mandate applies only to the auto-generated case so that "you don't supply a correlation ID" produces consistent UUIDv4 output across implementations. - Propagate via the language's idiomatic context primitive — Python
ContextVar, TypeScriptAsyncLocalStorage, equivalents in other languages. The correlation ID MUST be readable from anywhere within the invocation's async call tree, including inside nodes, middleware, and observers, without explicit threading through function arguments. - Reset the context after the invocation completes so subsequent invocations get fresh correlation IDs.
The correlation ID is a string type. Format is implementation-defined beyond "non-empty string, URL-safe characters." Implementations SHOULD avoid characters that require escaping in OTel attribute serialization, JSON, or HTTP headers.
3.2 Distinction from invocation_id¶
correlation_id and invocation_id (defined in §5.1) serve different purposes:
| Concept | Generated by | Used for |
|---|---|---|
correlation_id |
Caller (or auto-generated when absent) | Cross-backend pivots; users follow a request across separate observability systems |
invocation_id |
Framework | Within-backend correlation; ties spans of one invocation together inside a single backend |
Both MAY be the same value if the user chooses (e.g., a caller-supplied UUID could be used as both), but the spec treats them as distinct fields. Backends MUST NOT conflate them.
3.3 Backend-mapping contract¶
Each backend mapping in this spec MUST define how the correlation ID surfaces in that backend. For the OTel mapping (this proposal):
- §5.6 specifies the
openarmature.correlation_idspan attribute that MUST appear on every span emitted during an invocation. - §7 specifies the log-record correlation rules —
openarmature.correlation_idon every log record emitted during an invocation, alongside OTel-nativetrace_id/span_id.
Future backend mappings (Langfuse, etc.) follow the same pattern: each spec section MUST include a "correlation ID realization" subsection naming the field/attribute/metadata key the backend uses.
Detached trace mode (§4.4) does not change correlation ID propagation — the correlation ID is invocation-scoped, not trace-scoped, so it flows through detached subgraphs and fan-outs unchanged. A detached subgraph's spans carry the same correlation ID as the parent trace's spans.
4. Span hierarchy¶
Each invocation of the outermost graph produces the following span tree:
- Invocation span. Root span for the whole call. Spans the time from
invoke()entering until the post-merge state is returned (or an error propagates). - Node spans. One per node execution. Children of the invocation span (for outermost-graph nodes) or of a subgraph span (for nodes inside a subgraph) or of a fan-out instance span (for nodes inside a fan-out instance — see §4.3).
- Subgraph spans. When a
SubgraphNoderuns, a span representing the entire subgraph execution wraps the inner-node spans. Child of the parent's invocation or subgraph span; sibling-equivalent to the surrounding parent's other node spans. - Fan-out spans. A fan-out node's overall execution is one span (per pipeline-utilities §9);
each fan-out instance produces its own subgraph span as a child. Per-instance attribution uses
the
openarmature.node.fan_out_indexattribute (§5.4). - Retry attempt spans. Each retry attempt of a node (per pipeline-utilities §6.1) produces its
own node span — the v0.6.0 §6 contract dispatches a started/completed pair per attempt, so each
attempt naturally maps to one span. Per-attempt attribution uses the
openarmature.node.attempt_indexattribute (§5.2).
The hierarchy is illustrated for a typical case:
invocation (root)
├── node: outer_in
├── subgraph: outer_sub
│ ├── node: inner_x
│ └── node: inner_y
└── node: outer_out
4.1 Span timing¶
A node span's start time is the moment the §6 started event fires for that attempt. Its end time
is the moment the §6 completed event fires for the same attempt. The pair model gives a clean
direct mapping — span open at started, span close at completed — with no middleware bracketing
required.
A subgraph span's start time is the moment the surrounding SubgraphNode's started event fires.
Its end time is the moment the same SubgraphNode's completed event fires.
The invocation span's start time is the entry of invoke(); its end time is the return. The
invocation span is the OTel parent for all top-level node spans within that invocation.
Implementations drive span lifecycle by registering an observer with the default phase
subscription (both started and completed); the OTel observer maintains a stack of open spans
keyed by (namespace, attempt_index, fan_out_index) and pairs each completed event with its
corresponding started. Because the §6 delivery queue is strictly serial across an invocation,
the start/close pairing is unambiguous.
Implementations MAY also use pipeline-utilities middleware as the lifecycle driver if they prefer — middleware can open the span in its pre-phase and close it in its post-phase. Both approaches produce identical span structure for conformance purposes; the contract is the emitted spans, not the driver mechanism. Most implementations will pick the observer-driven path for simplicity.
4.2 Status mapping¶
A span's OTel status is set as follows:
| Outcome | Status | Description |
|---|---|---|
| Node returns successfully and merge succeeds | OK |
(omit description) |
Node raises (graph-engine §4 node_exception) |
ERROR |
the §4 category identifier |
Edge function raises (edge_exception) |
ERROR |
the §4 category identifier; status applied to the preceding node span |
Reducer raises (reducer_error) |
ERROR |
the §4 category identifier |
Routing error (routing_error) |
ERROR |
the §4 category identifier; status applied to the preceding node span |
State validation error (state_validation_error) at entry |
ERROR |
the §4 category identifier; status applied to the invocation span (no node has run yet) |
State validation error (state_validation_error) at a node boundary |
ERROR |
the §4 category identifier; status applied to the failing node's span (per the SHOULD-validate-at-node-boundaries rule in graph-engine §2) |
State validation error (state_validation_error) at exit |
ERROR |
the §4 category identifier; status applied to the invocation span (failure is at the framework boundary, not tied to any node) |
When a span is set to ERROR, an OTel exception event MUST be recorded on the span carrying the
exception's class name and message; the exception's stack trace SHOULD be attached when the
language's OTel SDK supports it.
The three state_validation_error rows above attribute the failure to exactly one span — the
specific span where the validation occurred. The invocation span inherits ERROR via standard
OTel parent-status-from-failed-children propagation when any of these fail, but the spec does
NOT explicitly mark the invocation span ERROR for the node-boundary case (the inheritance is
sufficient — explicit duplicate attribution would create noise without adding diagnostic value).
4.3 Parent-child rules¶
Spans are parented as follows, using the §6 namespace and fan_out_index fields:
- A node event with
namespace = [name]andparent_states = []corresponds to an outermost-graph node. Its span's parent is the invocation span. - A node event with
namespace = [outer_sub, inner_name]corresponds to a node inside a subgraph. Its span's parent is the subgraph span forouter_sub. - A node event with
namespace = [outer_sub, even_inner_sub, inner_inner_name]corresponds to a node inside a doubly-nested subgraph. Its span's parent is the doubly-nested subgraph span. - A node event with
fan_out_indexpopulated corresponds to a node inside a fan-out instance. Its span's parent is the fan-out instance span (one perfan_out_indexvalue). - A node event with
attempt_index > 0corresponds to a retry attempt. Each attempt produces its own node span — the spans for attempts 0..N-1 are siblings sharing the same parent (typically the invocation span, subgraph span, or fan-out instance span depending on context).
The invariant len(parent_states) == len(namespace) - 1 from §6 is preserved by this mapping: each
parent-state entry corresponds to exactly one ancestor span. The attempt_index and
fan_out_index fields disambiguate sibling spans at the same hierarchy level.
4.4 Detached trace mode (opt-in)¶
The default behavior described in §4.1–§4.3 puts every span produced during a single invoke()
call into one trace. This is the right default for typical LLM workloads but breaks down in two
cases: very large fan-outs (thousands of items produce thousands of sibling spans, slowing backend
UIs and complicating filtering) and long-running subgraphs (sampling decisions at the trace root
can drop everything; real-time visibility into intermediate progress is hard while the parent
trace is still open).
For these cases, implementations MUST support a detached trace mode, opt-in per subgraph or per fan-out node. The configuration mechanism is implementation-defined (e.g., a parameter on the OTel observer's constructor naming detached subgraph and fan-out node names; per-language ergonomic API). The behavioral contract is what follows, regardless of how the user expresses the opt-in.
When a subgraph or fan-out is configured as detached:
- The implementation creates a new OTel
SpanContext(newtrace_id) at the subgraph's or fan-out's entry — distinct from the parent's invocationtrace_id. - The parent's subgraph-dispatch span (or fan-out node span) is opened in the parent's
invocation trace as usual, BUT carries an OTel
Linkwhose target is the new detachedtrace_id. The link associates the parent's record of "this subgraph dispatched" with the detached trace's full record of "this is what happened inside" without parent-child semantics. - All spans inside the detached subgraph or fan-out — node spans, nested subgraph spans, retry
attempt spans, LLM provider spans — use the new
trace_idas their root. They are NOT children of the parent's invocation span. - The parent's subgraph-dispatch span ends when the subgraph completes (per §4.1 timing rules) and reflects the subgraph's outcome via §4.2 status mapping. Status propagation across the trace boundary uses OTel's standard link semantics — the parent's status reflects the parent's view of the dispatch outcome.
- For detached fan-out: each instance gets its OWN trace (one trace per instance). The fan-out node's span carries one Link per instance trace. Detaching at the fan-out level effectively turns N concurrent instances into N concurrent traces with N links from the fan-out node.
When a subgraph or fan-out is NOT configured as detached (the default), §4.1–§4.3 nested behavior applies — everything in one trace.
Composition with attempt_index. Retry attempt spans live in the same trace as their parent
node — trace_isolation does NOT apply per-attempt; it applies per-subgraph or per-fan-out. A
retried node inside a detached subgraph produces sibling attempt spans inside the detached trace.
Composition with nested subgraphs. Detached mode applies at the subgraph or fan-out where it is configured. A detached subgraph that itself contains a non-detached inner subgraph keeps the inner subgraph nested within the (now-detached) outer subgraph's trace. A detached subgraph that contains a detached inner subgraph produces three separate traces (parent, outer detached, inner detached) with two Links.
Configuration example (informative; per-language API):
# Python — opt-in via OTel observer constructor
otel_observer = OTelObserver(
detached_subgraphs={"long_running_workflow"},
detached_fan_outs={"per_document_scoring"},
)
graph.add_observer(otel_observer)
The implementation looks up the relevant set when entering a subgraph or fan-out by name and creates the detached trace if matched. Other detachment-configuration shapes (decorator, graph-builder argument, etc.) are equivalently valid as long as the behavioral contract above holds.
4.5 Span names¶
Span names are how OTel trace UIs identify each span in lists, search results, and aggregations. Implementations MUST use these names for spans they emit:
| Span type | Span name |
|---|---|
| Invocation span | "openarmature.invocation" (constant) |
| Node span | The node's registered name in its containing graph (e.g., "summarize_doc", "score_relevance") |
Subgraph span (regular add_subgraph_node) |
The SubgraphNode's name in the parent graph |
| Fan-out node span (the parent dispatch span) | The fan-out node's name in the parent graph |
| Fan-out instance span (each instance's subgraph dispatch) | The fan-out node's name in the parent graph; disambiguated from the fan-out node span and from siblings by the openarmature.node.fan_out_index attribute and parent-child hierarchy |
| LLM provider span | "openarmature.llm.complete" (constant) |
| Retry attempt spans | Same name as the wrapped node; disambiguated from sibling attempt spans by the openarmature.node.attempt_index attribute |
Rationale: trace UIs display span names prominently. User-named spans (node, subgraph, fan-out)
let users find their familiar labels in the UI without indirection — "I see a span called
summarize_doc, that's the one I wrote." Framework-emitted spans that are not user-named
(invocation, LLM provider) use a constant openarmature.* prefix so they're identifiable as
framework emissions without colliding with user-chosen names. Cardinality concerns are
typically not a problem for LLM workflows (10–50 nodes per pipeline, not thousands); backends
needing low-cardinality aggregations build them from the openarmature.node.name attribute
(per §5.2) instead.
5. Attribute namespace¶
All openarmature-emitted attributes use the prefix openarmature.. The mapping defines the
following normative attribute keys; implementations MUST emit each on the spans listed.
5.1 Invocation span attributes¶
openarmature.invocation_id— string. A unique identifier for this invocation. MUST be a UUIDv4 (canonical 36-character form:xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx). Mandating the format gives users a consistent experience across implementations: dashboard queries, log searches, and cross-tool correlation all assume the same shape regardless of which language emitted the trace.openarmature.graph.entry_node— string. The entry node name of the outermost graph.openarmature.graph.spec_version— string. The version of the openarmature-spec the implementation targets (e.g.,"0.7.0"). Sourced from the implementation's package metadata.
5.2 Node span attributes¶
Required on every node span:
openarmature.node.name— string. The node's name in its immediate containing graph.openarmature.node.namespace— string array. The §6namespacefield, as an OTel string array. Implementations MUST NOT join the namespace into a single string at the OTel boundary.-
openarmature.node.step— int. The §6stepfield. -
openarmature.node.attempt_index— int. The §6attempt_indexfield.0for nodes not wrapped by retry middleware;0..N-1across the N spans produced by an N-attempt retry.
When the node fails:
openarmature.error.category— string. The §4 category identifier (e.g.,node_exception,reducer_error). Set on thecompletedspan only;startedspans never carry an error attribute.
5.3 Subgraph span attributes¶
Required on every subgraph span:
openarmature.node.name— string. The name of theSubgraphNodein the parent graph.openarmature.subgraph.name— string. The compiled subgraph's name (if the implementation tracks one) or the empty string. Optional in practice; populated when available.
5.4 Fan-out span attributes¶
The following attributes MUST appear on fan-out instance spans (per pipeline-utilities §9):
openarmature.node.fan_out_index— int. The §6fan_out_indexfor this instance.openarmature.fan_out.parent_node_name— string. The fan-out node's name in the parent graph.
Fan-out node spans (the parent of the per-instance subgraph spans) carry:
openarmature.fan_out.item_count— int. The resolved instance count (matches thecount_fieldvalue when configured; matcheslen(items_field)in items_field mode).openarmature.fan_out.concurrency— int. The resolved concurrency bound (or a sentinel int for unbounded;0is RECOMMENDED).openarmature.fan_out.error_policy— string. One of"fail_fast"or"collect". Useful for filtering traces by policy.
Implementations source these attributes from the corresponding graph-engine §6 NodeEvent
fields, preserving the two-span-category distinction above:
- Fan-out node span attributes.
openarmature.fan_out.item_count,openarmature.fan_out.concurrency, andopenarmature.fan_out.error_policygo on the fan-out node span. Sourced fromevent.fan_out_configon the fan-out node's ownstarted/completedevents. - Fan-out instance span attributes.
openarmature.fan_out.parent_node_namegoes on the per-instance fan-out instance spans (not on the fan-out node span). It is also surfaced viaevent.fan_out_configon the fan-out node'sstartedevent, but per-instance events don't themselves carryfan_out_config— the observer caches the value from the fan-out node's started event and applies it when synthesizing each per-instance instance span.openarmature.node.fan_out_indexalso goes on per-instance instance spans (and on inner-node spans nested below); it is sourced directly fromevent.fan_out_indexon those inner-node events.
The per-instance span layout (one per-instance subgraph span as a child of the fan-out node span, with inner-node spans nested below) is required by §4 for both detached and non-detached fan-out modes — the only behavioral difference between detached and non-detached is the trace-id treatment per §4.4, not the per-instance layout.
5.5 LLM provider attributes¶
Implementations of the llm-provider capability (per llm-provider §5 / proposal 0006), when paired
with an OTel observer per this mapping, MUST emit a span around each complete() call. This is a
cross-capability coupling: any implementation that ships both llm-provider and the OTel mapping
MUST wire them together so that LLM calls are not invisible in the OTel trace. Production
observability has no gaps by default rather than hoping the user remembered to instrument
LLM calls. The §6 TracerProvider-isolation requirement prevents this from duplicating spans with
external auto-instrumentation libraries (OpenInference, opentelemetry-instrumentation-openai,
etc.), which write to the OTel global provider while openarmature writes to its private one.
Opt-out for external-instrumentation-only setups. Implementations MUST support disabling
the openarmature-emitted LLM provider span — a configuration parameter on the OTel observer
(implementation-defined ergonomics; e.g., disable_llm_spans=True). This serves the explicit
case where the user prefers their external auto-instrumentation library as the canonical source
of LLM spans and wants openarmature to stay out of that lane. With the flag enabled, the OTel
observer skips the §5.5 span entirely; all other spans (node, subgraph, fan-out, etc.) continue
to emit normally per their respective rules. See §5.5.4 for the additional payload and GenAI
semconv opt-out flags introduced by proposal 0024.
The LLM provider span's parent is the node span of the node that invoked the provider. This provides direct attribution of LLM calls to the graph nodes they originate from.
Baseline attributes (v0.7.0). The following attributes are emitted on every LLM provider
span unless the span itself is suppressed via disable_llm_spans:
openarmature.llm.model— string. The model identifier the provider is bound to.openarmature.llm.finish_reason— string. The llm-provider §6finish_reasonfrom the response.openarmature.llm.usage.prompt_tokens,openarmature.llm.usage.completion_tokens,openarmature.llm.usage.total_tokens— int. From the response's usage record. Omit when null.
The remainder of §5.5 (subsections §5.5.1 through §5.5.6, introduced by proposal 0024) extends the
attribute set with input/output payload (§5.5.1, default-off), RuntimeConfig request parameters
under the OpenTelemetry GenAI semantic conventions (§5.5.2), a minimum set of GenAI semconv
response attributes (§5.5.3), the two new opt-out flags governing payload and GenAI semconv
emission (§5.5.4), the truncation contract governing payload byte length (§5.5.5), and
cross-implementation consistency rules (§5.5.6). No existing attribute is renamed; all additions
sit alongside the baseline list.
5.5.1 Input/output payload attributes (default-off)¶
When the LLM payload-emission flag is enabled (per §5.5.4), implementations MUST emit the following attributes on the LLM provider span:
-
openarmature.llm.input.messages— string. The messages list sent to the provider, JSON-encoded per the llm-provider §3 message shape. Each message is serialized as{role, content, tool_calls?, tool_call_id?}. Content blocks (per llm-provider §3.1) are serialized with the discriminator ({type, text}for text blocks,{type, source, media_type?, detail?}for image blocks) — but inline image bytes are replaced with a placeholder per §5.5.5. The serialization MUST be deterministic for identical inputs within an implementation — i.e., the same implementation with the same input MUST produce identical bytes. Cross-implementation bytewise stability (Python and TypeScript producing identical bytes for the same input) is NOT required by this specification — JSON encoding rules vary across language standard libraries (number formatting, string escaping, key-ordering details), and mandating bytewise equality across implementations would require a canonical JSON scheme like RFC 8785 JCS, which is out of scope here. Implementations MUST sort object keys lexicographically and MUST emit UTF-8-encoded output without insignificant whitespace; the conformance fixtures assert that the attribute parses to an equivalent §3 message structure rather than bytewise equality. -
openarmature.llm.output.content— string. The assistant's response content verbatim, as returned by the provider in the §6message.contentfield. Emitted only whenmessage.contentis non-empty (assistant messages with onlytool_callsand empty content MUST NOT emit this attribute). WhenResponse.parsedis populated (per llm-provider §6, structured output), this attribute carries the unparsedmessage.contentstring, NOT a re-serialization ofparsed— matching the llm-provider §6 rule thatmessage.contentis verbatim. -
openarmature.llm.request.extras— string. TheRuntimeConfigextras mapping (theextra="allow"pass-through fields permitted by llm-provider §6), JSON-encoded as an object. Emitted only when the mapping is non-empty. This attribute is OA-shape (the extras bag is the spec's structure, not the GenAI semconv's); it is grouped with payload because it MAY contain provider-specific parameters that warrant the same default-off treatment as messages. Implementations MAY choose to gaterequest.extrasseparately frominput.messages/output.content; the default is to gate all three under the same flag.
All three attributes are subject to the §5.5.5 truncation contract.
5.5.2 Request parameters¶
Implementations MUST emit the following attributes on the LLM provider span when the
corresponding RuntimeConfig (§6 of llm-provider) field is set on the request, unless the GenAI
semconv opt-out is enabled (per §5.5.4):
gen_ai.request.temperature— double. Mapped fromRuntimeConfig.temperature.gen_ai.request.max_tokens— int. Mapped fromRuntimeConfig.max_tokens.gen_ai.request.top_p— double. Mapped fromRuntimeConfig.top_p.gen_ai.request.seed— int. Mapped fromRuntimeConfig.seed.
When the corresponding RuntimeConfig field is not set (or RuntimeConfig is absent on the
call), the implementation MUST NOT emit the attribute. The absence of an attribute means "the
field was not supplied for this call," distinct from "the field was supplied with a zero value."
These attributes use the GenAI semconv namespace directly (no openarmature.llm.request.*
parallel). Rationale: temperature, max_tokens, top_p, and seed are cross-vendor LLM
parameters with no OpenArmature-specific semantics. The GenAI semconv names for these are settled
in the upstream specification and are the names every LLM-aware OTel backend reads. Adding
OA-prefixed parallels would be pure duplication.
This establishes a precedent that future cross-spec touchpoints follow: the OpenArmature attribute namespace is normative for attributes encoding OA-specific state (correlation_id, prompt identity, error category, fan-out index, etc.); the GenAI semconv namespace is used directly for cross-vendor LLM parameters and response metadata when the semconv name is stable.
5.5.3 GenAI semconv response attributes¶
Implementations MUST emit the following attributes on the LLM provider span unless the GenAI semconv opt-out is enabled (per §5.5.4):
-
gen_ai.system— string. The LLM system identifier, per the GenAI semconv enum ("openai","anthropic","vllm","lm_studio", etc.). Implementations MUST allow this value to be configurable per provider instance. The OpenAI-compatible provider (§8.1 of llm-provider) MUST default this value to"openai"; callers using the OpenAI-compatible provider with a non-OpenAI endpoint (vLLM, LM Studio, llama.cpp server, etc.) MUST be able to override this default to the appropriate system identifier. Specific override mechanism (constructor argument, factory method, environment variable) is implementation-defined; the behavioral contract is that an override is available and effective. -
gen_ai.request.model— string. The model the request was made against — the model identifier bound to the provider. Mirrorsopenarmature.llm.model; both emit. Rationale: the GenAI semconv requires this name for backend recognition; the OA-namespaced version is preserved for backwards compatibility with v0.7.0 fixtures. -
gen_ai.response.model— string. The model identifier the provider returned in the response (themodelfield on the response body, when the provider populates it). Distinct fromgen_ai.request.modelbecause providers MAY return a more specific model identifier than the one requested (e.g., requestedgpt-4o, response carriesgpt-4o-2024-08-06). Emitted only when the provider returns a non-null response model. -
gen_ai.usage.input_tokens— int. The prompt token count from the response's usage record. Mirrorsopenarmature.llm.usage.prompt_tokens; both emit. Omit when the response's usage record is null. -
gen_ai.usage.output_tokens— int. The completion token count from the response's usage record. Mirrorsopenarmature.llm.usage.completion_tokens; both emit. Omit when null. -
gen_ai.response.finish_reasons— string array. Thefinish_reasonvalues from the response, as a single-element array (the llm-provider §6Response.finish_reasonis a single string; the GenAI semconv defines this as an array to accommodate providers returning multiple choices, which OA's §6 shape collapses to one). Mirrorsopenarmature.llm.finish_reasonas string-scalar; both emit, with the GenAI version always wrapped in a one-element array. -
gen_ai.response.id— string. The response identifier the provider returned (theidfield on the response body), when present. Useful for cross-referencing OA spans with provider-side billing or audit logs. Emitted only when the provider returns a non-null id.
5.5.4 Opt-out flags¶
Implementations MUST support the following observer-level configuration flags (specific ergonomics — constructor argument, builder method, etc. — are implementation-defined; flag names below are normative for cross-implementation consistency):
-
disable_llm_payload: bool— defaultTrue. WhenTrue, the §5.5.1 payload attributes (input.messages,output.content,request.extras) are NOT emitted. WhenFalse, payload attributes emit per §5.5.1, subject to the §5.5.5 truncation contract. -
disable_genai_semconv: bool— defaultFalse. WhenTrue, the §5.5.2 request-parameter attributes and the §5.5.3 response-attribute set are NOT emitted. WhenFalse(the default), GenAI semconv attributes emit per §5.5.2 and §5.5.3.
The existing disable_llm_spans flag (above) MUST continue to behave as specified: when True,
the LLM provider span is not emitted at all, and none of the attributes specified in §5.5.1
through §5.5.3 are emitted (they have no span to attach to).
The three flags are independent. Typical configurations:
| Configuration | disable_llm_spans |
disable_llm_payload |
disable_genai_semconv |
Outcome |
|---|---|---|---|---|
| Default (out of the box) | False |
True |
False |
LLM span emits with OA + GenAI semconv attributes; no payload. |
| Maximum visibility | False |
False |
False |
LLM span emits with full payload and all attributes. |
| External auto-instrumentation is canonical | True |
(irrelevant) | (irrelevant) | OA emits no LLM span; external library handles it. |
| OA span without GenAI semconv | False |
True |
True |
OA-namespaced attributes only; useful when an external library is the canonical GenAI emitter and OA's role is internal-only attribution. |
5.5.5 Truncation contract¶
The §5.5.1 payload attributes (openarmature.llm.input.messages,
openarmature.llm.output.content, openarmature.llm.request.extras) MAY be arbitrarily large in
principle (a long conversation, a verbose model response, a multi-image user message). Emission
without bounds would produce spans larger than typical OTLP exporters accept and inflate
observability storage unbounded. The following contract applies:
Per-attribute byte cap. Implementations MUST enforce a maximum byte length on each of the three payload attributes individually. The default cap is 65,536 bytes (64 KiB) per attribute. Implementations MUST allow the cap to be configured per observer (specific mechanism — constructor argument, environment variable, etc. — is implementation-defined). The byte length is measured on the UTF-8 encoding of the final attribute string, after JSON serialization and after inline-image redaction (below).
Truncation algorithm. When an attribute's serialized value exceeds the configured cap, the implementation:
- Computes M, the pre-truncation byte length of the serialized value.
- Formats the truncation marker with M substituted:
and computes L_marker, the UTF-8 byte length of the marker string.
3. Computes the target prefix size N = configured_cap - L_marker.
4. Finds N' = the largest UTF-8 code-point boundary ≤ N in the serialized value. If N
falls inside a multi-byte sequence, the implementation MUST backtrack to the previous
code-point boundary; this prevents splitting multi-byte sequences (CJK, emoji, combining
marks) and emitting invalid UTF-8 that OTLP exporters may reject.
5. Emits the first N' bytes of the serialized value followed by the marker.
The resulting attribute is at most configured_cap bytes (may be strictly less if N' < N due
to boundary backtracking). The marker is a fixed UTF-8 string (its leading character is U+2026
HORIZONTAL ELLIPSIS, encoded as the 3-byte sequence 0xE2 0x80 0xA6). It introduces no further
UTF-8 boundary concerns beyond those step 4 already handled, because the implementation appends
the marker as a whole unit — never partially. The marker is appended outside any JSON
encoding — the result of truncating a JSON-encoded attribute is not itself parseable JSON, which
is the signal to backend code that the value was truncated. Backends performing custom parsing
get a clean affordance to detect truncation without needing a separate flag attribute.
Minimum cap. Implementations MUST reject cap configurations smaller than 256 bytes at observer construction time. Rationale: 256 bytes leaves room for the worst-case marker (~36 bytes) plus a diagnostically useful payload preview; caps below this would produce attributes that are almost entirely marker with little or no preview value. The 256-byte minimum is normative for cross-implementation consistency.
Inline-image redaction. Image content blocks (per llm-provider §3.1.2) carry either a URL
source or inline base64 bytes (per §3.1.3). The URL form is a short string and passes through
unchanged. The inline form is potentially very large (base64-encoded image bytes). When
serializing messages for openarmature.llm.input.messages, implementations MUST replace
inline-image source records with a redacted placeholder before JSON encoding:
where <mt> is the original media_type (preserved at the image-block level per llm-provider
§3.1.2) and <N> is the byte length of the original base64-encoded data. The image block's
detail field (if present per §3.1.2) is preserved unchanged; only the source is replaced
with the redacted variant. The placeholder preserves enough metadata for a reader to understand
"an inline image of this type and approximate size was present" without inlining the bytes
themselves. Implementations MUST NOT emit inline image bytes on the span under any
configuration; this is a hard rule, not gated by disable_llm_payload or by the per-attribute
cap.
URL-form images are NOT redacted — the URL is a short string and is informative for trace
readers (it points to the actual image asset). The redaction rule applies only to
source.type == "inline".
Tool-call serialization. Assistant tool_calls (per llm-provider §3) in
openarmature.llm.input.messages are JSON-encoded as [{"id", "name", "arguments"}, ...] with
arguments serialized verbatim from the parsed mapping. Tool-call argument content is subject
only to the overall per-attribute byte cap; this specification does not specify a separate
per-tool-call cap. (First-class tool-call observability is a separate forthcoming proposal.)
5.5.6 Cross-implementation consistency¶
Implementations of §5.5.1 through §5.5.5 across languages (Python, TypeScript) MUST agree on:
- Attribute names (exactly as specified above; case- and prefix-sensitive).
- Attribute value types (string, int, double, string-array as specified).
- JSON serialization shape for
input.messagesandrequest.extras— sorted object keys lexicographically, UTF-8 encoding, no insignificant whitespace, within-implementation determinism per §5.5.1. Cross-implementation bytewise stability is NOT required by this specification; a follow-on MAY adopt a canonical JSON scheme (e.g., RFC 8785 JCS) to tighten this if cross-impl bytewise equality becomes load-bearing. - The truncation marker string (
…[truncated, M bytes total], including the Unicode ellipsis character…U+2026, the brackets, the comma, the literal word "truncated", and the integer M). - The inline-image placeholder shape (the
{type: "image", source: {type: "inline_redacted", byte_count}, media_type, detail?}record —media_typeat the image-block level per llm-provider §3.1.2, withdetailpreserved verbatim when present). - The default values:
disable_llm_payload = True,disable_genai_semconv = False,disable_llm_spans = False.
Per-language ergonomics (constructor argument naming, builder patterns, environment-variable lookup) MAY differ. The above are the cross-impl behavioral surface.
5.6 Cross-cutting attributes¶
These attributes appear on EVERY span emitted during an invocation, regardless of span type (invocation, node, subgraph, fan-out instance, LLM provider call, retry attempt):
openarmature.correlation_id— string. The correlation ID for this invocation, per §3. Set on every span when a correlation ID is in scope (which, per §3.1, is the entire duration of an invocation — so every span emitted during the invocation MUST carry it). The same correlation ID appears on spans within detached subgraphs and detached fan-out instances (per §4.4 detached mode).
The cross-cutting nature of openarmature.correlation_id means observability backends can
filter for "all spans related to request X" with a single attribute query, regardless of which
node, subgraph, or fan-out instance emitted the span.
6. Driving span lifecycle¶
The v0.6.0 §6 pair model gives OTel a natural lifecycle driver: register an observer with the
default phase subscription (both started and completed), and let the started event open the
span and the completed event close it.
Observer-driven (RECOMMENDED). An OTel observer maintains a stack of in-flight spans keyed by
(namespace, attempt_index, fan_out_index). On a started event, it opens a new span with the
attributes from §4 and pushes it onto the stack. On the completed event with the matching key,
it pops the span, sets the status (per §4.2) and any error attributes, then closes the span.
async def otel_observer(event):
key = (tuple(event.namespace), event.attempt_index, event.fan_out_index)
if event.phase == "started":
span = tracer.start_span(span_name(event), attributes=base_attrs(event))
spans[key] = span
else: # completed
span = spans.pop(key)
if event.error is not None:
span.set_status(ERROR, description=event.error.category)
span.record_exception(event.error.exception)
else:
span.set_status(OK)
span.end()
Because the §6 delivery queue is strictly serial across an invocation, the start/close pairing is
unambiguous — started and completed events for the same attempt are delivered in order, with
no interleaving. The observer's spans dictionary never has a key collision during normal
execution.
Middleware-driven (alternative). Implementations MAY use a pipeline-utilities middleware as the lifecycle driver instead:
async def otel_middleware(state, next):
with tracer.start_as_current_span(span_name_for_node()) as span:
try:
partial_update = await next(state)
span.set_status(OK)
return partial_update
except Exception as exc:
span.set_status(ERROR, description=getattr(exc, "category", "unknown"))
span.record_exception(exc)
raise
Both approaches produce identical span structure for conformance purposes; the contract is the emitted spans, not the driver mechanism. Most implementations should pick observer-driven for simplicity (one registration, no per-node opt-in required).
OpenTelemetry context propagation. Implementations using the observer-driven path MUST manually maintain the OTel current-span context — observers run on the §6 delivery queue, not in the node's call stack, so the OTel SDK's automatic context propagation may not see the right parent. Implementations using the middleware-driven path get OTel context propagation for free (the middleware runs in the node's call stack).
TracerProvider isolation (MUST). Implementations MUST use a private TracerProvider for
openarmature-emitted spans. They MUST NOT register this provider as the OTel global
(trace.set_tracer_provider() in Python; equivalent global-registration calls in other
languages). Rationale: many other libraries (vendor-neutral OTel auto-instrumentation packages
such as opentelemetry-instrumentation-openai, OpenInference, LiteLLM-with-OTel, Langfuse v3, etc.)
emit OTel spans through the global provider when one is set. If openarmature also registered
itself globally, those libraries would emit duplicate spans alongside openarmature's, producing
two spans per LLM call (or per HTTP call, etc.) with different attribute namespaces. The user
sees inflated traces and gets billed/charged for the duplication.
Private-provider isolation lets openarmature emit its spans cleanly without interfering with whatever other instrumentation the user has configured. The user's separate auto-instrumentation continues to write to the global provider; openarmature writes to its private provider; both sets of spans flow to the configured exporter (typically the same OTLP endpoint), and the user filters or correlates them by attribute namespace.
This pattern is non-obvious but production-validated — naive implementations register globally and discover the duplication only after deploying. Mandating it in the spec saves every implementation from rediscovering the issue.
7. Log correlation¶
OpenTelemetry has a first-class Logs signal alongside Traces and Metrics. Log records carry
their own attributes plus the active OTel TraceContext (trace_id, span_id, trace_flags).
Implementations of this OTel mapping MUST integrate the framework's logging output into the
OTel Logs SDK so that:
- Log records emitted from anywhere within an invocation (framework code, node functions,
middleware, observers) carry the active span's
trace_idandspan_id. This is OTel's native trace-log correlation; it falls out of using the OTel Logs SDK when the active span context is propagated correctly. - Log records carry
openarmature.correlation_idmatching the invocation's correlation ID (per §3). This enables cross-backend correlation: a user reading OTel logs in HyperDX, Datadog, or another OTel-aware backend can find logs matching acorrelation_idreturned from a Langfuse trace or any other backend.
Required log-record fields:
openarmature.correlation_id— string. The invocation's correlation ID. Set on every log record emitted during the invocation.trace_id,span_id— string. The active span's identifiers, populated automatically by the OTel Logs SDK when the user's logger is bridged to OTel Logs (see implementation guidance below).
Implementation guidance (informative; per-language ergonomics):
- Python. Use
opentelemetry-sdk._logs.LoggingHandlerattached to the stdlibloggingroot logger. The handler reads the active span context and attachestrace_id/span_idautomatically. Injectcorrelation_idvia a logging filter that reads theContextVarcarrying the correlation ID, or viastructlog.contextvars.bind_contextvarsif the user is using structlog. - TypeScript. Use the equivalent OTel Logs Bridge for the user's logger (winston, pino,
bunyan all have OTel bridges). Inject
correlation_idvia the bridge's context-attribute mechanism reading fromAsyncLocalStorage.
Detached trace mode (§4.4) and log correlation. Log records emitted inside a detached
subgraph or fan-out instance carry the detached trace's trace_id and span_id, NOT the
parent invocation's. The correlation_id is unchanged (invocation-scoped, not trace-scoped).
This means filtering logs by correlation_id finds all logs across all detached and parent
traces; filtering by trace_id finds only the logs from one specific trace.
User-emitted logs from within nodes. Logs emitted by user code inside a node function
participate in the same correlation rules: if the user uses the language's stdlib logger
(Python logging, TypeScript console/winston/pino), the OTel Logs Bridge handles attribution
transparently. If the user uses a custom logger that isn't bridged to OTel, framework
correlation is best-effort — the spec contract applies to logs that flow through the OTel
Logs SDK.
8. Determinism¶
OTel span content is a function of (a) the §6 observer event stream and (b) implementation-specific data (timestamps, span IDs, trace IDs). The graph-engine §5 determinism guarantee covers the §6 event stream — for the same input, the same events fire in the same order with the same payloads. The implementation-specific data (IDs, timestamps) is inherently nondeterministic and is therefore NOT covered by determinism guarantees.
The conformance suite asserts determinism over the deterministic portion of span content: span hierarchy, span names, span attributes (excluding timing-derived attributes), and span status. It does NOT assert exact timestamps or IDs.
9. Out of scope¶
Not covered by this specification; deferred to follow-on proposals or sibling sections of this spec:
- Langfuse mapping — separate proposal; will live as §X of this same spec.
- Custom backends — users may emit any custom backend by implementing observers and middleware that consume the §6 stream and the spec doesn't constrain those.
- Sampling — OTel sampling is configured at the SDK level, outside the framework's contract.
Implementations MAY hint via
record_exceptionand span priority but the contract here is on the structure of emitted spans, not on whether to emit them. - Metrics — OTel metrics (counters, histograms) for graph-level operations. The current spec is trace-only.
- Baggage and context propagation — OTel baggage for request-ID-style propagation across service boundaries. Defer until a concrete cross-service use case surfaces.
- Span links — OTel span links between traces (e.g., for batch operations that accumulate inputs from many traces). Defer until needed.