0059: Retrieval-Provider Capability (Embedding Protocol)
- Status: Accepted
- Author: Chris Colinsky
- Created: 2026-06-08
- Accepted: 2026-06-09
- Targets:
  - spec/retrieval-provider/spec.md (creates: new capability defining the protocol shape for non-LLM-completion provider operations; this proposal lands the capability scaffold + the EmbeddingProvider protocol as the first protocol surface; sibling rerank protocol scoped to a forthcoming proposal extending the same capability)
  - spec/graph-engine/spec.md (§6: add two new typed event variants on the observer event union, EmbeddingEvent and EmbeddingFailedEvent, paralleling LlmCompletionEvent + LlmFailedEvent)
  - spec/observability/spec.md (§5.5: OTel mapping for embedding spans using the Stable subset of GenAI semconv attributes plus OA-namespace operation discrimination via span name; §8: Langfuse mapping using Langfuse's dedicated Embedding observation type; §5.5.4: rename the existing disable_llm_payload observer-level privacy flag to disable_provider_payload so the flag's scope cleanly covers payload from any provider operation, with cross-references in §8 + graph-engine §6 updated accordingly)
  - New conformance fixtures under spec/retrieval-provider/conformance/ and spec/observability/conformance/
- Related:
  - 0006 (llm-provider core: established the per-model-binding + typed-response pattern this proposal mirrors for embedding)
  - 0049 (typed LlmCompletionEvent: typed-event pattern on the observer union this extends)
  - 0057 (LlmCompletionEvent field-set extension: request-side / prompt-identity / per-call disambiguator fields this proposal mirrors onto EmbeddingEvent from launch rather than via a follow-on cycle)
  - 0058 (LlmFailedEvent typed variant: failure-side typed event paired with the success-side per the success+failure pairing precedent this proposal applies)
- Supersedes:
Summary
Creates a new retrieval-provider capability covering non-LLM-completion provider
operations. The capability sits alongside llm-provider (proposal 0006) rather than
extending it — embedding (and rerank, in a forthcoming follow-on) have disjoint
per-model-binding semantics from LLM completions: one provider instance binds to one
model identifier per the existing llm-provider §5 contract, and embedding model
identifiers (text-embedding-3-small, voyage-3, embed-multilingual-v3.0, etc.) live
in disjoint namespaces from completion model identifiers (gpt-4o-mini,
claude-3-5-sonnet, etc.). A single Provider abstraction bundling both surfaces would
contradict the per-model-binding contract or carve a different shape for the same
protocol — both bad.
This proposal lands:
- The retrieval-provider capability scaffold: spec/retrieval-provider/spec.md with the protocol-naming conventions, error-semantics inheritance from llm-provider §7, and the typed-event pairing convention (success + failure variant per protocol from day one, per the 0049 → 0058 pairing precedent).
- The EmbeddingProvider protocol: ready() + embed(input: list[str], *, config?) -> EmbeddingResponse, with typed response + usage records mirroring llm-provider §6's Response shape.
- Two new typed observer events on the graph-engine §6 observer event union: EmbeddingEvent (success) and EmbeddingFailedEvent (failure). Each carries the identity / scoping / request-side field set proposal 0057 established for LlmCompletionEvent, plus embedding-specific success-side or failure-specific fields.
- OTel + Langfuse backend mappings for embedding observability: Stable GenAI semconv attributes (gen_ai.system, gen_ai.request.model, gen_ai.usage.input_tokens) plus OA-namespace openarmature.embedding.* attributes; span name openarmature.embedding.complete discriminates the operation; Langfuse mapping uses the dedicated Embedding observation type (NOT Generation with operation metadata; verified against current Langfuse docs at draft time). The upstream gen_ai.operation.name = "embeddings" attribute is at Development status as of draft time and deferred to a follow-on per the stable-only adoption policy.
- A cross-spec rename of the observer-level privacy flag from disable_llm_payload to disable_provider_payload, folded into this proposal so the embedding-payload privacy posture lands on the renamed flag from launch (avoiding a naming-mismatch transition window). The semantics broaden to cover payload from any provider call (LLM + embedding + rerank when it lands); default-True behavior preserved.
The sibling RerankProvider protocol lands in a forthcoming proposal extending the
same capability.
Motivation
Three forces converge.
Observability-blind-spot for embedding calls today. Downstream RAG pipelines hitting
OpenAI-wire-compatible embedding endpoints (Bifrost, Cohere, Voyage AI, Jina AI, etc.)
currently bypass OpenArmature entirely via direct HTTP calls — embedding falls outside
the observability stream (no OTel span, no observer event, no Langfuse Generation
observation, no per-invocation cost rollup via the proposal 0048 queryable observer
pattern). The framework's "observable by default" promise has a real hole.
Per-model-binding is load-bearing. The llm-provider §5 contract is "one provider
instance, one model identifier." Embedding model identifiers and completion model
identifiers live in disjoint namespaces; a single provider instance bundling both either
contradicts the contract OR carves a different contract shape for the same protocol.
Both options break existing invariants. A separate EmbeddingProvider protocol — bound
to its own model identifier per instance — preserves the per-model-binding contract
while opening a path to observable embedding calls.
Real provider-landscape variance. OpenAI does completion + embedding (no rerank). Voyage AI does embedding + rerank (no completion). Anthropic only does completion. Jina AI mostly does rerank. Cohere does all three. A unified Provider abstraction bundling all three operations would force every implementation to stub out methods the backend doesn't serve. Separate protocols let each implementation declare exactly what its backend supports; downstream pipelines configure one instance per capability they need.
Type signatures stay precise as a fourth-order benefit — embed() returns vectors keyed
by input order; complete() returns a structured LLM response. A union return type or
generic query() method would erase per-capability semantics the type system protects
today.
Proposed change
Create spec/retrieval-provider/spec.md
A new capability sitting alongside llm-provider. The §-structure:
§1 Purpose
Frames the retrieval-provider capability as the home for retrieval-primitive provider
operations — embedding (this proposal) and rerank (forthcoming). The capability inherits
llm-provider's per-model-binding contract, error-category enumeration (§7), and typed-
response shape conventions. It does NOT extend llm-provider's Provider protocol; the
protocols defined here are siblings, not subtypes.
Retrieval-provider is one of a planned family of <domain>-provider capabilities
(llm-provider, retrieval-provider, plus future siblings as downstream demand
surfaces — e.g., voice-provider for ASR + TTS, multimodal-provider for image
generation + image edit). Each domain capability covers related-shape provider
operations; new domains land as new capabilities rather than as extensions to existing
ones. This keeps per-capability protocol surface narrow and per-domain evolution
independent.
§2 Concepts
Defines: RetrievalProvider (umbrella term covering both EmbeddingProvider and
RerankProvider), EmbeddingResponse shape, EmbeddingUsage record, embedding-specific
runtime config (initially minimal — dimensions for callers controlling output size +
the extras pass-through bag).
§3 EmbeddingProvider protocol
The EmbeddingProvider protocol exposes two async operations:
| Operation | Parameters | Returns |
|---|---|---|
| ready() | (none) | void / null |
| embed(input, *, config?) | input: list of strings; config: optional EmbeddingRuntimeConfig (keyword-only) | EmbeddingResponse record |
Cross-impl contract:
- Per-instance model binding (matches llm-provider §5). Implementations bind one model identifier per provider instance via a constructor parameter (or per-language idiomatic equivalent).
- ready() MUST be idempotent; surfaces provider_invalid_model / provider_model_not_loaded per llm-provider §7 if the bound embedding model is not available.
- embed() MUST raise one of the llm-provider §7 error categories on failure (the §7 enumeration is shared cross-capability; embedding-applicable subset documented in §5 Error semantics below).
- Input is always a list of strings; even single-string callers wrap as a one-element list. Matches complete()'s "always a list" message contract; cleaner type signature than overloading the input type.
- The config parameter is keyword-only (or per-language idiomatic equivalent that prevents positional confusion with input).
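As a rough illustration of the §3 surface, the protocol could be sketched in Python as below. This is a minimal sketch, not the normative shape: the dataclass layouts, the `Optional` typing of `dimensions`, and the `InMemoryEmbeddingProvider` class (with its default of 4 dimensions and word-count token estimate) are assumptions made purely for the example.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Optional, Protocol


@dataclass(frozen=True)
class EmbeddingRuntimeConfig:
    # `dimensions` and `extras` are named in §2; the exact types are assumed.
    dimensions: Optional[int] = None
    extras: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class EmbeddingUsage:
    input_tokens: int


@dataclass(frozen=True)
class EmbeddingResponse:
    vectors: list[list[float]]
    model: str
    usage: EmbeddingUsage
    request_id: Optional[str]
    dimensions: int
    raw: dict[str, Any]


class EmbeddingProvider(Protocol):
    """Two async operations; `config` is keyword-only per the contract."""

    async def ready(self) -> None: ...

    async def embed(
        self, input: list[str], *, config: Optional[EmbeddingRuntimeConfig] = None
    ) -> EmbeddingResponse: ...


class InMemoryEmbeddingProvider:
    """Hypothetical stand-in provider: one instance is bound to one model id."""

    def __init__(self, model: str) -> None:
        self.model = model

    async def ready(self) -> None:
        # Idempotent: nothing to load, so repeated calls are harmless.
        return None

    async def embed(
        self, input: list[str], *, config: Optional[EmbeddingRuntimeConfig] = None
    ) -> EmbeddingResponse:
        # Assumed default of 4 dimensions when the caller supplies none.
        dims = 4
        if config is not None and config.dimensions is not None:
            dims = config.dimensions
        # Fake vectors derived from string length; one per input, in input order.
        vectors = [[float(len(text))] * dims for text in input]
        tokens = sum(len(text.split()) for text in input)
        return EmbeddingResponse(
            vectors=vectors,
            model=self.model,
            usage=EmbeddingUsage(input_tokens=tokens),
            request_id=None,
            dimensions=dims,
            raw={"data": vectors},
        )
```

A single-string caller would wrap the input as a one-element list, for example `asyncio.run(provider.embed(["hello"]))`, and passing `config` positionally fails with a `TypeError` because the parameter is keyword-only.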
§4 EmbeddingResponse and EmbeddingUsage shapes
An EmbeddingResponse record:
| Field | Description |
|---|---|
| vectors | List of vectors (each a list of floats); one vector per input string in the order the inputs were supplied. The length of vectors MUST equal the length of input. |
| model | The model identifier the provider returned. MAY be a more specific identifier than the one the provider was bound against. |
| usage | An EmbeddingUsage record (defined below). |
| request_id | The provider-returned request identifier when present; null otherwise. |
| dimensions | Int. The output vector dimensionality. MUST equal the length of each inner list in vectors; derivable from vectors[0] but kept on the response for ergonomics + cross-vendor consistency. |
| raw | The parsed provider response, as a language-idiomatic representation of deserialized JSON (Python: dict[str, Any]; TypeScript: Record<string, unknown>). MUST be populated on every successful return. Parallel to llm-provider §6 Response.raw. |
An EmbeddingUsage record:
| Field | Description |
|---|---|
| input_tokens | Int. Tokens billed for the embedding call. Always reported (no output_tokens: vectors aren't tokens). |
Cross-impl invariants:
- Exactly one vector per input string (the length of vectors matches the length of input).
- Vector position is keyed by input order; implementations MUST NOT permute.
- All vectors in a single response have the same dimensionality. Implementations MUST verify this on the response and raise provider_invalid_response per llm-provider §7 if violated.
- The dimensions field on the response MUST equal the dimensionality of each inner vector (cross-check invariant for adapters).
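The count and dimensionality invariants above lend themselves to a small adapter-side check. The following is a sketch under assumptions: `ProviderInvalidResponse` and `check_embedding_response` are hypothetical names standing in for whatever an implementation uses to raise the provider_invalid_response category.

```python
from typing import Sequence


class ProviderInvalidResponse(Exception):
    """Hypothetical exception mapped to the provider_invalid_response category."""

    category = "provider_invalid_response"


def check_embedding_response(
    input: Sequence[str],
    vectors: Sequence[Sequence[float]],
    dimensions: int,
) -> None:
    """Raise if the response violates the count or dimensionality invariants."""
    # Exactly one vector per input string.
    if len(vectors) != len(input):
        raise ProviderInvalidResponse(
            f"expected {len(input)} vectors, got {len(vectors)}"
        )
    # Every inner vector must match the declared dimensionality, which also
    # guarantees that all vectors share the same length.
    for index, vector in enumerate(vectors):
        if len(vector) != dimensions:
            raise ProviderInvalidResponse(
                f"vector {index} has length {len(vector)}, expected {dimensions}"
            )
```

Order preservation cannot be verified from the response alone, which is presumably why it is exercised with identifiable mocked vectors in the conformance fixtures rather than in a runtime check like this one.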
§5 Error semantics
Inherits the llm-provider §7 error-category enumeration. The embedding-applicable subset is the §7 categories minus the LLM-completion-specific ones:
- provider_authentication, provider_unavailable, provider_invalid_model, provider_model_not_loaded, provider_rate_limit, provider_invalid_response, provider_invalid_request: all apply
- provider_unsupported_content_block: does NOT apply (embedding takes strings, not content blocks)
- structured_output_invalid: does NOT apply (embedding has no response_schema)
§6 Determinism
Embedding model determinism guarantees vary by provider; the spec MUST NOT assume bit-identical vectors for equivalent inputs. Embedding-aware caches keyed on input strings (e.g., the proposal 0047 prefix-cache analog) MAY apply per the provider's documented determinism guarantees but are NOT a spec contract.
§7 Cross-spec touchpoints
- graph-engine §6: typed observer events EmbeddingEvent + EmbeddingFailedEvent
- observability §5.5 (OTel) + §8 (Langfuse): backend mappings
- llm-provider §7: error-category enumeration (inherited)
- pipeline-utilities §6 (middleware): EmbeddingProvider calls are eligible for retry middleware identically to complete() calls
§8 Out of scope
- Multi-modal embedding (image / audio embeddings). Text-only in v1.
- Per-vendor wire-format mappings. Follow-on proposals add OpenAI / Cohere / Voyage / etc. mappings analogous to llm-provider §8.1 / §8.2 / §8.3.
- Per-SDK implementation details (httpx batching strategies, retry timing).
- Caller-supplied determinism / seeding (embedding models rarely expose seeds; not v1).
- Cross-call observability correlation (e.g., "this rerank call used vectors from that embedding call"). Each call is independent at the protocol layer; any correlation lives in node-body code, not in the protocol contract.
Extend graph-engine §6 with EmbeddingEvent + EmbeddingFailedEvent
Two new typed event variants on the observer event union, paired from day one per the
0049 → 0058 success+failure pairing precedent. Both follow the type-discrimination
pattern (isinstance(event, EmbeddingEvent) / isinstance(event, EmbeddingFailedEvent))
established by 0049 for the LLM-side variants.
EmbeddingEvent (success)
Mirrors the shape of LlmCompletionEvent's identity / scoping / request-side field
set per the 0057-extended baseline, with capability-specific substitutions
(input_strings in place of input_messages). Embedding-specific success-side fields
replace the LLM-completion-specific ones:
| Field | Type | Description |
|---|---|---|
| invocation_id | string | The outer invocation's identifier, per observability §5.1. |
| correlation_id | string \| null | Cross-backend correlation ID, per observability §3.1. |
| node_name | string | The user-defined node that issued the call. |
| namespace | sequence of strings | The calling node's namespace, per the Node event shape above. |
| attempt_index | int | The retry-attempt index (0 on the first attempt). |
| fan_out_index | int \| null | The fan-out instance index when the calling node ran inside a fan-out instance (per pipeline-utilities §9). Null otherwise. |
| branch_name | string \| null | The parallel-branches branch name when the calling node ran inside a parallel-branches branch (per pipeline-utilities §11). Null otherwise. |
| provider | string | The embedding provider identifier (matches gen_ai.system per observability §5.5.3). |
| model | string | The model identifier the request was made against. |
| response_model | string \| null | The model identifier the provider returned in the response (matches gen_ai.response.model). May be more specific than requested; null when the provider doesn't return a response model. |
| response_id | string \| null | The provider-returned response identifier, when present. |
| usage | record \| null | EmbeddingUsage record per retrieval-provider §4. May be null when the provider does not report usage. |
| latency_ms | float \| null | Wall-clock latency of the embedding call measured at the adapter boundary, in milliseconds. May be null when latency is not measured. Implementations MAY use a provider-reported latency value when the provider surfaces one, documenting which source is in use. |
| finish_reason | n/a | Embedding has no completion semantics; field omitted from this variant. |
| caller_invocation_metadata | mapping \| null | OPTIONAL field; same opt-in semantics as on LlmCompletionEvent. |
| input_strings | list of string | The input strings the embedding call was made with, in the typed-event-native form. Populated unconditionally on every typed event; observer-side privacy gating applies at the rendering boundary per the privacy paragraph below. |
| request_params | mapping | Embedding-specific EmbeddingRuntimeConfig fields the caller supplied (dimensions, etc.). Absence-is-meaningful semantics per the equivalent field on LlmCompletionEvent. Empty mapping when no parameters were supplied. |
| request_extras | mapping | The EmbeddingRuntimeConfig extras pass-through bag (vendor-specific knobs). Same shape and privacy posture as on LlmCompletionEvent. |
| active_prompt | record \| null | A snapshot of the active Prompt identity at embedding-call time (RAG pipelines often render a prompt template before embedding for chat-shaped search). Same field set and nullability as on LlmCompletionEvent. |
| active_prompt_group | record \| null | Snapshot of the active PromptGroup identity. Same shape as on LlmCompletionEvent. |
| call_id | string | A per-call disambiguator minted by the implementation. Always present (never null); freshly minted per embed() call. |
| input_count | int | The number of input strings the call was made with (equals len(input_strings)). Derivable but kept for ergonomics + cross-vendor consistency. |
| dimensions | int \| null | The dimensionality of the returned vectors (equals the inner-vector length from the response). May be null when the response does not surface a determinate dimensionality. |
EmbeddingFailedEvent (failure)
Same shape as LlmFailedEvent (proposal 0058) — mirrors the identity / scoping /
request-side field set from EmbeddingEvent, with embedding-specific success-side
fields (response_id, response_model, usage, dimensions, input_count) absent
(no vectors were returned), and adds three failure-specific fields:
| Field | Type | Description |
|---|---|---|
| invocation_id | string | The outer invocation's identifier, per observability §5.1. |
| correlation_id | string \| null | Cross-backend correlation ID, per observability §3.1. |
| node_name | string | The user-defined node that issued the call. |
| namespace | sequence of strings | The calling node's namespace, per the Node event shape above. |
| attempt_index | int | The retry-attempt index (0 on the first attempt). |
| fan_out_index | int \| null | The fan-out instance index when the calling node ran inside a fan-out instance (per pipeline-utilities §9). Null otherwise. |
| branch_name | string \| null | The parallel-branches branch name when the calling node ran inside a parallel-branches branch (per pipeline-utilities §11). Null otherwise. |
| provider | string | The embedding provider identifier. |
| model | string | The model identifier the request was made against. |
| latency_ms | float \| null | Wall-clock latency from embed() entry to the point the failure was raised, in milliseconds. May be null when latency is not measured. |
| caller_invocation_metadata | mapping \| null | OPTIONAL field; same opt-in semantics as on EmbeddingEvent. |
| input_strings | list of string | The input strings the embedding call was made with. Populated unconditionally on every typed event; same observer-side privacy-gating posture as on EmbeddingEvent. |
| request_params | mapping | Embedding-specific config fields the caller supplied. Same shape as on EmbeddingEvent. |
| request_extras | mapping | The EmbeddingRuntimeConfig extras pass-through bag. Same shape and privacy posture as on EmbeddingEvent. |
| active_prompt | record \| null | Snapshot of the active Prompt identity at embedding-call time. Same shape as on EmbeddingEvent. |
| active_prompt_group | record \| null | Snapshot of the active PromptGroup identity. Same shape as on EmbeddingEvent. |
| call_id | string | A per-call disambiguator minted by the implementation. Always present; freshly minted per embed() call. A failed call gets its own call_id, distinct from any retry-attempt sibling. |
| error_category | string | One of the llm-provider §7 normative categories applicable to embedding (per retrieval-provider §5 above). Always present. |
| error_type | string \| null | OPTIONAL impl-level / vendor-specific error type or code. Two acceptable styles (vendor error code, upstream exception class name). Null when no impl-side type is available. |
| error_message | string | Human-readable message from the raised exception. Always present (empty string when the exception carried no message). |
Mutual exclusion + exception-flow + dispatch timing
Same rules as 0058's LlmCompletionEvent / LlmFailedEvent pair:
- EmbeddingEvent and EmbeddingFailedEvent are mutually exclusive on a given embed() call. Implementations MUST NOT emit both for the same call.
- The §7 category exception still raises out of embed() per llm-provider §7, whether raised by the provider or by the implementation's pre-send validation layer. The typed event is dispatched alongside the exception, not in place of it.
- Both events MUST be dispatched on the observer delivery queue at the point of embed() completion or failure (after success / after exception is raised; before the call returns or re-raises to the caller). Delivery semantics follow graph-engine §6: strict-serial across the invocation, async-delivered.
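The three rules above can be sketched as a thin wrapper around an embed call. This is an illustrative sketch only: the event classes carry a heavily reduced field set, `ProviderError` and `embed_with_events` are hypothetical names, and `dispatch` is a plain synchronous callback standing in for the observer delivery queue (real delivery is asynchronous per graph-engine §6).

```python
import asyncio
import time
import uuid
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass(frozen=True)
class EmbeddingEvent:
    call_id: str
    input_strings: list[str]
    input_count: int
    latency_ms: float


@dataclass(frozen=True)
class EmbeddingFailedEvent:
    call_id: str
    input_strings: list[str]
    error_category: str
    error_message: str
    latency_ms: float


class ProviderError(Exception):
    """Hypothetical base exception carrying an llm-provider §7 category."""

    def __init__(self, category: str, message: str = "") -> None:
        super().__init__(message)
        self.category = category


async def embed_with_events(
    embed: Callable[[list[str]], Awaitable[Any]],
    input: list[str],
    dispatch: Callable[[object], None],
) -> Any:
    call_id = uuid.uuid4().hex  # freshly minted per call, never null
    started = time.perf_counter()
    try:
        response = await embed(input)
    except ProviderError as exc:
        elapsed_ms = (time.perf_counter() - started) * 1000.0
        # Failure path: dispatch the failure event, then re-raise unchanged.
        dispatch(
            EmbeddingFailedEvent(
                call_id=call_id,
                input_strings=list(input),
                error_category=exc.category,
                error_message=str(exc),
                latency_ms=elapsed_ms,
            )
        )
        raise
    elapsed_ms = (time.perf_counter() - started) * 1000.0
    # Success path: exactly one success event, dispatched before returning.
    dispatch(
        EmbeddingEvent(
            call_id=call_id,
            input_strings=list(input),
            input_count=len(input),
            latency_ms=elapsed_ms,
        )
    )
    return response
```

An observer consuming the union would then discriminate with `isinstance(event, EmbeddingEvent)` versus `isinstance(event, EmbeddingFailedEvent)`, matching the type-discrimination pattern described earlier.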
Privacy posture
input_strings and request_extras carry potentially sensitive payload data
(RAG-pipeline indexing of user-supplied text). The privacy posture is identical to
LlmCompletionEvent's — observer-side gating at the rendering boundary per observability
§5.5.4 (implementations populate the fields unconditionally; observers honor
disable_provider_payload). The disable_provider_payload flag's semantics extend to cover all
LLM-adjacent provider operations (LLM completion + embedding + rerank when it ships)
rather than proliferating per-operation flags.
Rename observability §5.5.4 disable_llm_payload → disable_provider_payload
The observer-level flag defined at observability §5.5.4 currently gates LLM payload
data — input.messages, output.content, request.extras per §5.5.1 — from being
populated on OTel spans and Langfuse Generation observations. With the addition of
embedding (this proposal) and forthcoming rerank as provider operations whose payload
needs the same gating, the flag's _llm_ infix is too narrow.
Rename the flag from disable_llm_payload to disable_provider_payload. Semantics
broaden to cover payload data from any provider call (LLM completion + embedding +
rerank when it lands), under a single observer-level flag with default True
(suppressed by default) — same default-conservative posture as today. No semantic
change beyond the broadened scope; existing LLM-payload gating behavior is preserved
unchanged for LlmCompletionEvent + LlmFailedEvent (proposals 0049 / 0057 / 0058).
Spec text edits at Accept time:
- observability §5.5.4 renames the flag definition + extends the framing to cover provider-payload data across LLM + embedding + rerank uniformly
- observability §8 / §8.4 references update to the new name
- graph-engine §6's LlmCompletionEvent + LlmFailedEvent privacy paragraphs update references
- Existing fixtures using the flag rename in their YAML (the OTel disable_llm_payload toggle is exercised by fixtures 013 + 018; both swap the key)
Pre-1.0 SemVer convention permits the hard-swap rename in a MINOR bump; same precedent
as proposal 0057's request_id → response_id field rename. The CHANGELOG entry calls
out the rename in the Changed section so that operators of any downstream observer
config that sets disable_llm_payload=True know to make the one-key update.
Extend observability §5.5 (OTel) with embedding mapping
A new §5.5.X Embedding provider attributes sub-subsection paralleling the existing §5.5 LLM provider attributes block.
Span name — openarmature.embedding.complete (parallel to the existing
openarmature.llm.complete span name for LLM completions). Span name discriminates the
operation type without requiring an explicit operation-name attribute. Parent is the
calling node's span.
Stable GenAI semconv attributes (mapped where they apply directly):
- gen_ai.system ← EmbeddingProvider's configured provider identifier (e.g., "openai", "voyageai", "cohere")
- gen_ai.request.model ← bound embedding model identifier
- gen_ai.response.model ← EmbeddingResponse.model (provider-echoed)
- gen_ai.response.id ← EmbeddingResponse.request_id (when present)
- gen_ai.usage.input_tokens ← EmbeddingResponse.usage.input_tokens
OA-namespace attributes:
- openarmature.embedding.input_count (int): number of input strings
- openarmature.embedding.dimensions (int): output vector dimensionality
- openarmature.embedding.input.strings: JSON-encoded list of input strings (subject to disable_provider_payload, parallel to §5.5.1)
- openarmature.embedding.request.extras: JSON-encoded extras mapping (subject to disable_provider_payload)
Stable-only upstream adoption — operation-name attribute deferred. The upstream OTel
GenAI semconv gen_ai.operation.name attribute (with "embeddings" as a documented
well-known value) is at Development status as of OTel semconv (verified at draft
time); per the Stable-only upstream adoption policy in GOVERNANCE.md (and tracked
in docs/compatibility.md), OA does NOT normatively adopt this attribute in v1.
Operation discrimination is via the span name + provider; a follow-on proposal MAY add
gen_ai.operation.name = "embeddings" to the attribute surface when the upstream
attribute reaches Stable status, per the §5.5.3.1 / 0047 mirror pattern.
The disable_llm_spans / disable_provider_payload / disable_genai_semconv flags apply
analogously to embedding spans.
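To make the attribute surface concrete, the mapping above might be assembled as a plain dictionary before being set on a span. This is a sketch under assumptions: `embedding_span_attributes` is a hypothetical helper that does not call any OpenTelemetry API, and omitting `gen_ai.response.model` when the value is absent is an assumed behavior (the text only states "when present" for the response id).

```python
import json
from typing import Any, Optional

# Span name from the mapping above; it discriminates the operation type.
SPAN_NAME = "openarmature.embedding.complete"


def embedding_span_attributes(
    *,
    provider: str,
    request_model: str,
    response_model: Optional[str],
    response_id: Optional[str],
    input_tokens: int,
    input_strings: list[str],
    request_extras: dict[str, Any],
    dimensions: int,
    disable_provider_payload: bool = True,
) -> dict[str, Any]:
    attrs: dict[str, Any] = {
        # Stable GenAI semconv subset.
        "gen_ai.system": provider,
        "gen_ai.request.model": request_model,
        "gen_ai.usage.input_tokens": input_tokens,
        # OA-namespace attributes that carry no payload.
        "openarmature.embedding.input_count": len(input_strings),
        "openarmature.embedding.dimensions": dimensions,
    }
    if response_model is not None:
        attrs["gen_ai.response.model"] = response_model
    if response_id is not None:
        attrs["gen_ai.response.id"] = response_id
    # Payload-bearing attributes are rendered only when the flag is off.
    if not disable_provider_payload:
        attrs["openarmature.embedding.input.strings"] = json.dumps(input_strings)
        attrs["openarmature.embedding.request.extras"] = json.dumps(request_extras)
    # gen_ai.operation.name is deliberately not emitted (deferred in v1).
    return attrs
```

With the default flag value the returned mapping contains identity and usage attributes only; passing `disable_provider_payload=False` adds the two JSON-encoded payload attributes.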
A new §5.5.X Typed embedding events sub-subsection frames the
EmbeddingEvent + EmbeddingFailedEvent typed-event surface as the structured form of
the embedding attribute surface, paralleling §5.5.7 for LLM completion events.
Extend observability §8 (Langfuse) with embedding mapping
A new §8.X Embedding observation mapping sub-subsection. Embedding calls map onto
Langfuse's dedicated Embedding observation type (created via the SDK's
asType: "embedding" parameter or per-language idiomatic equivalent), NOT Generation.
Field mappings:
- embedding.model ← EmbeddingResponse.model
- embedding.input ← input_strings (privacy-gated per disable_provider_payload; see Privacy posture for embedding observations below)
- embedding.output ← EmbeddingResponse.vectors (the actual embedding vectors; privacy-gated per disable_provider_payload)
- embedding.usageDetails.input ← EmbeddingResponse.usage.input_tokens
- embedding.metadata.openarmature_input_count, embedding.metadata.openarmature_dimensions, embedding.metadata.openarmature_request_id (when present)
Trace-level cost rollup aggregates across LLM Generation + Embedding observations
uniformly — Langfuse's cost-tracking machinery understands the Embedding type's
usageDetails field directly. No metadata discriminator is needed; the observation type
itself discriminates.
Privacy posture for embedding observations. Both input strings and output
vectors are payload-bearing data on the same footing: both are gated by
disable_provider_payload (default True per observability §5.5.4; see the Rename
section above for the cross-spec flag rename folded into this proposal). When the
flag is True, the Embedding observation populates model + usageDetails + identity
metadata only; input and output are NOT populated. When False, both fields populate fully.
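The two flag states can be illustrated with a plain dictionary that mirrors the field mappings listed earlier. This is a sketch only: `langfuse_embedding_observation` is a hypothetical helper that builds a dict and does not call the Langfuse SDK, and the `"type"` key is an assumed stand-in for the SDK's observation-type selection.

```python
from typing import Any, Optional


def langfuse_embedding_observation(
    *,
    model: str,
    input_strings: list[str],
    vectors: list[list[float]],
    input_tokens: int,
    dimensions: int,
    request_id: Optional[str] = None,
    disable_provider_payload: bool = True,
) -> dict[str, Any]:
    metadata: dict[str, Any] = {
        "openarmature_input_count": len(input_strings),
        "openarmature_dimensions": dimensions,
    }
    if request_id is not None:
        metadata["openarmature_request_id"] = request_id
    observation: dict[str, Any] = {
        "type": "embedding",  # assumed key; the observation type, not Generation
        "model": model,
        "usageDetails": {"input": input_tokens},
        "metadata": metadata,
    }
    # Input strings and output vectors are gated together: both or neither.
    if not disable_provider_payload:
        observation["input"] = list(input_strings)
        observation["output"] = [list(vector) for vector in vectors]
    return observation
```

Gating both fields behind one branch keeps the "same footing" property visible in code: there is no state in which input is rendered while vectors are suppressed, or the reverse.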
Vectors are classified as payload-bearing because embedding-inversion research (e.g., the vec2text line of work, Morris et al., 2023) demonstrates that vectors MAY leak source-text information given the embedding model. The threat model for vectors is equivalent to the threat model for raw text from the spec's perspective; gating applies uniformly. RAG applications in particular have a corpus-leakage concern — the (text, vector) pairs accumulated in traces would let an attacker reconstruct the embedding index and query it offline. Default-suppression is the conservative posture.
A future observability proposal MAY introduce a tiered preview mode (e.g., truncated
input strings + first-N-dimensions vectors) for users wanting partial visibility
without full payload exposure. Out of scope for this proposal.
Conformance test impact
New fixtures under spec/retrieval-provider/conformance/
A new directory for the retrieval-provider capability's own protocol-level fixtures:
- 001-embed-positive-control: Bound EmbeddingProvider with a mocked provider that returns 3 vectors for 3 input strings. Asserts response shape: the length of vectors matches the length of input, all inner vectors have the same length, the dimensions field matches the inner-vector length, usage.input_tokens populated, request_id populated.
- 002-embed-model-binding-error: EmbeddingProvider instantiated with an unknown model id; ready() raises provider_invalid_model. Verifies model-binding contract per retrieval-provider §3 / llm-provider §7.
- 003-embed-malformed-response-mismatched-vector-count: Provider returns 2 vectors for 3 input strings; embed() raises provider_invalid_response per §5.
- 004-embed-malformed-response-inconsistent-dimensions: Provider returns 3 vectors with inconsistent inner-list lengths; embed() raises provider_invalid_response.
- 005-embed-input-order-preserved: Input order MUST be preserved in output vector order. Mocked provider returns identifiable vectors; assert output indexes match input indexes.
New fixtures under spec/observability/conformance/
Parallel to the 050-073 LLM-event fixtures, covering both EmbeddingEvent and
EmbeddingFailedEvent. Final fixture numbers assigned at acceptance; the rough block
is 074-085:
- 07X-embedding-event-dispatch: Successful embed() call dispatches EmbeddingEvent with the full field set populated. Mirrors 050 for LLM.
- 07X-embedding-failure-event-dispatch-on-provider-unavailable: Failed embed() call dispatches EmbeddingFailedEvent; exception still raises. Mirrors 069.
- 07X-embedding-event-mutual-exclusion: Successful call emits exactly one EmbeddingEvent and zero EmbeddingFailedEvent; failed call emits exactly one EmbeddingFailedEvent and zero EmbeddingEvent. Mirrors 072.
- 07X-embedding-event-call-id-distinct: Multiple embed() calls in an invocation emit EmbeddingEvents with distinct call_id values. Mirrors 067.
- 07X-embedding-event-input-strings-populated: input_strings field carries the input list verbatim. Mirrors 060.
- 07X-embedding-event-request-params-populated: request_params carries dimensions when supplied; absence-is-meaningful for other params. Mirrors 062.
- 07X-embedding-event-input-count-and-dimensions-populated: Convenience fields match the input list length and the inner-vector length from the response.
- 07X-embedding-event-active-prompt-populated: Embedding call inside a prompt-context binding carries active_prompt snapshot. Mirrors 064.
- 07X-otel-embedding-span-attributes: OTel span emitted with span name openarmature.embedding.complete and the Stable GenAI semconv attribute subset (gen_ai.system, gen_ai.request.model, gen_ai.response.model, gen_ai.response.id, gen_ai.usage.input_tokens) plus the OA-namespace embedding attributes. Asserts the upstream gen_ai.operation.name attribute is NOT emitted in v1 per the stable-only deferral.
- 07X-langfuse-embedding-observation: Langfuse dedicated Embedding observation (created via asType: "embedding" per Langfuse SDK convention), with model + usageDetails.input + identity metadata populated. Two cases: (1) with disable_provider_payload=True (default), input and output NOT populated; (2) with disable_provider_payload=False, input carries the strings list, output carries the full vectors. Asserts the observation type is Embedding (not Generation).
Versioning
MINOR bump (pre-1.0). On acceptance:
- New retrieval-provider capability: purely additive; no existing capability touched
- Two new typed event variants on the graph-engine §6 observer event union: EmbeddingEvent, EmbeddingFailedEvent
- New §5.5.X / §8.X subsections in observability spec: additive
- One breaking-name rename: observer-level privacy flag disable_llm_payload → disable_provider_payload (per the Rename observability §5.5.4 section above). Pre-1.0 SemVer convention permits the rename in a MINOR bump; the CHANGELOG entry carries the rename in Changed for downstream operator-awareness.
- ~5 fixtures under spec/retrieval-provider/conformance/ (new directory) + ~10 fixtures under spec/observability/conformance/ + 2 existing observability fixtures rename one config key (the disable_llm_payload flag in fixtures 013 + 018)
Observers consuming only LlmCompletionEvent / LlmFailedEvent continue to work
unchanged at the typed-event surface; one observer-config key change is required to
adopt v0.54.0 (disable_llm_payload → disable_provider_payload).
Alternatives considered¶
- Extend `llm-provider` with `embed()` on the same `Provider` protocol. Reject: contradicts per-model-binding (a `Provider` instance can't simultaneously be bound to `gpt-4o-mini` AND `text-embedding-3-small`; the existing llm-provider §5 contract is one instance, one model). The single-protocol-with-overloaded-binding shape would break invariants other capabilities depend on (cross-cutting `gen_ai.request.model` span attribute; per-instance retry config; per-model conformance fixtures).

- Single unified `Provider` with `embed()` + `complete()` + `rerank()` methods. Reject: the real provider landscape is fragmented (OpenAI does complete+embed, Voyage does embed+rerank, Anthropic only complete, Cohere all three). A unified protocol forces every implementation to stub out methods its backend doesn't serve. Separate protocols let each implementation declare exactly what its backend supports.

- Name the success event `EmbeddingCompletionEvent` for cross-family symmetry with `LlmCompletionEvent`. Reject: "Completion" is the LLM term of art (one model = many tokens completing a response); embedding doesn't "complete" semantically. The shorter `EmbeddingEvent` reads more accurately at the cost of less mechanical cross-family symmetry. Same call applies to the forthcoming rerank variant (`RerankEvent`, not `RerankCompletionEvent`).

- Ship the success-side event variant only; defer the failure-side typed variant. Reject: the 0049 → 0058 split was a real cost. Observers had a typed-event surface for the success path and a sentinel-namespace `NodeEvent` for the failure path, defeating the type-discrimination contract on the failure side and requiring two release cycles to complete the typed-event coverage. Shipping both variants together from launch avoids that split.

- Ship with a minimal field set; extend later via a follow-on proposal. Reject: the 0049 → 0057 extension was a similar cost. The typed event landed minimally at v0.41.0, then needed re-extension to add request-side / prompt-identity / per-call disambiguator fields at v0.51.0. Shipping the full 0057-extended baseline from launch skips that beat.

- Two separate capabilities (`embedding-provider` + `rerank-provider`). Reject: over-fragmentation. Embedding and rerank are both retrieval primitives; a single `retrieval-provider` capability with two protocol surfaces is the cleaner home. The sibling protocol-shape framing parallels how `harness` is one capability with per-harness-type sub-specs (chat, FastAPI, etc.).

- Define `disable_embedding_payload` as a separate observer-side privacy flag; keep `disable_llm_payload` for LLM completion. Reject: privacy-flag proliferation. Each new capability would get its own per-operation flag, ratcheting the observer-config surface as the spec grows. A single renamed `disable_provider_payload` covers all provider-payload data under one mental model.

- Ship 0059 using the existing `disable_llm_payload` flag name unchanged; carry the naming mismatch. Reject: `disable_llm_payload=True` covering non-LLM embedding payload reads as an outright bug to any reader. Renaming now (pre-1.0, folded into this proposal) is cheap; deferring locks in the mismatch through the impl cycle absorbing this work and accumulates downstream config-key technical debt.

- Map embeddings onto Langfuse's `Generation` observation type with a `metadata.operation = "embedding"` discriminator. Reject: verified against Langfuse docs at draft time that Langfuse exposes a dedicated `Embedding` observation type (10 observation types currently: Event, Span, Generation, Agent, Tool, Chain, Retriever, Evaluator, Embedding, Guardrail). The dedicated type carries the semantic accuracy + cost-rollup integration directly; the `Generation`-with-discriminator framing was an outdated assumption from an earlier Langfuse data model.
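The first two rejections both rest on keeping one protocol per operation with one model bound per instance. A rough sketch of that shape, using `typing.Protocol`, is below. The method signatures, the `model` attribute, and the `FakeCompleter` / `FakeEmbedder` classes are simplified assumptions for illustration; the normative signatures live in the llm-provider and retrieval-provider specs.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class Provider(Protocol):
    """Completion-side protocol (signature simplified for illustration)."""
    model: str

    def complete(self, prompt: str) -> str: ...


@runtime_checkable
class EmbeddingProvider(Protocol):
    """Sibling protocol with its own ready() + embed() surface."""
    model: str

    def ready(self) -> None: ...

    def embed(self, inputs: list[str]) -> list[list[float]]: ...


@dataclass
class FakeCompleter:
    model: str  # one instance, one model

    def complete(self, prompt: str) -> str:
        return f"[{self.model}] {prompt}"


@dataclass
class FakeEmbedder:
    model: str  # bound to an embedding model only

    def ready(self) -> None:
        return None

    def embed(self, inputs: list[str]) -> list[list[float]]:
        # Toy vectors: string length in the first dimension, zero in the second.
        return [[float(len(text)), 0.0] for text in inputs]
```

Neither fake has to stub out a method its backend does not serve, and an `isinstance` check against either protocol tells a caller which operation an instance supports.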
Open questions¶
None at draft time. All design choices are settled in the proposal text above:
- Capability home: new `spec/retrieval-provider/` capability (per the user-aligned direction; not extending llm-provider, not separate-capability-per-protocol).
- Protocol shape: `EmbeddingProvider` is a sibling protocol to `Provider`, with its own `ready()` + `embed()` interface and per-model binding.
- Event variant naming: `EmbeddingEvent` + `EmbeddingFailedEvent`; "Completion" suffix dropped for accuracy over cross-family symmetry.
- Success+failure pairing: both event variants ship in this proposal from day one; no follow-on split.
- Field set scope: request-side / prompt-identity / per-call disambiguator fields included from launch per the 0057 precedent; no follow-on extension cycle planned.
- Privacy posture: `disable_provider_payload` (renamed from `disable_llm_payload`; see Rename observability §5.5.4 in Proposed change) semantics cover embedding payload including vectors. Vectors are payload-bearing because embedding-inversion research (vec2text and related lines of work) demonstrates that vectors MAY leak source-text information given the embedding model. The threat model is equivalent to raw text from the spec's perspective; gating applies uniformly. A future observability proposal MAY introduce a tiered preview mode (truncated strings + first-N-dimensions vectors) for partial-visibility use cases; out of scope here.
- Langfuse observation type: dedicated `Embedding` type (created via `asType: "embedding"`), NOT `Generation` with operation metadata. Verified against current Langfuse docs at draft time: Langfuse exposes 10 observation types including a first-class `Embedding` type with `model` + `usageDetails` + `input` + `output` fields tailored to embedding calls. The dedicated type carries both semantic accuracy and the cost-rollup integration cleanly.
- `disable_llm_payload` rename: folded into this proposal (Option B from the drafting discussion). Pre-1.0 SemVer convention permits the hard-swap rename in a MINOR bump; same precedent as proposal 0057's `request_id` → `response_id` rename. The new name avoids a naming-mismatch transition window where the embedding-payload flag would be called `disable_llm_payload` despite covering non-LLM payload.
- Error categories: inherited from llm-provider §7; embedding-applicable subset documented in retrieval-provider §5.
- Sequencing: rerank protocol lands in a follow-on proposal extending the same capability; this proposal scopes to the capability scaffold + embedding only.
- Stable-only upstream adoption for `gen_ai.operation.name`: the upstream OTel GenAI semconv `gen_ai.operation.name` attribute (with `"embeddings"` as a documented well-known value) is at Development status as of draft time (verified against the OTel GenAI spans semantic conventions). Per the Stable-only upstream adoption policy in `GOVERNANCE.md`, OA does NOT normatively adopt this attribute in v1. Operation discrimination is via span name + provider; the attribute MAY be added in a follow-on when upstream reaches Stable. Same mirror pattern as proposal 0047's cache-attribute deferral.
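The privacy posture settled above (input strings and vectors gated together behind `disable_provider_payload`, which defaults to on) can be illustrated with a small sketch. The builder below returns a plain `dict` standing in for the observation payload; it is not the Langfuse SDK's API, and the function name and argument list are hypothetical. The `asType`, `model`, `usageDetails.input`, `input`, and `output` keys follow the field names used in this proposal.

```python
def build_embedding_observation(
    model: str,
    input_strings: list[str],
    vectors: list[list[float]],
    input_tokens: int,
    *,
    disable_provider_payload: bool = True,
) -> dict:
    """Hypothetical mapper from an embedding call to an observation payload."""
    observation = {
        "asType": "embedding",  # dedicated Embedding type, not Generation
        "model": model,
        "usageDetails": {"input": input_tokens},
    }
    if not disable_provider_payload:
        # Strings and vectors are gated together: vectors are treated as
        # payload-bearing, the same as the raw text they were derived from.
        observation["input"] = list(input_strings)
        observation["output"] = [list(vector) for vector in vectors]
    return observation
```

With the default flag value the payload keys are simply absent, which matches the "NOT populated" wording of the Langfuse conformance fixture.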
Out of scope¶
- Rerank protocol: lands in a forthcoming proposal extending the same `retrieval-provider` capability with `RerankProvider` + paired `RerankEvent` / `RerankFailedEvent`.
- Multi-modal embedding (image embeddings, audio embeddings). Text-only in v1; a follow-on proposal can scope multi-modal if downstream demand surfaces.
- Per-vendor wire-format mappings. Follow-on proposals add concrete vendor mappings (OpenAI, Cohere, Voyage, Jina) analogous to llm-provider §8.1 / §8.2 / §8.3.
- Per-SDK implementation details — httpx batching strategies, embedding-layer retry timing, SDK-specific error mapping. Provider-internal choices.
- Caller-supplied determinism / seeding. Embedding models rarely expose seeds; not v1. A follow-on can scope if demand surfaces.
- Cross-call observability correlation (e.g., "this rerank call used vectors from that embedding call"). Each call is independent at the protocol layer; any cross-call correlation lives in node-body code.
- Embedding result caching at the framework level. Caching is an application concern; the framework MAY ship cache middleware in a follow-on proposal, but caching is not a protocol-layer contract.
- A typed-event surface for streaming embeddings. Some providers stream embeddings for very long inputs; not v1. A follow-on can add streaming-event variants per the shape forthcoming streaming proposals establish for LLM completions.