LLM Provider¶
Canonical behavioral specification for the OpenArmature LLM provider abstraction.
- Capability: llm-provider
- Introduced: spec version 0.4.0
- History:
- created by proposal 0006
- §3 Message shape extended (user content MAY be a sequence of content blocks); §3.1 Content blocks added (text and image blocks; image input only on user messages); §7 gained
provider_unsupported_content_blockerror category; §8.1 user-row updated and §8.1.1 content-block wire mapping added; §10 multi-modal entry split (image input now covered; audio/video and image outputs remain deferred) by proposal 0015 - §5
complete()extended with optionalresponse_schemaparameter; §6 Response gainedparsedfield; §7 gainedstructured_output_invaliderror category (non-transient by default); §8.5 structured output wire mapping added (with §8.5.1 prompt-augmentation fallback and §8.5.2 response mapping); §10 structured output deferral removed by proposal 0016 - §8 renamed from "OpenAI-compatible wire format" to "Wire-format mappings" and reorganized as a catalog of provider mappings; existing OpenAI-compatible body nested under new §8.1 "OpenAI-compatible mapping" (subsections §8.1 through §8.5 → §8.1.1 through §8.1.5); §8 framing paragraph added establishing the default placement rule (in-spec for any mapping with multi-language ambition; out-of-tree allowed only for single-language / opt-out / experimental cases) by proposal 0019
- §5
complete()extended with optionaltool_choiceparameter (four modes:"auto"/"required"/"none"/{type: "tool", name: X}) with pre-send validation routing throughprovider_invalid_request; §7 clarified to enumerate the three new validation failure modes; §8.1.1 gained atool_choicemapping row by proposal 0025 - §8 framing gained a Per-mapping subsection structure paragraph recommending the canonical §8.X template (Request mapping / Response mapping / Error mapping / Concurrency / Structured output) with allowance for sub-subsections, provider-specific top-level additions, and SHOULD-level divergence-explanation requirement; resolves 0019's open-question #2 by proposal 0026
This specification is language-agnostic. Each implementation (Python, TypeScript, …) maps its own idioms
onto the behavioral contract described here. Conformance is verified by the fixtures under conformance/.
Normative keywords (MUST, MUST NOT, SHOULD, MAY) are used per RFC 2119.
1. Purpose¶
The LLM provider capability defines a uniform request/response surface for sending messages to a Large Language Model and receiving its response. It is the substrate every higher-level LLM capability composes against — tool systems, prompt management, evaluation harnesses, agent loops.
The substrate is intentionally narrow:
- A provider is stateless. It does not maintain conversation history; the caller passes the full message list on every call.
- A provider does not loop on tool calls. If the assistant returns tool calls, the caller is
responsible for executing the tools and making a follow-on
complete()with the results. - A provider does not handle retry, rate limiting, fallback, or routing. Those are pipeline- utilities concerns and compose above the provider via middleware.
- A provider is bound to a single model identifier. Switching models means constructing a new provider, not passing a different argument per call. (Implementations MAY offer convenience factories that produce per-model providers from shared credentials; that is a constructor concern, not a behavioral one.)
Every constraint above is a deliberate scope cut. The narrower the provider surface, the easier it is to swap implementations, mock for tests, and stack pipeline utilities on top.
Transparency. Per charter §3.1 principle 8 ("Transparency over abstraction"), the provider
abstraction surfaces a normalized shape — Message, Tool, Response — without hiding what the
underlying provider returned. The Response record carries the parsed provider response verbatim
alongside the normalized fields (§6 raw), and the §7 error categories preserve the underlying
provider exception as cause. Users who need provider-specific fields (logprobs, content-filter
details, vendor-specific extensions) reach through the abstraction directly; structure is added,
never removed.
2. Concepts¶
Message. A typed entry in a conversation. The four message kinds are system, user,
assistant, and tool. Each kind carries kind-specific content as defined in §3.
Tool. A function the model may request the user execute. A tool definition is a record of name,
description, and parameters (a JSON Schema describing the argument shape).
Tool call. A request from an assistant message to invoke a named tool with structured arguments.
The user is responsible for executing the tool and returning the result via a tool message bearing
the corresponding tool_call_id.
Provider. An object that, given a sequence of messages and an optional set of tools, returns a
single assistant message wrapped in a Response. A provider is bound to a specific model identifier.
Response. The result of a provider call: the assistant message, a finish reason, and usage information.
3. Message shape¶
A message is a record with the following fields:
| Field | Required | Description |
|---|---|---|
role |
yes | One of "system", "user", "assistant", "tool". Discriminator. |
content |
conditional (see below) | Text content of the message, OR — for user role only — a non-empty ordered sequence of content blocks per §3.1. |
tool_calls |
only on assistant |
Ordered list of ToolCall records the model is requesting. |
tool_call_id |
required on tool |
The id of the matching assistant tool call. |
Per-role constraints:
system:contentMUST be a non-empty string.tool_callsMUST be absent.tool_call_idMUST be absent.user:contentMUST be one of:- a non-empty string (text-only message), OR
- a non-empty ordered sequence of content blocks (per §3.1).
tool_calls MUST be absent. tool_call_id MUST be absent.
- assistant: tool_calls MAY be present. If tool_calls is present and non-empty, content MAY
be empty (the assistant is purely calling tools); if tool_calls is absent or empty, content
MUST be a non-empty string. tool_call_id MUST be absent.
- tool: content MUST be a string (the tool's textual result; serialize structured results to a
string at the call boundary). tool_call_id MUST be present and MUST match the id of an
assistant ToolCall earlier in the message list. tool_calls MUST be absent.
A ToolCall record:
| Field | Description |
|---|---|
id |
String identifier, unique within the message. The matching tool message bears this id as tool_call_id. For provider-returned tool calls, implementations MUST preserve the provider's id verbatim — neither rewriting nor normalizing it. Ids are opaque correlators within a single message list; preserving the original lets users correlate with provider-side logs/billing and persists naturally as conversations are stored, replayed, or routed. |
name |
The tool name. MUST match a Tool.name declared in the call's tools argument under non-error responses; on finish_reason: "error", an unmatched name MAY appear (see below). |
arguments |
A JSON-serializable mapping of argument names to values. Under non-error responses, MUST be a parsed mapping conforming to the tool's parameters schema. Under finish_reason: "error", MAY be null (the implementation could not parse the provider's bytes as JSON) or a parsed mapping that does not conform to the schema. |
Validation timing. Implementations MUST validate message-shape constraints (per-role required
fields, tool_call_id matching, etc.) at the boundary of complete() — before sending to the
provider, and on the response before returning. Tool argument validation against the parameters
schema happens at the same boundaries; under non-error responses, a malformed assistant ToolCall
from the provider raises provider_invalid_response (§7).
Validation under finish_reason: "error". A degraded response MAY carry tool_calls, and
those tool calls MAY be partially constructed: malformed argument JSON (truncated, syntactically
invalid), arguments that don't match the parameters schema, or unmatched name. Implementations
MUST NOT raise provider_invalid_response in this case — the partial response is the response.
The implementation surfaces what it could parse:
- Tool calls with parseable JSON arguments populate
argumentsas a mapping (whether or not it matches the schema). - Tool calls with unparseable arguments populate
argumentsasnull. The original bytes are available verbatim viaResponse.raw. - Tool calls with missing or unknown
nameare still surfaced.
Callers iterating tool_calls after a successful (non-error) complete() can rely on validated
arguments. Callers handling finish_reason: "error" SHOULD inspect each tool call before
executing — argument repair (parsing partial JSON, completing truncated braces) is an application
concern, performed against Response.raw for the original bytes. The spec deliberately surfaces
malformed data rather than dropping it, so applications can repair-and-continue.
Cross-provider id round-tripping. A conversation MAY traverse multiple providers within a
single application — for example, behind an LLM gateway / router that applies fallback strategies
across providers, or when an application explicitly switches providers between conversation rounds.
Tool-call ids are opaque correlators within the message list, not provider-side references;
providers accept arbitrary id strings on inbound requests and only verify that subsequent
tool_call_id values match earlier tool calls in the same conversation. Because implementations
preserve provider-supplied ids verbatim (per the id field rule above), message lists round-trip
across providers cleanly without id rewriting. Applications that need a unified internal id format
MAY rewrite ids at their own boundary; the spec keeps the abstraction transparent and leaves that
choice to the application.
3.1 Content blocks¶
A content block is a typed record with a discriminator field identifying the block type. v1 defines two block types: text and image.
3.1.1 Text block¶
A text block is a record:
| Field | Required | Description |
|---|---|---|
type |
yes | The literal string "text". |
text |
yes | A non-empty string. |
A text block is the content-array equivalent of the text-string form. A user message containing
exactly one text block with text T is normatively equivalent to a user message with
content: T.
3.1.2 Image block¶
An image block is a record:
| Field | Required | Description |
|---|---|---|
type |
yes | The literal string "image". |
source |
yes | One of url or inline (per §3.1.3). |
media_type |
conditional | Required when source is inline; ignored when source is url (the provider infers the media type from the URL's payload). Implementations MUST accept the IANA media types image/png, image/jpeg, and image/webp at minimum, and MAY accept additional image/* media types they document support for. Portable users SHOULD restrict to the three guaranteed types. |
detail |
optional | A hint to the provider about the desired image-processing fidelity. One of "auto", "low", "high". Default is "auto". Providers that do not honor a detail hint MUST ignore it without error. |
3.1.3 Image source¶
The source field on an image block carries one of two variants:
url— the image is referenced by a URL:{ type: "url", url: <string> }. The URL MAY behttp(s)://,data:(RFC 2397 inline data URI), or another scheme the provider documents support for. Implementations MUST pass the URL through to the wire unchanged; the spec does not mandate fetching, caching, or transforming URL-form images.inline— the image is provided as base64-encoded bytes:{ type: "inline", base64_data: <string> }. Themedia_typefield on the image block (§3.1.2) MUST be present for inline images. Implementations MUST NOT inspect, transcode, or re-encode the bytes; they pass through to the wire encoded as the provider's wire format expects (§8.1.1.1).
A single image block carries exactly one source — url XOR inline. The discriminator is
the type field on the source itself.
3.1.4 Mixing blocks¶
A user message MAY mix text and image blocks freely. The wire format preserves block order; providers vary in whether they treat block order as semantically meaningful (e.g., "image appearing before its describing text" vs. "image after"), so application code SHOULD construct the block sequence in the order it wants the model to perceive it.
A content-block sequence MUST NOT be empty (per the §3 per-role constraint). A content-block sequence consisting entirely of text blocks is valid (and is the multi-text-block shape some applications prefer for prompt-composition reasons).
4. Tool definition¶
A Tool record:
| Field | Description |
|---|---|
name |
String identifier. MUST be unique within a single complete() call's tools list. |
description |
String describing the tool's behavior. Sent to the model. |
parameters |
A JSON Schema (object schema) describing the argument record. MUST be a valid JSON Schema; implementations SHOULD validate at call time. |
The parameters field is JSON Schema, not a language-native schema type. This keeps the spec
provider-agnostic (every supported wire format expects JSON Schema) and language-agnostic
(implementations may offer ergonomic constructors that compile from native types into JSON Schema —
e.g., Pydantic's model_json_schema(), Zod's zod-to-json-schema — but the spec surface is JSON
Schema regardless).
5. Provider interface¶
A provider MUST expose the following operations:
ready()¶
Async. Verifies that the bound model is reachable and serving — i.e., that the next complete()
call is expected to succeed. A successful return MUST imply that complete() would not raise any
of the §7 categories that surface mismatched configuration or unloaded state
(provider_authentication, provider_invalid_model, provider_model_not_loaded,
provider_unavailable). Raises one of the §7 categories on failure.
For hosted APIs this typically means credentials are valid, the base URL is reachable, and the
model is in the provider's catalog. For local servers (vLLM, LM Studio, llama.cpp),
this additionally means the model is loaded into memory and ready to serve — not just
configured. Implementations SHOULD distinguish these by raising provider_invalid_model when the
model is unknown to the provider versus provider_model_not_loaded when the model is known but
not yet serving (see §7).
Implementations SHOULD make this operation idempotent and inexpensive — a GET /models-style
probe is RECOMMENDED for hosted APIs; for local servers, a server-specific health endpoint that
distinguishes "model in registry" from "model loaded" SHOULD be preferred over a no-op
complete().
ready() is a pre-flight check intended for fail-fast on startup or warmup polling. It MUST NOT
be called automatically by complete(); callers decide when (or whether) to invoke it.
complete(messages, tools=None, config=None, response_schema=None, tool_choice=None)¶
Async. Performs a single completion call. When response_schema is supplied, the call
additionally constrains the model's output to conform to the schema. When tool_choice is
supplied, the call additionally constrains the model's tool-calling behavior.
messages— non-empty ordered sequence of messages. The first message MAY besystem; otherwise the message list begins withuser. The last message before the call MUST beuserortool(the request to the model). Implementations MUST validate this ordering; violations raiseprovider_invalid_request(§7).tools— optional ordered sequence ofToolrecords. When present and non-empty, the model is permitted to returntool_calls. Tool names MUST be unique within the list.config— optionalRuntimeConfig(§6). Per-call sampling parameters and budget hints.response_schema— optional JSON Schema describing the expected output shape. WhenNone/ absent, the call behaves as in v0.4.0: free-form text content; no parsed value. When present, MUST be a valid JSON Schema. The top-level schema MUST be an object schema (type: "object"at the root) — this matches §4Tool.parametersand OpenAI's strict-mode wire format. Non-object top-level schemas are out of scope for this version; a follow-on MAY relax this if cross-provider demand warrants. Implementations SHOULD validate at call time. The JSON Schema convention matches §4 — see §4's note on language-native schema constructors compiling to JSON Schema.tool_choice— optional tool-choice constraint. One of:"auto"— the model decides whether to call tools. Equivalent to the no-tool_choicedefault behavior whentoolsis non-empty; withtoolsempty / absent, the model has no tools to call regardless."required"— the model MUST return at least one tool call.toolsMUST be non-empty whentool_choiceis"required"; violations raiseprovider_invalid_request(§7) at pre-send validation."none"— the model MUST NOT call tools, even iftoolsis supplied. Useful for guarded LLM calls or for explicitly disabling tool-calling on a per-call basis without constructing a tools-less request.{type: "tool", name: <string>}— the model MUST call the named tool (and no other). The named tool MUST appear in the suppliedtoolslist; violations raiseprovider_invalid_request(§7) at pre-send validation. (toolsMUST be non-empty in this case, by transitivity.)
Default is None / absent. When tool_choice is None / absent, the engine MUST omit the
wire-level tool_choice field — the provider's own default applies. This preserves the
v0.4.0 behavior exactly (no wire-shape change for callers who don't supply tool_choice).
The discriminated-union shape (three string literals plus one record form) is described
abstractly; per-language ergonomics decide the type (e.g., Python could use
Literal["auto", "required", "none"] | ToolChoiceForce; TypeScript could use a string union
with the record form discriminated by type). Implementations MUST validate the shape at
call time before sending.
Returns: a Response (§6).
When response_schema is set and the model returns content (not tool calls):
Response.parsedis the parsed-and-validated structured value perresponse_schema.Response.message.contentis the JSON-serialized string form of the structured output (preserved verbatim from the provider per §6).
When response_schema is set and finish_reason is "tool_calls", Response.parsed MUST be
absent regardless of whether message.content is also populated (the §3 contract allows assistant
messages to carry both tool_calls and non-empty content, and this section does not change that).
message.content preserves the model's output verbatim per §6; the parsed slot only populates
when the model returned structured content (typically finish_reason: "stop").
When tools and response_schema are both supplied, the model decides which path to take,
signaled by finish_reason. If finish_reason is "tool_calls", the user handles tool execution
and may make a follow-on complete(); if finish_reason is "stop", the user reads parsed
and/or message.content.
When response_schema is None / absent, Response.parsed is absent regardless of content. The
v0.4.0 behavior is preserved exactly.
Operation semantics:
complete()MUST NOT mutatemessages,tools,config,response_schema, ortool_choice.complete()MUST be reentrant: multiple concurrent calls on the same provider are permitted. Implementations MUST NOT serialize concurrent calls internally.complete()does NOT loop on tool calls. If the response'sfinish_reasonis"tool_calls", the caller is responsible for executing the tools, appendingtoolmessages, and making a follow-oncomplete().complete()does NOT retry on transient errors. Errors propagate; retry policy belongs above this layer.- When
response_schemais set and the model produces output that successfully parses as JSON but fails to validate againstresponse_schema, OR fails to parse as JSON at all,complete()raisesstructured_output_invalid(§7). complete()MUST validatetool_choiceagainsttoolsbefore sending. The validation rules:tool_choice="required"requirestoolsnon-empty.tool_choice={type: "tool", name: X}requirestoolsnon-empty AND X to be aTool.namein the supplied list.tool_choice="auto"andtool_choice="none"have notools-related preconditions.
Violations of rules 1–2 raise provider_invalid_request (§7) at pre-send validation, before
the implementation contacts the provider.
When tool_choice="none" is supplied AND the provider returns tool calls anyway, the
implementation MUST surface what the provider returned (per the §6 transparency principle)
without re-validating against the constraint post-hoc. The constraint is a request-side hint
the implementation passes to the wire; whether the model honored it is observable via the
returned finish_reason ("tool_calls" means the model called tools regardless of the
"none" hint) but is not enforced by the framework. Providers vary in whether they honor
"none" strictly; provider compliance is a provider-quality concern, not a framework-policed
contract.
6. Response and configuration¶
A Response record:
| Field | Description |
|---|---|
message |
The assistant message returned by the model. Always role: "assistant". May carry tool_calls. |
finish_reason |
One of "stop", "length", "tool_calls", "content_filter", "error". See below. |
usage |
A record {prompt_tokens, completion_tokens, total_tokens}. Each field is a non-negative integer or null. If the provider does not report usage, all three MUST be null. |
raw |
The parsed provider response, as a language-idiomatic representation of deserialized JSON (Python: dict[str, Any]; TypeScript: Record<string, unknown>). MUST be populated on every successful return. Carries everything the provider returned — including fields the spec does not normalize (logprobs, content-filter details, provider-specific extensions). The normalized fields above are derived from raw; the two views MUST be consistent (modifying one does not affect the other, since both are immutable from the caller's perspective). |
parsed |
The parsed and validated structured value when the call supplied a response_schema and the model returned structured content. The value conforms to the supplied response_schema. Absent (null / None / undefined, per the language's idiom) on calls that did not supply a response_schema, and on responses whose finish_reason is "tool_calls" (regardless of whether message.content is also populated, per the §3 assistant-message contract). |
finish_reason semantics:
stop— the model produced a complete response and stopped naturally.length— the model hitmax_tokens(or the equivalent provider budget).tool_calls— the model returned tool calls and is awaiting their results.content_filter— the provider's content filter blocked or truncated the response.error— the provider reported an internal error mid-stream and could not return a complete response. This is distinct from acomplete()exception (which signals a request-level failure per §7);finish_reason: "error"signals a degraded but parseable response. The response MAY carrytool_calls, possibly with malformedarguments; see §3 "Validation underfinish_reason: \"error\"" for handling.
parsed semantics. The parsed field is the language-idiomatic deserialized form of the
structured value (e.g., a Python dict[str, Any] populated per the JSON Schema, or a TypeScript
unknown typed at the call site via a generic). Implementations MAY offer ergonomic typed
accessors on top (e.g., Python users supplying a Pydantic model class instead of a raw JSON
Schema and receiving a validated model instance, surfaced via per-language overloads or generics
so that the static type of parsed reflects the supplied schema) — those are per-language
ergonomics, not normative spec.
message.content carries the provider's content string preserved verbatim — the bytes the model
returned, UTF-8 decoded. Implementations MUST NOT re-serialize parsed back into
message.content; doing so would mask formatting differences (whitespace, key ordering, number
representation) and break conformance assertions that rely on byte-level equivalence. parsed
and message.content MUST be consistent in the following sense: deserializing message.content
as JSON and validating against response_schema produces parsed. The reverse operation
(serializing parsed and comparing) is NOT required to round-trip bytewise, because the model's
serialization may differ from the framework's.
When finish_reason: "tool_calls", parsed is absent regardless of whether response_schema
was supplied. The tool-call path and the structured-content path are mutually exclusive at the
response level.
A RuntimeConfig record:
| Field | Description |
|---|---|
temperature |
Float, optional. Provider-specific range; commonly [0.0, 2.0]. |
max_tokens |
Int, optional. Maximum completion tokens. |
top_p |
Float, optional. Nucleus sampling probability. |
seed |
Int, optional. Best-effort determinism for providers that support it. Setting seed does NOT guarantee determinism; see §9. |
Implementations MAY accept additional provider-specific fields. The four above are the minimum.
7. Error semantics¶
A provider call (ready() or complete()) may raise one of the following canonical category errors:
provider_authentication— auth failed (invalid key, expired token, missing credentials).provider_unavailable— provider is unreachable (network failure, 5xx error, connection timeout, DNS failure).provider_invalid_model— the bound model does not exist on this provider (unknown to the provider's model catalog). Terminal: retry will not succeed without changing the bound model.provider_model_not_loaded— the bound model is known to the provider but is not currently serving requests (e.g., a local vLLM, LM Studio, or llama.cpp server has the model configured but has not yet loaded it into memory, or has unloaded it under memory pressure). Distinct fromprovider_invalid_modelbecause retry MAY succeed once loading completes; warmup-polling callers SHOULD treat this as a transient signal.provider_rate_limit— provider returned a rate-limit response (e.g., HTTP 429). Implementations SHOULD expose aretry_afteraccessor when the provider supplies one (e.g.,Retry-Afterheader).provider_invalid_response— provider returned a malformed response that cannot be parsed into the §6 shape (missing required fields, invalidtool_callsstructure, invalid JSON).provider_invalid_request— the request was malformed before sending (per-role message constraints violated,tool_call_iddoes not match an earlierassistanttool call, duplicate tool names, etc.). This category is raised by the implementation's pre-send validation. Thetool_choiceparameter (§5) adds three validation failure modes routed through this category: (1)tool_choice="required"supplied with empty / absenttools; (2)tool_choice={type: "tool", name: X}supplied with empty / absenttools; (3)tool_choice={type: "tool", name: X}supplied with X not in the suppliedtoolslist. Each MUST raiseprovider_invalid_requestat pre-send validation, before the implementation contacts the provider.provider_unsupported_content_block— the bound model does not support a content block type used in the request (e.g., a text-only model received an image block, or the model supports images but not the requestedmedia_type(per §3.1.2) orsourcevariant (per §3.1.3)). Raised by the implementation's pre-send validation when the unsupported case is statically known (per the provider's documented capabilities), or by the post-receive error mapping when the provider itself rejects the request.structured_output_invalid—complete()was called with aresponse_schema(§5), and the provider returned content that could not be parsed as JSON OR did not validate against the supplied schema. The error MUST expose the requestedresponse_schema, the raw response content (the bytes the model produced), and a description of the validation or parse failure (the wrapped exception's message, the failing JSON Pointer, or the language's idiomatic equivalent). Non-transient by default — a model that fails to produce schema-compliant output on a given prompt usually fails the same way on retry. Users wanting retry-on-validation-failure semantics MAY includestructured_output_invalidin a pipeline-utilitiesRetryMiddlewareclassifier's transient set, but the category is NOT transient by default at the spec level. Distinct fromprovider_invalid_response(which covers wire-shape malformation, not content validation against the caller's schema).
Each error MUST expose a category identifier (matching the strings above, as an error class, error
code, or tagged discriminant per the language's idiom). Provider-originated errors SHOULD preserve
the underlying provider exception as cause (__cause__ in Python, cause in TypeScript).
These nine categories are the minimum required surface. Implementations MAY raise additional provider-specific categories for cases not covered above; users MAY catch by category to implement retry policy.
Retry classification. The categories provider_unavailable, provider_rate_limit,
provider_model_not_loaded, and finish_reason: "error" are transient — a retry MAY succeed.
The categories provider_authentication, provider_invalid_model, provider_invalid_request,
provider_invalid_response, provider_unsupported_content_block, and structured_output_invalid
are non-transient — retrying without changing the request will not succeed.
8. Wire-format mappings¶
The §5 Provider interface, §3 message shape, §4 Tool definition, §6 Response and configuration, and §7 error semantics are the normative cross-provider contract. Any provider implementation conforming to those sections satisfies the abstract spec, regardless of the underlying HTTP / RPC / SDK wire format used to reach the model.
This section catalogs concrete wire-format mappings for specific provider protocols. Each mapping specifies how the abstract §3 / §4 / §6 records translate to that provider's wire shape and how the provider's responses / errors map back to §3 / §6 / §7. §8.1 describes the OpenAI-compatible Chat Completions mapping, which is the broadest-compatibility option (the OpenAI hosted API, vLLM, LM Studio, llama.cpp server, and many other local servers all speak it). Future subsections (§8.2, §8.3, …) are reserved for provider-native formats whose shape diverges from the OpenAI mapping — Anthropic Messages API, Google Gemini, Mistral, etc. Each lands via its own follow-on proposal.
Default placement rule. Any provider wire-format mapping intended for implementation across
multiple OA language implementations (Python, TypeScript, …) MUST be specified in this section.
The cross-language behavioral consistency that §3 / §5 / §7 provide for the abstract Provider
interface extends to wire-format mappings whenever the same provider is targeted from multiple
languages — without a shared spec, sibling packages like openarmature-anthropic (Python) and
openarmature-anthropic (TypeScript) would diverge in subtle wire shape and break the
cross-language promise.
Out-of-tree mappings. Wire-format mappings NOT specified here remain valid but make NO cross-impl behavioral guarantee. Out-of-tree is appropriate for: (a) genuinely single-language specialty providers (a vendor-specific mapping with no anticipated TypeScript sibling), (b) vendor extensions that explicitly opt out of cross-impl consistency, or (c) experimental mappings still finding their shape before promotion to in-spec status. In all other cases the in-spec default applies.
Compliance label. Provider implementations MAY opt into a mapping's compliance label (e.g., "OpenAI-compatible", "Anthropic Messages") only if they implement that mapping exactly per the §8.X subsection. A provider MAY implement multiple mappings (e.g., one implementation routing OpenAI-compatible requests through one path and Anthropic-native requests through another) and claim the corresponding labels independently.
Per-mapping subsection structure. Each §8.X subsection SHOULD follow the canonical structure used by §8.1:
| Subsection | Topic |
|---|---|
| §8.X.1 | Request mapping |
| §8.X.2 | Response mapping |
| §8.X.3 | Error mapping |
| §8.X.4 | Concurrency |
| §8.X.5 | Structured output |
Provider-specific sub-subsections (e.g., §8.X.1.1 for content-block wire mapping per §8.1.1.1, §8.X.5.1 for prompt-augmentation fallback per §8.1.5.1) are permitted and expected. Providers whose wire shapes have features without §8.1 analogues MAY add additional top-level subsections at the end of the recommended five (e.g., §8.X.6 Caching if the provider exposes a caching primitive worth spec'ing); the recommended five SHOULD precede any provider-specific additions, in the order shown.
The recommendation is SHOULD-level rather than MUST-level because some providers' shapes diverge from §8.1's organization in ways the template cannot accommodate by sub-subsection alone. When a §8.X proposal diverges from this template, the proposal text SHOULD explain the divergence in its Detailed design section so reviewers can confirm the divergence is structural rather than ergonomic.
8.1 OpenAI-compatible mapping¶
The OpenAI Chat Completions API (POST /v1/chat/completions) is the de facto standard for local
LLM servers (vLLM, LM Studio, llama.cpp) as well as the OpenAI hosted API itself.
8.1.1 Request mapping¶
The §3 message list maps onto the OpenAI messages field:
| Spec role | OpenAI role | Notes |
|---|---|---|
system |
system |
Direct mapping. |
user |
user |
When content is a string, maps directly. When content is a content-block sequence (§3.1), maps to OpenAI's content-array form per §8.1.1.1. |
assistant (no tool calls) |
assistant |
content becomes OpenAI's content. |
assistant (with tool calls) |
assistant |
content becomes OpenAI's content (may be null per OpenAI's schema if empty). tool_calls becomes OpenAI's tool_calls array. |
tool |
tool |
content becomes OpenAI's content. tool_call_id becomes OpenAI's tool_call_id. |
A spec ToolCall {id, name, arguments} maps to an OpenAI tool call entry as:
{
"id": <id>,
"type": "function",
"function": {
"name": <name>,
"arguments": <JSON-serialized arguments>
}
}
The spec stores arguments as a deserialized mapping; the wire format requires a JSON-encoded
string. Implementations MUST serialize on send and deserialize on receive.
A §4 Tool {name, description, parameters} maps to an OpenAI tools entry as:
{
"type": "function",
"function": {
"name": <name>,
"description": <description>,
"parameters": <parameters>
}
}
The §6 RuntimeConfig fields map directly: temperature, max_tokens, top_p, seed. The bound
model identifier becomes OpenAI's model field.
The §5 tool_choice parameter maps to OpenAI's tool_choice request-body field:
Spec tool_choice |
OpenAI wire body |
|---|---|
None / absent |
(field omitted from request body) |
"auto" |
tool_choice: "auto" |
"required" |
tool_choice: "required" |
"none" |
tool_choice: "none" |
{type: "tool", name: X} |
tool_choice: {type: "function", function: {name: X}} |
The None-omitted-from-wire row is load-bearing for backward compatibility: existing callers
who never supply tool_choice see no wire-shape change, and the OpenAI provider's own default
(which itself depends on whether tools is non-empty) applies unchanged. The spec type: "tool"
discriminator renames OpenAI's type: "function" for spec-layer readability; the implementation
performs the rename when constructing the wire body.
8.1.1.1 Content-block wire mapping¶
Each spec content block maps to one OpenAI content-array entry:
| Spec block | OpenAI entry |
|---|---|
TextBlock { text } |
{ "type": "text", "text": <text> } |
ImageBlock with source: url { url } |
{ "type": "image_url", "image_url": { "url": <url> } }. The detail hint, when set on the spec block, becomes image_url.detail. |
ImageBlock { media_type, source: inline { base64_data } } |
{ "type": "image_url", "image_url": { "url": "data:<media_type>;base64,<base64_data>" } }. OpenAI's inline-image path goes through the same image_url entry shape with a data: URL; implementations MUST construct the data URI per RFC 2397, reading media_type from the ImageBlock and base64_data from its inline source. The detail hint, when set, becomes image_url.detail. |
Empty content blocks (e.g., a text block with empty text, or an image block with both
sources absent) are spec-invalid and MUST be rejected at pre-send validation per §3 /
provider_invalid_request. The wire never sees them.
OpenAI uses the same image_url content-entry shape for both URL-referenced and base64-inline
images (with the inline case expressed as a data: URL). Anthropic and Google use different
wire shapes; their own §8-style mapping sections (added by future proposals per §10's
"Provider-native wire formats" deferral) will define their own block→wire mappings without
disrupting this one.
8.1.2 Response mapping¶
A successful OpenAI response maps onto a §6 Response as follows:
message— built fromchoices[0].message(assuming a single-choice request, which is the only shape v1 supports).finish_reason—choices[0].finish_reason. OpenAI's values arestop,length,tool_calls,content_filter, andfunction_call(legacy). Mapfunction_calltotool_calls. Map any unknownfinish_reasontoerror.usage— built from the response'susagefield. Ifusageis absent, all three usage subfields MUST benull.raw— the parsed JSON body of the OpenAI response, verbatim. Implementations MUST NOT redact, rewrite, or omit fields. Provider-specific extensions surface here unchanged (e.g.,choices[0].logprobs, vLLM'sprompt_logprobs, LM Studio's runtime stats).
8.1.3 Error mapping¶
| OpenAI condition | Spec category |
|---|---|
| HTTP 401, 403 | provider_authentication |
| HTTP 404 with model-not-found body | provider_invalid_model |
| HTTP 503 with model-loading body | provider_model_not_loaded |
| HTTP 429 | provider_rate_limit |
| HTTP 5xx (other), connection error, timeout | provider_unavailable |
HTTP 400 with body indicating the bound model rejected a content block (e.g., image/audio/media-type rejection, unsupported source variant) |
provider_unsupported_content_block |
| HTTP 400 (malformed request, schema violation) | provider_invalid_request |
| Successful HTTP response that fails to parse into §6 shape | provider_invalid_response |
8.1.4 Concurrency¶
OpenAI-compatible servers vary in concurrency support — local servers may serialize internally,
hosted APIs do not. Implementations MUST NOT add a serialization layer; concurrent complete() calls
go to the wire concurrently. Providers that benefit from client-side concurrency limits use the
pipeline-utilities rate limiter or middleware, not this layer.
8.1.5 Structured output¶
When complete() is called with a response_schema, the OpenAI-compatible request body includes
a response_format field:
{
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "<implementation-derived identifier>",
"schema": <response_schema verbatim>,
"strict": true
}
}
}
The name field is required by OpenAI but does not affect output semantics; implementations
SHOULD derive a stable identifier from the schema (e.g., a hash, or the schema's title field
when present). The strict: true flag enables OpenAI's schema-constrained decoding path;
implementations SHOULD pass strict: true when the supplied schema satisfies the strict-mode
constraints (no additionalProperties: true, all properties listed in required, etc.), and
SHOULD fall back to strict: false when the schema does not satisfy the constraints. The
behavioral contract at the spec layer is identical regardless of strict: validation happens
post-receive against response_schema; failures raise structured_output_invalid (§7).
When complete() is called without response_schema (or with response_schema=None), the
request body MUST NOT include response_format. The v0.4.0 wire shape is preserved unchanged
for free-form calls.
8.1.5.1 Fallback for providers without native structured output¶
OpenAI-compatible servers that do not implement response_format (older vLLM versions, some
LM Studio releases, some local-server wrappers) raise an error or silently ignore the field.
Implementations SHOULD detect this — either statically (via provider capability metadata) or
dynamically (a first-call attempt that returns an error) — and fall back to a prompt-augmentation
strategy:
- Construct a modified copy of the message list with a system directive appended (or with the
existing system message's content extended) instructing the model to return only valid JSON
matching the
response_schema. The directive SHOULD include the schema serialized as part of the prompt. The caller's originalmessageslist MUST be left unchanged — the §5 mutation rule applies to fallback paths the same as native paths. - Issue the underlying request without
response_format. - Parse and validate the response content against
response_schemaper §6parsed. - On validation failure, raise
structured_output_invalidper §7.
Fallback behavior is implementation-defined. Implementations MUST document whether complete()
with response_schema uses native response_format or prompt-augmentation, and SHOULD expose
a way for callers to inspect or override the path chosen.
8.1.5.2 Response mapping¶
When the response carries structured content (not tool calls):
message.contentis the response body's content string, verbatim.parsedis the deserialization ofmessage.contentagainstresponse_schema.finish_reasonis mapped per §8.1.2 (typically"stop").
When the response carries tool calls instead, the mapping follows §8.1.2 unchanged: parsed is
absent, tool_calls is populated, finish_reason is "tool_calls".
9. Determinism¶
LLM completions are not deterministic by default. Even with temperature=0 and a fixed seed,
identical inputs MAY produce different outputs across calls or across deployments of the same
provider (different model weight versions, different infrastructure, different sampling
implementations).
The spec therefore makes no determinism guarantees for complete(). The conformance suite uses
mock providers that return canned responses; live-provider tests are out of scope.
For ready(): implementations MUST return successfully when the provider is reachable and the
model exists, and raise the appropriate §7 category otherwise. This is testable deterministically
against a mock or stub HTTP server.
10. Out of scope¶
Not covered by this specification; deferred to follow-on capabilities or proposals:
- Streaming responses — incremental delivery of assistant content and tool calls.
- Multi-modal audio and video — audio and video inputs and outputs. Image inputs are covered by §3.1 (per proposal 0015). Audio and video each warrant their own proposal — formats, codecs, inline-vs-URL semantics, and provider wire mappings differ enough that one proposal per modality is the right scope.
- Image outputs — assistant-message-borne images (e.g., DALL-E-style image generation). v1 image support is user-input-only; assistant-output image content would need a separate proposal and is not common in tool-using agent workloads.
- Token counting before the call — tokenizer access for budget-aware prompt assembly.
- Provider-native wire formats — Anthropic Messages, Google Vertex, AWS Bedrock. Each adds a new §8-style mapping section to this spec via a follow-on proposal.
- Agent loop — tool-call-then-respond loops live in graph-engine nodes or a future agent-runner capability.
- Retry and rate-limit policy — pipeline-utilities concern.
- Prompt template rendering — prompt-management capability (charter §4.5).
- Embeddings — separate API surface; separate capability if/when needed.