0016: LLM Provider — Structured Output¶
- Status: Accepted
- Author: Chris Colinsky
- Created: 2026-05-13
- Accepted: 2026-05-14
- Targets: spec/llm-provider/spec.md (modifies §5 Provider interface; modifies §6 Response and configuration; modifies §7 Error semantics; adds §8.5 Structured output wire mapping; modifies §10 Out of scope)
- Related: 0006 (LLM provider core)
- Supersedes:
Summary¶
Extend the existing complete() operation with an optional response_schema parameter that
constrains the model's output to a caller-supplied JSON Schema. Augment Response with a
parsed field that carries the validated structured value when response_schema was
supplied. Introduce a new error category structured_output_invalid (non-transient by default)
for the case where the model's output cannot be parsed or validated against the requested
schema. Specify the OpenAI wire mapping for structured output via response_format, with
prompt-augmentation-plus-post-parse as a fallback for providers that do not natively
support it.
Motivation¶
§10 of v0.4.0 defers structured output: "JSON mode, schema-constrained decoding, response_format." That deferral was correct at v1 scope — the v1 surface was tool calling, canonical errors, and the OpenAI wire mapping. But every meaningfully-typed LLM pipeline ends up needing structured output, and the absence of a framework primitive forces every node to re-implement the same render/call/parse/validate/retry loop in user code.
The boilerplate today, replicated across every LLM-calling node that produces typed output:
- Render the prompt with whatever JSON-shape directive the user has standardized on.
- Call
complete(). - Parse the response content as JSON.
- Validate the parsed JSON against the user's schema declaration.
- On parse or validation failure, retry with adjusted parameters (commonly
max_tokensis doubled, sometimes the system message is augmented with a stricter directive). - On second failure, give up or split the input.
Steps 3–6 are identical across nodes; only the schema and prompt differ. Folding this into
the Provider interface lets users declare the expected shape once and rely on the framework
to either (a) use the provider's native structured-output path (OpenAI's response_format,
Anthropic's tool-as-schema convention, Gemini's responseSchema) where the model produces
schema-compliant output on the wire in one trip, or (b) fall back transparently to the
prompt-augmentation-plus-post-parse shape for providers that lack native support.
The Provider-level placement matters. A middleware-level structured-output wrapper can only
operate on the response after complete() has returned a string — it cannot reach the wire
to use native structured-output features, so it pays the round-trip cost of an unstructured
call and a post-parse retry even when the model could have produced schema-compliant output
on the first try. Provider-level placement opens the native path; userland middleware
patterns remain buildable on top for users who want them.
The single-method shape (parameter on complete() rather than a separate
complete_structured() method) matches the industry pattern across major LLM SDKs (OpenAI,
Anthropic, Gemini, Instructor) and keeps the Provider Protocol surface small. The shape
discussion is recorded in Alternatives considered.
Detailed design¶
§5 Provider interface: extend complete() with response_schema¶
Amend the existing complete() operation in §5 to accept an optional response_schema
parameter. The full updated signature (described abstractly; per-language ergonomics decide
positional-vs-keyword conventions):
complete(messages, tools=None, config=None, response_schema=None)¶
Async. Performs a single completion call. When response_schema is supplied, the call
additionally constrains the model's output to conform to the schema.
messages— non-empty ordered sequence of messages. Constraints unchanged: the first message MAY besystem; the last message MUST beuserortool. Violations raiseprovider_invalid_request(§7).tools— optional ordered sequence ofToolrecords. Constraints unchanged.config— optionalRuntimeConfig(§6). Constraints unchanged.response_schema— optional JSON Schema describing the expected output shape. WhenNone/ absent, the call behaves as in v0.4.0: free-form text content; no parsed value. When present, MUST be a valid JSON Schema. The top-level schema MUST be an object schema (type: "object"at the root) — this matches §4Tool.parametersand OpenAI's strict-mode wire format. Non-object top-level schemas are out of scope for this proposal; a follow-on MAY relax this if cross-provider demand warrants. Implementations SHOULD validate at call time. The JSON Schema convention matches §4 — see §4's note on language-native schema constructors compiling to JSON Schema.
Returns: a Response (§6).
When response_schema is set and the model returns content (not tool calls):
Response.parsedis the parsed-and-validated structured value perresponse_schema.Response.message.contentis the JSON-serialized string form of the structured output.
When response_schema is set and finish_reason is "tool_calls", Response.parsed
MUST be absent regardless of whether message.content is also populated (the existing §3
contract allows assistant messages to carry both tool_calls and non-empty content, and
this proposal does not change that). message.content preserves the model's output
verbatim per §6; the parsed slot only populates when the model returned structured content
(typically finish_reason: "stop").
When tools and response_schema are both supplied, the model decides which path to
take, signaled by finish_reason. If finish_reason is "tool_calls", the user handles
tool execution and may make a follow-on complete() (per §5); if finish_reason is
"stop", the user reads parsed and/or message.content.
When response_schema is None / absent, Response.parsed is absent regardless of
content. The v0.4.0 behavior is preserved exactly.
Operation semantics (unchanged from v0.4.0 except where structured output augments them):
complete()MUST NOT mutatemessages,tools,config, orresponse_schema.complete()MUST be reentrant.complete()does NOT loop on tool calls.complete()does NOT retry on transient errors.- When
response_schemais set and the model produces output that successfully parses as JSON but fails to validate againstresponse_schema, OR fails to parse as JSON at all,complete()raisesstructured_output_invalid(§7).
§6 Response: add parsed field¶
Amend the §6 Response record by adding one field:
| Field | Description |
|---|---|
parsed |
The parsed and validated structured value when the call supplied a response_schema and the model returned structured content. The value conforms to the supplied response_schema. Absent (null / None / undefined, per the language's idiom) on calls that did not supply a response_schema, and on responses whose finish_reason is "tool_calls" (regardless of whether message.content is also populated, per the §3 assistant-message contract). |
The parsed field is the language-idiomatic deserialized form of the structured value
(e.g., a Python dict[str, Any] populated per the JSON Schema, or a TypeScript unknown
typed at the call site via a generic). Implementations MAY offer ergonomic typed accessors
on top (e.g., Python users supplying a Pydantic model class instead of a raw JSON Schema and
receiving a validated model instance, surfaced via per-language overloads or generics so
that the static type of parsed reflects the supplied schema) — those are per-language
ergonomics, not normative spec.
message.content carries the provider's content string preserved verbatim — the bytes the
model returned, UTF-8 decoded. Implementations MUST NOT re-serialize parsed back into
message.content; doing so would mask formatting differences (whitespace, key ordering,
number representation) and break conformance assertions that rely on byte-level
equivalence. parsed and message.content MUST be consistent in the following sense:
deserializing message.content as JSON and validating against response_schema produces
parsed. The reverse operation (serializing parsed and comparing) is NOT required to
round-trip bytewise, because the model's serialization may differ from the framework's.
When finish_reason: "tool_calls", parsed is absent regardless of whether
response_schema was supplied. The tool-call path and the structured-content path are
mutually exclusive at the response level.
§7 Error semantics: add structured_output_invalid¶
Add to the §7 canonical error list:
structured_output_invalid—complete()was called with aresponse_schema, and the provider returned content that could not be parsed as JSON OR did not validate against the supplied schema. The error MUST expose:- the
response_schemathat was requested, - the raw response content (the bytes the model produced),
- a description of the validation or parse failure (the wrapped exception's message, the failing JSON Pointer, or the language's idiomatic equivalent).
Non-transient by default. A model that fails to produce schema-compliant output on a
given prompt usually fails the same way on retry; retrying without changing the prompt,
model, or schema is unlikely to succeed. Users who want retry-on-validation-failure
semantics MAY include structured_output_invalid in a pipeline-utilities
RetryMiddleware classifier's transient set, but the category is NOT transient by
default at the spec level.
Update the retry classification paragraph at the end of §7 to add
structured_output_invalid to the non-transient list alongside provider_authentication,
provider_invalid_model, provider_invalid_request, and provider_invalid_response.
The category is distinct from provider_invalid_response. provider_invalid_response
covers "the provider's wire response is malformed per §6 shape" (the wire envelope is
broken). structured_output_invalid covers "the wire envelope is fine; the content the
provider returned does not validate against the caller's response_schema." Both are
non-transient, but they have different remediation paths (one is a provider bug; the other
is a model output that needs prompt or schema adjustment).
§8.5 Structured output wire mapping (new subsection)¶
Add a new subsection §8.5 to the OpenAI-compatible wire format chapter:
8.5 Structured output¶
When complete() is called with a response_schema, the OpenAI-compatible request body
includes a response_format field:
{
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "<implementation-derived identifier>",
"schema": <response_schema verbatim>,
"strict": true
}
}
}
The name field is required by OpenAI but does not affect output semantics; implementations
SHOULD derive a stable identifier from the schema (e.g., a hash, or the schema's title
field when present). The strict: true flag enables OpenAI's schema-constrained decoding
path; implementations SHOULD pass strict: true when the supplied schema satisfies the
strict-mode constraints (no additionalProperties: true, all properties listed in
required, etc.), and SHOULD fall back to strict: false when the schema does not satisfy
the constraints. The behavioral contract at the spec layer is identical regardless of
strict: validation happens post-receive against response_schema; failures raise
structured_output_invalid.
When complete() is called without response_schema (or with response_schema=None), the
request body MUST NOT include response_format. The v0.4.0 wire shape is preserved
unchanged for free-form calls.
8.5.1 Fallback for providers without native structured output¶
OpenAI-compatible servers that do not implement response_format (older vLLM versions, some
LM Studio releases, some local-server wrappers) raise an error or silently ignore the field.
Implementations SHOULD detect this — either statically (via provider capability metadata) or
dynamically (a first-call attempt that returns an error) — and fall back to a
prompt-augmentation strategy:
- Construct a modified copy of the message list with a system directive appended (or
with the existing system message's content extended) instructing the model to return
only valid JSON matching the
response_schema. The directive SHOULD include the schema serialized as part of the prompt. The caller's originalmessageslist MUST be left unchanged — the §5 mutation rule applies to fallback paths the same as native paths. - Issue the underlying request without
response_format. - Parse and validate the response content against
response_schemaper §6parsed. - On validation failure, raise
structured_output_invalidper §7.
Fallback behavior is implementation-defined. Implementations MUST document whether
complete() with response_schema uses native response_format or prompt-augmentation,
and SHOULD expose a way for callers to inspect or override the path chosen.
8.5.2 Response mapping¶
When the response carries structured content (not tool calls):
message.contentis the response body's content string, verbatim.parsedis the deserialization ofmessage.contentagainstresponse_schema.finish_reasonis mapped per §8.2 (typically"stop").
When the response carries tool calls instead, the mapping follows §8.2 unchanged: parsed
is absent, tool_calls is populated, finish_reason is "tool_calls".
§10 Out of scope: structured output removed¶
Remove the existing §10 entry:
- Structured output — JSON mode, schema-constrained decoding, response_format.
This proposal's §5 / §6 / §7 / §8.5 amendments collectively cover the deferred capability. The remaining §10 entries (streaming, audio/video, token counting, provider-native wire formats, agent loop, retry/rate-limit, prompt template rendering, embeddings) are unchanged.
Cross-spec touchpoints¶
This proposal does not modify graph-engine, pipeline-utilities, or observability.
Observability §5.5 (llm.model, llm.finish_reason, llm.usage.*) is unchanged. A
follow-on observability proposal MAY surface structured_output_invalid failures with
additional attributes (the failing JSON Pointer, the schema name, etc.), but the existing
span attributes already capture the call's basic identity.
A userland StructuredOutputMiddleware (an earlier alternative — see Alternatives
considered) remains buildable on top of the existing pipeline-utilities §6 middleware seam
for callers who want it. The single-method extension does NOT preclude that pattern; it
just makes it unnecessary in the common case.
Conformance test impact¶
Add fixtures under spec/llm-provider/conformance/. Each fixture is a pair
(NNN-name.yaml + NNN-name.md) per the conformance README:
018-structured-output-success.yaml—complete()with a simple object schema passed asresponse_schema; provider returns valid JSON matching the schema. Assert:- returned
Response.parsedis the parsed value (deep-equal to the expected deserialization); Response.message.contentis the JSON-serialized form;Response.parsedandResponse.message.contentare consistent underresponse_schema;finish_reasonis"stop".019-structured-output-parse-failure.yaml— provider returns content that is not valid JSON (truncated, syntactically invalid). Assertcomplete()raisesstructured_output_invalid; the error exposes the schema, raw content, and a parse-failure description.020-structured-output-validation-failure.yaml— provider returns valid JSON that fails to validate againstresponse_schema(missing required field, type mismatch). Assertcomplete()raisesstructured_output_invalid; the error description identifies the failing field.021-structured-output-non-transient.yaml— verify retry classification: an unwrappedRetryMiddleware(per pipeline-utilities §6.1) with the default classifier does NOT retrystructured_output_invalid(the category is non-transient by default).022-structured-output-with-tool-calls.yaml—complete()called with bothtoolsandresponse_schemapopulated; provider returnstool_callsinstead of structured content. AssertResponse.parsedis absent,Response.message.tool_callsis populated,finish_reasonis"tool_calls".023-structured-output-openai-wire-mapping-native.yaml— assert the outbound HTTP request to an OpenAI-compatible endpoint carries theresponse_format: { type: "json_schema", json_schema: { name, schema, strict } }shape with the spec'sresponse_schemapassed verbatim underjson_schema.schema.024-structured-output-openai-wire-mapping-fallback.yaml— provider configured to reject nativeresponse_format; assert the implementation falls back to prompt-augmentation + post-parse, the response is still validated againstresponse_schema, and the finalResponse.parsedis populated identically to the native path.025-structured-output-no-schema-regression.yaml— regression: acomplete()call withoutresponse_schemaproduces aResponsewithparsedabsent, even when the model would have produced JSON-looking content. The wire shape MUST NOT includeresponse_format. The v0.4.0 behavior is preserved exactly whenresponse_schemais not supplied.
Alternatives considered¶
Option A — Middleware-level (StructuredOutputMiddleware)¶
An earlier alternative recommended a canonical middleware that ships alongside
RetryMiddleware and TimingMiddleware. The middleware would read the inner node's raw
response from a designated state field, attempt JSON parse + validation against a declared
schema, and either pass the validated value through or raise a transient
structured_output_invalid so RetryMiddleware retries automatically.
Rejected because:
- Middleware can't reach the wire. Native structured-output support (OpenAI strict
mode, Anthropic tool-as-schema, Gemini responseSchema) lives at the provider boundary.
A middleware operating on the post-
complete()partial update has already paid the round-trip cost of an unstructured call; it can't engage the native one-trip path. For the major providers in 2026, this is a meaningful cost. - State-field coupling. The middleware would need to know which state field carries
the raw response (
llm_response_raw,claim_draft_raw, etc.) and which receives the parsed value. That's per-node naming convention being lifted into a canonical middleware's configuration, which couples the middleware to per-graph state-schema choices. - Pipeline-utilities cross-cutting concerns. RetryMiddleware and TimingMiddleware are genuinely cross-cutting (anything can fail; anything has duration). Structured output is a per-call concern about the provider's wire path, not a cross-cutting concern about node execution.
Userland middleware around complete() remains buildable for users who want
middleware-style composition. This proposal does not preclude that pattern — it just
doesn't make it the canonical shape.
Option B alternative — Separate complete_structured() method¶
An earlier draft of this proposal split structured output into a distinct
complete_structured() method on the Provider Protocol, parallel to complete() and
ready(). The argument was that the return shape differs (presence of parsed) enough to
justify a distinct surface.
Rejected because:
- Industry pattern is single-method. Every major LLM SDK in 2026 (OpenAI, Anthropic,
Gemini, Instructor, LiteLLM) exposes structured output as a parameter on the existing
completion method (
response_format,response_schema,response_model), not as a separate method. A two-method Provider interface would be unusual and would surprise anyone porting from another SDK. - Wrapper composition. Wrappers around
Provider(logging, mocking, observability, caching) write one wrapper, not two. Splitting structured output into a second method doubles the wrapper surface for no payoff. - Implementation duplication. Most of the implementation is shared between "structured and free-form completions"; the two-method form would expose that as two thin variants over a shared core.
- Surface growth. If future modes appear (audio output, multi-response, etc.), the single-method form scales by adding parameters; the two-method form would push toward N methods.
The weak counter-argument for two methods is static typing: a separate
complete_structured() returning StructuredResponse[T] could be statically distinct
from complete() returning Response. Single-method recovers most of this via
per-language overloads (Python @overload, TypeScript conditional types on the schema
parameter) without splitting the Protocol. The spec doesn't mandate the per-language
typing approach; it mandates the behavioral contract.
Bundle response_schema semantics as required (no opt-out)¶
Considered briefly: make response_schema a required parameter of complete() and
always validate output. Rejected — free-form text is the dominant call shape (chat
assistants, narrative responses, summaries). Requiring a schema would force
{"type": "string"} boilerplate at every free-form call site and would push the model
into schema-fitting output style even for prose.
Schema-language flexibility (Pydantic / Zod / attrs as alternative inputs)¶
Considered: spec accepts a "schema declaration" abstractly and lets implementations decide
how to interpret it (Pydantic class, Zod schema, raw JSON Schema). Rejected because the
wire format is JSON Schema regardless — accepting other forms just pushes the
JSON-Schema-derivation step inside the implementation. The spec mandates JSON Schema
(matching §4 Tool.parameters); implementations are free to offer ergonomic constructors
that compile from native types to JSON Schema. This is the same shape §4 took for tools
and has held up well.
Transient-by-default error classification¶
An earlier draft of this proposal classified structured_output_invalid as transient by
default, so RetryMiddleware would retry without user configuration. Rejected because:
- A model that produces non-schema-compliant output on a given prompt usually produces the same non-compliant output on the next attempt (the model is sampling from approximately the same distribution given the same context). The retry burns tokens without improving outcomes.
- Genuinely useful retry-on-validation-failure logic typically involves modifying the
prompt (adding a more emphatic directive, including the validation error as feedback,
doubling
max_tokensto recover from truncation). That's not "retry the same call" — it's "retry a different call." Spec-level transient-by-default would make the no-op-retry case the default, which is the worst of both worlds. - Users who want naive retry can configure their RetryMiddleware classifier to include
structured_output_invalidin its transient set. The classifier-driven model already exists per pipeline-utilities §6.1; the new category just isn't transient by default.
Add a parsed accessor on Message instead of Response¶
Considered: put parsed on Message (alongside content, tool_calls,
tool_call_id) rather than on Response. Rejected because the parsed value is a
per-call artifact derived from a specific schema — it's not an intrinsic property of the
message that would persist if the message were later re-sent or stored. Keeping parsed
on Response ties it to the specific call boundary where the schema was supplied.
Open questions¶
None at time of submission.