Harness — Chat¶
Canonical behavioral specification for the OpenArmature chat harness sub-spec.
- Capability: harness-chat
- Introduced: spec version 0.50.0
- History:
- created by proposal 0056
This specification is language-agnostic. Each implementation (Python, TypeScript, …) maps its own idioms
onto the behavioral contract described here. Conformance is verified by the fixtures under conformance/.
Normative keywords (MUST, MUST NOT, SHOULD, MAY) are used per RFC 2119.
1. Purpose¶
The harness-chat capability defines the behavioral contract for chat-shaped harnesses — the
sub-spec layer on top of the abstract harness capability (spec/harness/spec.md) specifying
the canonical chat-loop deployment shape. Chat harnesses are the first per-harness-type sub-spec
landing per proposal 0022's Open questions resolution.
The cross-impl divergence risk for chat-shaped deployments is the highest among the harness types OpenArmature knows about — every production chat agent reinvents message shape, history convention, send-and-reply surface, suspension handling, and error mapping. Ratifying the contract once preserves the cross-impl behavioral promise.
The capability composes with:
- harness — the chat harness is a harness; everything in
spec/harness/spec.mdapplies (turn semantics, dispatch path classification, outcome handling, signal coordinator, error categorization). This sub-spec layers chat-shaped surfaces over that abstract contract. - sessions — chat is sessioned-mode only (per §1 Mode constraint below); conversation history
lives in session state per
spec/sessions/spec.md. - suspension — chat agents commonly pause for human-in-the-loop approvals; the chat harness
composes with
spec/suspension/spec.mdvia §8 below. - prompt-management — the canonical
ChatMessageshape reuses the content-block model fromspec/prompt-management/spec.md§3.1 (and llm-provider §3.1 unchanged); chat-prompt templates render into messages chat harnesses consume. - llm-provider — the chat message shape mirrors llm-provider §3 unchanged
(
spec/llm-provider/spec.md§3); chat-harness messages flow through to providercomplete()calls without translation. - observability — chat turns inherit harness §4.6 (turn-level wrapper span); no chat-specific span attributes are introduced.
Mode constraint. The chat sub-spec is sessioned-mode only. Chat without conversation
history is not a coherent shape — every turn would lose context. Stateless completion endpoints
("send one message, get one reply, no memory") are a different harness type. Implementations
constructing a chat harness MUST resolve session_id per harness §3.1 / §3.2 / §3.3 for every
inbound turn; harness §3.0 Stateless transmission path does NOT apply.
This capability does NOT define:
- Cross-session memory. Per-user profiles, episodic memory, semantic / vector-indexed retrieval. Out of scope per §13; deferred to a future memory capability.
- Specific chat-platform adapters. Slack, Discord, Microsoft Teams, WhatsApp, customer-platform integrations — sibling-package work that builds on this spec.
- Voice input / output. A voice harness is a different harness type (streaming-first, low-latency, with voice-activity-detection turn boundaries). Future sub-spec if real.
- Rich-media compose UI. Application concern.
- Conversation summarization / context-window management. Application concern; this spec exposes the raw history field.
2. Concepts¶
ChatMessage. The canonical typed message at the chat-harness API boundary. Mirrors llm-provider §3's message shape unchanged. See §3.
ContentBlock. Per llm-provider §3.1 — text / image / thinking / redacted_thinking.
Imported by reference; the chat sub-spec adds none.
ConversationHistory. The per-session field carrying the ordered message list. See §4.
ChatTurnOutcome. The discriminated return type of the chat harness's send() callable;
distinguishes completed reply, errored turn, and suspended turn. See §5.
The send() callable. The higher-level convenience surface above raw invoke(). Takes a
session id and an inbound user message; returns a ChatTurnOutcome. See §5.
Pending message. A synthetic assistant message a graph node MAY append to state before
calling suspend() to inform the user the turn is awaiting an external signal. See §8.
Subscribed listener. The default mechanism by which a chat harness surfaces the post-resume assistant reply to the caller after a suspend → signal → resume cycle. See §8.
3. Message shape (ChatMessage)¶
A ChatMessage mirrors the llm-provider §3 message shape unchanged. The fields:
| Field | Required | Description |
|---|---|---|
role |
yes | One of "system", "user", "assistant", "tool". Discriminator per llm-provider §3. |
content |
conditional | Per llm-provider §3's per-role constraints (a non-empty string OR a non-empty ordered sequence of content blocks per llm-provider §3.1 — text / image / thinking / redacted_thinking; content MAY be empty on assistant messages with non-empty tool_calls). |
tool_calls |
only on assistant |
Ordered list of ToolCall records per llm-provider §3 (top-level message field — NOT a content-block type). |
tool_call_id |
required on tool |
The id of the matching assistant tool call. |
Per-role constraints, validation timing, and ToolCall shape follow llm-provider §3 verbatim.
Tool calls are top-level message fields (assistant messages with tool_calls populated); tool
results are tool-role messages with tool_call_id and string content. The chat sub-spec
adds no message-shape divergence — chat messages ARE llm-provider messages.
Cross-cutting alignment. A ChatMessage flowing through the chat harness MAY be passed
directly to an llm-provider complete() call without translation. A chat-prompt template rendered
per prompt-management §3.1 produces content blocks that compose into a ChatMessage without
adaptation. The one canonical message shape across the spec is enforced at the chat boundary.
3.1 Message validation¶
Implementations MUST validate inbound ChatMessage instances at the send() API boundary
before any session load or invoke dispatch. Validation MUST check:
roleis one of the four enumerated values.- The per-role content shape constraints from llm-provider §3 hold (e.g., assistant
contentMAY be empty whentool_callsis non-empty;toolrole requires a non-emptytool_call_id). - Content blocks (when present) are valid instances of one of the four llm-provider §3.1 types.
A ChatMessage failing validation MUST surface as a chat_message_shape_invalid error (per
§10) — this is a user-correctable error per harness §7.3 (the caller supplied a malformed
message).
4. Conversation history convention¶
Sessioned chat graphs store conversation history in a canonical session-state field:
- Field name.
messages(lowercase, plural). The chat harness reads + writes through this exact field name. - Field type.
list[ChatMessage], ordered. - Reducer.
append. Each turn appends one or more new messages to the tail.
Canonical name (v1). v1 MANDATES the canonical field name. A follow-on proposal MAY add a
field-name configurability option if real applications surface naming conflicts, but the v1
contract keeps the surface tight (one canonical name; zero-config against any sessioned graph
using it). Applications with pre-existing state fields named messages whose semantics
conflict MUST migrate before adopting the chat harness, or wait for the field-name configurability
follow-on if it lands.
Scoping. History is per-session. Cross-session memory (per-user profiles, episodic recall,
semantic-indexed retrieval) is OUT of scope per §13 — the messages field carries only the
conversational turns within one session_id.
Reducer composition. The chat harness does NOT register a reducer on its own; the application's
graph state schema declares messages: list[ChatMessage] with the append reducer per
graph-engine's canonical state-reducer conventions. Implementations MAY ship a helper that
defines this field on a state schema for convenience (a "ChatStateMixin" or equivalent
per-language idiom), but the spec contract requires only that a sessioned chat graph carries the
correctly-shaped field; the construction mechanism is per-language.
5. The send() callable¶
The chat harness exposes a higher-level callable surface above raw invoke():
Spec-normative identifier. send is the spec-normative method name. Implementations MAY
adjust casing per language idiom (send in Python / Rust / Go; Send in C# / Pascal-case
contexts) while keeping the identifier recognizable. Asynchronous variants follow the language's
async idiom (Python async def send(...), TypeScript async send(...) returning a Promise,
etc.) — the callable contract is the same.
5.1 Parameter semantics¶
session_id— the session identifier per sessions §3. Implementations MUST validate it is non-empty before any session load.message— the inboundChatMessageper §3. MUST be validated per §3.1 before any session load.
5.2 Return type — ChatTurnOutcome¶
ChatTurnOutcome is a discriminated outcome carrying one of three variants:
completed— the turn returned a normal completion. Carries:replies: list[ChatMessage]— the new messages added to history during the turn, in graph-execution order. Always a list (length 1 in the common single-message case; longer for tool-loop agents that emit assistant-tool-call → tool-role → assistant-final sequences). The list MUST contain only messages newly added during this turn — pre-existing history MUST NOT surface here.final_state(OPTIONAL) — the final state of the session after the turn. Implementations MAY expose this for callers that need to read other state fields the chat graph populated.errored— the turn errored at the harness boundary. Carries:error_bucket— one of harness §7's three buckets:session_terminating/retryable_transient/user_correctable. (Matches the harness conformance vocabulary'serror_bucketfield for the three-way classification.)error_category(OPTIONAL) — the underlying concrete error category from harness §10 / sessions §10 / suspension §10 / llm-provider §7 / etc. (e.g.,provider_unavailable,session_load_failed,chat_message_shape_invalid). Matches the harness vocabulary'serror_categoryfor the concrete error name.reply: ChatMessage— a system-shaped reply per §10's mapping; the application's chat UI surfaces this directly to the user.suspended— the turn calledsuspend()mid-invocation. Carries:signal_descriptor— the suspension descriptor per suspension §4 (signal id, metadata).pending_messages: list[ChatMessage]— messages the graph appended tostate.messagesbefore reaching the suspending node (per §8.1), in graph-execution order. Empty list when the graph suspended without pre-emitting any pending message; one or more entries when it did. Always a list (matchesChatTurnOutcome.completed.replies's always-list shape; same rationale per the proposal's Open Questions resolution).invocation_id— the paused invocation's identifier per harness §3.3 (needed by the caller when the resume-via-listener pattern is not used andinvoke(resume_invocation=...)is called explicitly).
Discriminator mechanism is per-language. Implementations MAY use a tagged-union shape (Python discriminated dataclass union; TypeScript discriminated interface union), a sealed-class hierarchy (Kotlin / Scala), a result-type with three variants (Rust enum), or any equivalent shape the language idiomatically expresses discriminated outcomes through. The field set above is normative; the discriminator's surface is not.
5.3 Concurrency¶
Concurrent send() calls under the same session_id MUST serialize. The second call awaits
the first's outcome before beginning its own session load + invoke dispatch.
The chat sub-spec layers this stricter rule on top of sessions §8.1's last-write-wins concurrency default because chat-shaped concurrent interleaving is particularly user-visible — two user messages arriving in rapid succession could otherwise produce out-of-order replies (the second turn reads pre-first-turn history) and reply-interleaving (both invocations writing assistant messages concurrently).
The serialization happens at the chat-harness layer: a per-session_id lock around the send()
body. No change is required to the abstract sessions capability. Applications wanting parallel
turns under one logical user construct multiple sessions.
Concurrent send() calls under DIFFERENT session_id values MAY proceed in parallel; the
serialization rule scopes to a single session_id.
6. Inbound message → session → invoke wiring¶
The chat harness implements harness §3.1 / §3.2 / §3.3 in chat-shaped form. Path classification at
send() entry follows these rules:
- Harness §3.1 (new-session) path. The session's
messagesfield is empty (either the session record does not exist yet or its history is empty). The chat harness loads (or initializes) the session, appends the inbound user message to history, and invokes the graph. - Harness §3.2 (existing-active-session) path. The session's
messagesfield is non-empty. The chat harness loads the session, appends the inbound user message to the existing history, and invokes the graph. - Harness §3.3 (signal-resume) path. Inbound traffic carries a resume signal (per harness §6's
signal coordinator). The chat harness dispatches the signal-resume invocation as harness §3.3
specifies;
send()is NOT the entry point for signal-resume (subscribed listeners per §8 are). An inboundsend()call MUST NOT be classified as signal-resume.
Initial-state construction. For both §3.1 and §3.2 paths, the chat harness constructs the graph's initial state by:
- Loading the session state per sessions §4.1 (full-state load on existing sessions; default initial state on new sessions).
- Appending the inbound user message to the
messagesfield via theappendreducer. - Threading the resulting state into
invoke()along with thesession_id.
The invoke call follows the harness §4 turn lifecycle (start, body, terminal). The chat harness
does NOT mutate any session-state field other than messages during initial-state construction;
graph-internal nodes are responsible for any further state evolution.
7. Outbound assistant message → response wiring¶
After invoke() returns completed, the chat harness extracts the new messages from the final
state's messages field and surfaces them on the ChatTurnOutcome.completed.replies list.
New-message extraction. "New messages" means messages appended during this turn — the tail
of the post-invoke messages list that was NOT present in the pre-invoke loaded state. The chat
harness MUST extract by index (track the pre-invoke history length; everything at that index and
beyond is new), NOT by content matching (content matching is brittle and would misclassify a
node that appends a message identical to one earlier in history).
Ordering. Multi-message replies preserve graph-execution order. The messages field's append
reducer composes appends in graph-execution order naturally; the chat harness MUST NOT re-order
the extracted tail. Determinism follows harness §11.
All-roles inclusion. The extracted tail includes messages of ALL roles — assistant text
replies, assistant messages with tool_calls, tool-role messages with tool_call_id + string
content. Tool-loop agents typically emit a three-message sequence within one invocation
(assistant with tool_calls → tool-role with the result → assistant with the final text reply
per llm-provider §3's canonical shape); all three surface on replies. Chat UIs decide how to
render tool-role intermediates (typically with "thinking" / "calling X" / "got result"
affordances); the spec contract does not constrain rendering.
Empty tail. A turn that adds NO new messages (e.g., a graph that handles the user message
purely via side effect and returns the state unchanged) surfaces as a ChatTurnOutcome.completed
with replies = [] (empty list). Implementations MUST surface this case as completed (NOT
errored); the chat UI decides whether to expose the empty reply to the user (display a "no
response" affordance, retry, etc.).
8. Composition with suspension¶
When invoke() returns suspended (per harness §5.3), the chat harness returns a
ChatTurnOutcome.suspended carrying the signal descriptor, optional pending message, and the
paused invocation's id (per §5.2).
8.1 Pending message protocol¶
The graph MAY include an upstream node that returns a partial update appending a synthetic
assistant message to the messages state field via the standard append reducer; a
downstream node then calls suspend(). (The two are separate nodes because the
graph-engine node-body contract has a node either return a partial update OR call suspend(),
not both — suspend() raises before any return value is captured. The two-node composition
expresses "compose pending message, then suspend" as a graph topology.)
The synthetic messages typically inform the user the turn is awaiting an external signal —
"I'm waiting for approval to send this email," "Approve transferring $500 to Bob's account?",
etc. The chat harness MUST extract the pending-message tail from the suspended outcome's
state.messages (the same extraction rule from §7; the tail past the pre-invoke history
length) and surface it on the ChatTurnOutcome.suspended.pending_messages list. The engine
persists the post-update state at suspend time per suspension §4, so the upstream node's
appended messages are in the captured state by the time the chat harness reads the tail.
Always-list shape. pending_messages is always a list[ChatMessage] — empty when the
graph suspended without pre-emitting any message, one or more entries (in graph-execution
order) when it did. The always-list shape matches the OQ-6 resolution for
ChatTurnOutcome.completed.replies; per-language singular-or-list ergonomics is not a
spec-level concern, the shape is fixed.
No chat-specific engine hook. The pending-message pattern composes from existing graph
primitives — partial-update + edge + suspend(). The chat harness extracts the appended
messages via the same tail-extraction rule §7 uses for completed replies. Implementations
MUST NOT introduce a chat-specific engine hook (chat.emit_pending() or equivalent) for this
purpose — the graph-primitive composition is the canonical pattern.
Multi-message pending tails. Multiple synthetic messages MAY appear in the tail (e.g., an
assistant text-explanation message AND an assistant tool-call message describing the action
awaiting approval, then suspend()). All surface on pending_messages in graph-execution
order.
8.2 Signal-resume flow¶
When the resume signal arrives (per harness §6's signal coordinator):
- The chat harness dispatches the signal-resume invocation per harness §3.3 (
invoke(resume_invocation=<paused_id>, signal_payload=<payload>)). - The resumed invocation continues from the post-suspend node per suspension §7.
- The post-resume turn's new messages (extracted per §7's tail rule) flow to the caller via a subscribed listener the caller registered against the chat harness.
Subscribed-listener primitive. The chat harness exposes a subscription primitive whose
per-language shape is implementation-defined (Python: harness.subscribe(session_id, callback)
returning a subscription handle; TypeScript: harness.subscribe(session_id, callback) returning
an unsubscribe function; equivalent in other languages). The subscription's callback fires
exactly once per resumed turn with the post-resume ChatTurnOutcome.completed (or
ChatTurnOutcome.errored if the resumed invocation errors).
Listener-default rationale. Chat UX is fundamentally asynchronous post-suspend — the user
should NOT have to send another message to receive the resume reply. A synchronous "next send()
returns the resume reply" shape would force the user to manually trigger reply delivery, which
contradicts the conversational UX expectation. The subscribed listener is therefore the default
path.
Optional synchronous-next-send() alternative. Implementations MAY ALSO support a shape where
the NEXT send() call on the same session_id (after the resume signal arrives) returns the
post-resume ChatTurnOutcome ahead of processing the next user message. Useful for
synchronous-style integrations (CLI agents, batch processors) that don't have a long-lived
subscription path. The subscribed-listener path MUST be available; the synchronous alternative is
OPTIONAL.
Listener contract. The listener callback fires after the post-resume invoke() returns,
with the same ChatTurnOutcome shape send() would return for a normal completed or errored
turn. The listener does NOT fire on the original suspending turn (which send() already returned
for); it fires only on the post-resume completion.
9. Composition with streaming (forward-looking)¶
The chat sub-spec defines the API-surface contract for a streaming variant of send() without
prescribing the engine-side hook implementation:
The exact streaming return shape — whether the iterator yields the final ChatTurnOutcome as
its last value, returns a tuple (<async-iterator-of-chunks>, <awaitable-final-outcome>), or
uses a different shape — is deferred to the planned streaming proposal so the chat sub-spec
doesn't lock in a return shape before the streaming proposal can validate it against the
engine-side hook surface.
v1 ships non-streaming only. The non-streaming send() per §5 is the only callable v1
mandates. send_streaming() becomes mandatory when the streaming proposal lands and ratifies
the engine-side hook surface; until then, implementations MAY ship a non-conformant streaming
variant (or skip streaming entirely) without violating the chat-harness contract.
Forward-compatibility. This section defines only the method name and parameter semantics
today (send_streaming(session_id, message)); the return shape — including any discriminator
between chunks and the final outcome — is the streaming proposal's contract surface. The split
keeps the chat sub-spec from over-claiming a return type that hasn't been validated against the
engine-side hooks yet. The streaming proposal will fill in both the return shape and the engine
hooks (token_chunk / tool_call_progress / equivalent observer events) the streaming variant
consumes. Both compose when streaming lands; no v2 chat sub-spec revision is needed.
10. Error → user-facing reply mapping¶
Each of harness §7's three error buckets maps to a ChatTurnOutcome.errored shape with a
system-message-style reply:
10.1 Session-terminating errors (harness §7.1)¶
Categories: session_load_failed, session_save_failed, session_state_migration_chain_ambiguous
(sessions §10), suspension_persistence_failed (suspension §10), harness_session_id_unresolved
(harness §10).
Mapping: ChatTurnOutcome.errored with error_bucket = "session_terminating" and a reply
shaped:
"This conversation can't continue. Please start a new one."
The chat UI MAY refuse further send() calls on this session_id. Implementations MAY customize
the reply text (i18n, brand voice); the bucket semantics — conversation cannot continue, user
should start a new session — MUST be preserved.
10.2 Retryable transient errors (harness §7.2)¶
Categories: provider_unavailable, provider_timeout, provider_rate_limited (llm-provider §7),
network-layer transient failures the underlying transport surfaces, observer-side transients.
Mapping: ChatTurnOutcome.errored with error_bucket = "retryable_transient" and a reply
shaped:
"I had trouble responding. Try again in a moment."
The chat UI MAY auto-retry once per the application's retry policy. Implementations MAY customize the reply text; the bucket semantics — transient failure, retry is appropriate — MUST be preserved.
10.3 User-correctable errors (harness §7.3)¶
Categories: provider_invalid_request, provider_invalid_response,
chat_message_shape_invalid (this sub-spec; see below), other user-input-shape failures.
Mapping: ChatTurnOutcome.errored with error_bucket = "user_correctable" and a reply
incorporating the upstream error's diagnostic detail, shaped:
"That request couldn't be processed:
. Please adjust your message and try again."
The chat UI invites the user to retry with corrected input. Implementations MUST surface enough of the underlying error's diagnostic message that the user can act on it; over-redaction ("an error occurred") defeats the user-correctable bucket's purpose.
10.4 New error category: chat_message_shape_invalid¶
Introduced by this capability. Surfaces when the inbound ChatMessage at send() entry fails
the §3.1 validation (missing required fields, unsupported content-block types, role-content
shape mismatches, malformed tool_call_id references). Routes through harness §7.3 (user-
correctable) on surfacing.
No change to harness §10's abstract error set; the new category lives in the chat sub-spec layer and maps to a §7.3 bucket entry on the abstract surface.
11. Determinism¶
The chat sub-spec inherits harness §11's determinism rules unchanged. The send() callable adds
no nondeterminism beyond what the underlying graph + harness layers introduce. The
per-session_id serialization rule from §5.3 is a determinism-preserving constraint at the
chat-harness boundary; without it, concurrent send() calls under one session could produce
non-deterministic message-interleaving in history.
12. Cross-spec touchpoints¶
- harness (
spec/harness/spec.md) — abstract foundation; this sub-spec layers chat-shaped surfaces over harness's contract. Specifically: §3 (dispatch paths), §4 (turn lifecycle), §5 (outcome handling), §7 (error categorization), §8 (composition with sessions / suspension / checkpointing / observability). - sessions (
spec/sessions/spec.md) — sessioned-mode chat threadssession_idper sessions §3; conversation history persists in session state. - suspension (
spec/suspension/spec.md) — chat composes with suspension via §8 above (pending-message protocol + subscribed-listener resume). - prompt-management (
spec/prompt-management/spec.md) — chat-prompt templates render into messages chat harnesses consume; content-block model shared with llm-provider §3.1. - llm-provider (
spec/llm-provider/spec.md) —ChatMessagemirrors llm-provider §3 unchanged; chat-harness messages pass through tocomplete()calls without translation. - observability (
spec/observability/spec.md) — turn-level wrapper spans per harness §4.6; no chat-specific span attributes introduced. - graph-engine (
spec/graph-engine/spec.md) — chat graphs are graph-engine graphs; themessages: list[ChatMessage]field follows canonical state-reducer conventions. - future memory capability — cross-session memory (per-user profiles, episodic memory, vector retrieval) deferred to a future sub-spec; chat composes with it when its spec lands.
13. Out of scope¶
- Cross-session memory. Per-user profiles, episodic memory, semantic / vector-indexed retrieval. Per-session conversation history is in scope; anything spanning sessions is OUT. Deferred to a future memory capability that builds on this spec.
- Specific chat-platform adapters. Slack, Discord, Microsoft Teams, WhatsApp, customer-platform integrations. Each platform's message routing (mentions, threads, channels, attachments) is the platform-adapter package's concern; the chat sub-spec ratifies what the platform-adapters build on.
- Multimodal output beyond text/image. Audio, video, file outputs deferred to future
prompt-management or llm-provider proposals that extend the content-block set; this sub-spec
inherits additions automatically (the
ContentBlockreference is unscoped). - Voice input / output. A voice agent harness is a different harness type entirely (streaming-first, low-latency, with voice-activity-detection turn boundaries). Future sub-spec if real.
- Rich-media compose UI. What the chat UI looks like, how the user composes messages, how attachments are picked — application concern.
- Conversation summary / context-window management. Application strategies (summarization,
sliding-window, vector retrieval over old turns) for managing long conversations against LLM
context-window limits. The chat sub-spec exposes the raw
messagesfield; the management strategy lives in application code or a future memory-adjacent capability. - Per-turn user-facing error tone customization. The §10 mapping defines a three-bucket default; applications that want different tone (more / less verbose, branded language, localized) wrap the chat harness with their own error-mapping layer above the chat-harness surface.
- Streaming protocol details. The chat sub-spec defines the API-surface contract for the
streaming
send_streaming()variant; the actual streaming protocol (token format, chunk boundaries, backpressure, cancellation) lives in the planned streaming proposal. - Stateless completion harness. "Send one message, get one reply, no memory" is a different harness type; if anyone wants it, it gets its own sub-spec ("harness-completion" or similar).