0072: Prompt Management - Per-Fetch Cache-TTL Control
- Status: Accepted
- Author: Chris Colinsky
- Created: 2026-06-16
- Accepted: 2026-06-17
- Targets: spec/prompt-management/spec.md (add an optional cache_ttl_seconds parameter to the §5 PromptBackend.fetch contract and to the §6 PromptManager.fetch / get operations; amend the §5 backend-caching paragraph so the per-fetch TTL is a defined caller lever; add a clarifying note to §15); spec/conformance-adapter/spec.md (add a caching prompt-backend harness primitive to §6 so a fixture can assert the force-fresh behavior).
- Related: 0047 (the last prompt-management change: prefix-cache authoring guidance in §13 / §14. Note that this proposal concerns the backend template-fetch cache, which is a different cache from the user-level result cache that §15 defers), 0055 (conformance-adapter capability; the harness-primitive touchpoint).
- Supersedes:
Summary
A PromptBackend MAY maintain a client-side cache of fetched templates (§5 already says so: "Backends MAY cache their own results internally … cache invalidation is implementation-defined"). Today a caller has no way to influence that cache per fetch — neither PromptBackend.fetch (§5) nor PromptManager.fetch / get (§6) exposes any cache control — so a long-lived process is stuck with whatever default TTL the backend's cache uses. This proposal threads an optional cache_ttl_seconds through those operations: absent/None preserves today's behavior; 0 forces a fresh fetch (bypassing any cached entry); a positive value bounds how stale a served entry may be. Backends that keep no client-side cache treat it as a no-op. The change is additive and fully backward-compatible.
Motivation
A process that holds a PromptManager for its lifetime and fetches through a caching backend cannot observe a republished prompt until the backend's cache TTL lapses. The common need is on-demand refresh — an operational path that should pick up a newly published prompt version without restarting the process (for example, to recompute derived state keyed by the fetched version the moment a new version is published). With no per-fetch cache control, an in-process "refresh" re-reads the same cached version and is a no-op until the TTL expires.
§5 deliberately leaves cache invalidation implementation-defined, and §15 defers user-level result-cache invalidation. But the narrow lever this needs — "for this fetch, don't serve a stale cached template" — is a property of the fetch contract itself, not a separate caching subsystem. The SDK-backed and HTTP-backed backends that maintain such a cache already support a per-call TTL knob internally; the gap is purely that OA's fetch surface does not expose one, so the knob is unreachable from PromptManager.
Detailed design
Anticipated bump: MINOR (pre-1.0); concrete version assigned at acceptance. The change adds one optional parameter to the fetch surface (additive, backward-compatible) and a conformance harness primitive.
§5: PromptBackend.fetch gains cache_ttl_seconds
Extend the fetch contract with an optional parameter:
fetch(name, label="production", cache_ttl_seconds=None)
- name: string. Required. (unchanged)
- label: string. Default "production". (unchanged)
- cache_ttl_seconds: optional non-negative integer, default absent / None. For backends that maintain a client-side cache, it bounds the staleness of a cached template the backend MAY serve for this fetch:
  - absent / None: the backend's own caching behavior governs (current behavior; fully backward-compatible).
  - 0: the backend MUST NOT serve a cached entry; it fetches fresh from the source.
  - N > 0: the backend MUST NOT serve a cached entry older than N seconds; an entry no older than N seconds MAY be served, otherwise the backend fetches fresh.

Negative values are invalid. Implementations MUST reject a negative cache_ttl_seconds, signaling the error per the language's idiom for an invalid argument.
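The following minimal Python sketch shows a backend with a client-side cache that honors the three cases. It illustrates one approach under stated assumptions and is not a reference implementation. CachingBackend, its source callable, the injected clock, and the trimmed Prompt are all hypothetical. The backend's own policy for None is modeled as a fixed default TTL, which is only one possible implementation-defined behavior.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Prompt:
    # Hypothetical minimal Prompt: only the fields this sketch needs.
    name: str
    label: str
    template: str
    fetched_at: float  # clock reading at the source read, never at a cache hit


class CachingBackend:
    """Illustrative backend with a client-side cache keyed by (name, label)."""

    def __init__(
        self,
        source: Callable[[str, str], str],
        default_ttl: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._source = source            # hypothetical reader for the source of truth
        self._default_ttl = default_ttl  # the backend's own TTL, used when the caller passes None
        self._clock = clock              # injectable so the max-age rule can be tested
        self._cache: Dict[Tuple[str, str], Prompt] = {}

    def fetch(
        self,
        name: str,
        label: str = "production",
        cache_ttl_seconds: Optional[int] = None,
    ) -> Prompt:
        if cache_ttl_seconds is not None and cache_ttl_seconds < 0:
            raise ValueError("cache_ttl_seconds must be non-negative")
        cached = self._cache.get((name, label))
        # 0 never serves a cached entry, even one stored at this same instant.
        if cached is not None and cache_ttl_seconds != 0:
            max_age = self._default_ttl if cache_ttl_seconds is None else cache_ttl_seconds
            if self._clock() - cached.fetched_at <= max_age:
                return cached  # hit: the original fetched_at is preserved
        prompt = Prompt(name, label, self._source(name, label), self._clock())
        # Writing back is this sketch's choice; §5 leaves it implementation-defined.
        self._cache[(name, label)] = prompt
        return prompt
```

The explicit cache_ttl_seconds != 0 check is deliberate. With a coarse or frozen clock, an entry stored in the same tick has age 0, so it would otherwise pass the age comparison and be served.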
cache_ttl_seconds governs only which cached entry MAY be served for this fetch. Whether the fetched result is then written to the backend's cache, and for how long, remains part of the backend's implementation-defined cache management (below). A 0 fetch therefore guarantees a fresh read for that call only. It does not guarantee that subsequent default-TTL fetches observe the new version.

The condition is behavioral, not a backend type. A backend that reads its source on every fetch maintains no client-side cache (typical of filesystem and in-memory backends), so it treats cache_ttl_seconds as a no-op: it already returns a fresh read on each call. A backend that does maintain a client-side cache honors the parameter regardless of its storage. The parameter is part of the contract so that all backends share one signature; cacheless backends accept it and ignore any valid value.
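For contrast, a cacheless backend can satisfy the shared signature trivially. This hypothetical in-memory sketch validates a cache_ttl_seconds value and then ignores it. A republished template is visible on the very next default fetch. InMemoryBackend, source_reads, and the plain-string return value are illustrative simplifications, not spec names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class InMemoryBackend:
    """Hypothetical cacheless backend: every fetch reads the current source."""

    templates: Dict[Tuple[str, str], str] = field(default_factory=dict)
    source_reads: int = 0

    def fetch(
        self,
        name: str,
        label: str = "production",
        cache_ttl_seconds: Optional[int] = None,
    ) -> str:
        # Shared signature: invalid values are still rejected ...
        if cache_ttl_seconds is not None and cache_ttl_seconds < 0:
            raise ValueError("cache_ttl_seconds must be non-negative")
        # ... but a valid value changes nothing, because there is no cache to bypass.
        self.source_reads += 1
        return self.templates[(name, label)]


backend = InMemoryBackend({("greet", "production"): "Hello v1"})
backend.fetch("greet", cache_ttl_seconds=300)
backend.templates[("greet", "production")] = "Hello v2"  # republish
latest = backend.fetch("greet")  # default cache control still sees the new version
```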
Amend the existing backend-caching paragraph (§5) so the TTL is a defined caller lever rather than wholly implementation-defined:
Backends MAY cache their own results internally (e.g., a managed-registry backend caching by (name, label) for some TTL). When a caller supplies cache_ttl_seconds, it bounds the staleness of any cached entry the backend MAY serve for that fetch (above). Other aspects of cache management remain implementation-defined: whether a fetched result is written to the cache, eviction, sizing, and cross-process invalidation. When a backend serves a cached result, the returned Prompt's template_hash MUST still be correct for the served template, and fetched_at MUST reflect the original fetch time, not the cache-hit time. This is unchanged: caching MUST NOT break content-addressing.
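A caching backend can keep the content-addressing guarantee by building the Prompt once, at source-read time, and storing that object as the cache entry. A hit then returns the original template_hash and fetched_at unchanged. The sketch below assumes a SHA-256 hex digest for template_hash purely for illustration, since the spec's actual hashing rule is not restated here. RegistryCache is a hypothetical name, and staleness checks are omitted to isolate the invariant.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Prompt:
    # Hypothetical subset of the spec's Prompt fields.
    template: str
    template_hash: str   # assumption: SHA-256 hex digest of the UTF-8 template
    fetched_at: datetime


class RegistryCache:
    """Illustrative (name, label) cache that keeps cached Prompts content-addressed."""

    def __init__(self, source: Callable[[str, str], str]) -> None:
        self._source = source
        self._entries: Dict[Tuple[str, str], Prompt] = {}

    def fetch(self, name: str, label: str) -> Prompt:
        key = (name, label)
        if key in self._entries:
            # Return the stored Prompt unchanged. Rebuilding it with a fresh
            # datetime.now() would report the cache-hit time instead of the fetch time.
            return self._entries[key]
        template = self._source(name, label)
        prompt = Prompt(
            template=template,
            template_hash=hashlib.sha256(template.encode("utf-8")).hexdigest(),
            fetched_at=datetime.now(timezone.utc),  # stamped once, at the source read
        )
        self._entries[key] = prompt
        return prompt
```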
§6: PromptManager.fetch / get thread it through
fetch(name, label=None, cache_ttl_seconds=None)
Async. (Label resolution unchanged.) When cache_ttl_seconds is supplied, the manager passes it verbatim to every backend fetch(name, label, cache_ttl_seconds) call it makes while walking the §9 fallback chain. A 0 (force-fresh) therefore applies to whichever backend ultimately serves the prompt. Default absent / None (current behavior).
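Here is a minimal async sketch of the manager-side threading. It assumes the §9 fallback chain can be modeled as an ordered list of backends, and that a hypothetical PromptNotFound exception moves the walk to the next backend. The async backend fetch, the default_label stand-in for label resolution, and the plain-string return type are also assumptions of this sketch.

```python
from typing import List, Optional, Protocol


class PromptNotFound(Exception):
    """Hypothetical signal that sends the walk on to the next backend."""


class Backend(Protocol):
    async def fetch(
        self, name: str, label: str, cache_ttl_seconds: Optional[int] = None
    ) -> str: ...


class PromptManager:
    """Sketch of the §6 fetch threading cache_ttl_seconds through a fallback chain."""

    def __init__(self, chain: List[Backend], default_label: str = "production") -> None:
        self._chain = chain                  # assumption: the §9 chain as an ordered list
        self._default_label = default_label  # assumption: stand-in for label resolution

    async def fetch(
        self,
        name: str,
        label: Optional[str] = None,
        cache_ttl_seconds: Optional[int] = None,
    ) -> str:
        resolved = label if label is not None else self._default_label
        for backend in self._chain:
            try:
                # Passed verbatim to every backend tried, so a 0 forces a fresh
                # read on whichever backend ends up serving the prompt.
                return await backend.fetch(name, resolved, cache_ttl_seconds)
            except PromptNotFound:
                continue
        raise PromptNotFound(f"{name}@{resolved}")
```

The value is forwarded to every backend tried, not just the first. As a result, a force-fresh request cannot be silently satisfied from a stale cache further down the chain.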
get(name, label=None, variables=None, cache_ttl_seconds=None)
Async. Convenience equivalent to render(await fetch(name, label, cache_ttl_seconds), variables). The parameter governs the fetch leg only.
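The get composition can be sketched the same way. MiniManager is hypothetical, and its render uses $-style placeholders via Python's string.Template for illustration only. The point is that cache_ttl_seconds reaches the fetch leg and never the render.

```python
from string import Template
from typing import Dict, Optional


class MiniManager:
    """Hypothetical stand-in showing how get() composes fetch() and render()."""

    def __init__(self, templates: Dict[str, str]) -> None:
        self._templates = templates
        self.last_ttl: Optional[int] = None  # what the fetch leg last received

    async def fetch(self, name: str, label: Optional[str] = None,
                    cache_ttl_seconds: Optional[int] = None) -> str:
        # A real manager would walk its backend chain here.
        self.last_ttl = cache_ttl_seconds
        return self._templates[name]

    def render(self, template: str, variables: Optional[Dict[str, str]] = None) -> str:
        # Local and I/O-free, so there is no cache to control.
        return Template(template).substitute(variables or {})

    async def get(self, name: str, label: Optional[str] = None,
                  variables: Optional[Dict[str, str]] = None,
                  cache_ttl_seconds: Optional[int] = None) -> str:
        # The cache parameter is forwarded to fetch only.
        return self.render(await self.fetch(name, label, cache_ttl_seconds), variables)
```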
render (§6) is unchanged: it is a local transformation over an already-fetched Prompt and
performs no I/O, so a cache-control parameter has no meaning there.
§15: clarifying note
Add a sentence to the §15 Cache invalidation policies bullet distinguishing the two caches: the
per-fetch backend-template cache control is now provided by §5 / §6 cache_ttl_seconds; what
remains out of scope is user-level result-cache invalidation (the caller's own cache keyed by
template_hash / rendered_hash) and any cross-process or eviction-policy machinery.
Determinism
cache_ttl_seconds affects only which version of an external template a fetch returns (an I/O
concern over backend state); it does not affect the deterministic render of a given fetched
template. §13 is unaffected.
Conformance test impact
A new fixture under spec/prompt-management/conformance/ exercises the force-fresh path via the
backend's source-read count, which needs no source mutation: two sequential fetches of the same
(name, label) with default cache control perform one source read (the second is served from
cache); two fetches with cache_ttl_seconds=0 perform two source reads (each bypasses the
cache). The assertion is on the read count, not on timing or template contents.
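The read-count assertion can be sketched as plain Python assertions. This proposal does not show the fixture format itself, so CountingCachingBackend and fixture_force_fresh are hypothetical names. The backend models only the None and 0 cases and explicitly rejects any other value rather than guessing at max-age behavior.

```python
from typing import Dict, Optional, Tuple


class CountingCachingBackend:
    """Hypothetical harness primitive: caches by (name, label) and counts source reads."""

    def __init__(self, templates: Dict[Tuple[str, str], str]) -> None:
        self._templates = templates
        self._cache: Dict[Tuple[str, str], str] = {}
        self.source_reads = 0

    def fetch(self, name: str, label: str = "production",
              cache_ttl_seconds: Optional[int] = None) -> str:
        if cache_ttl_seconds is not None and cache_ttl_seconds < 0:
            raise ValueError("cache_ttl_seconds must be non-negative")
        if cache_ttl_seconds is not None and cache_ttl_seconds != 0:
            # The N > 0 max-age path needs a controllable clock; not modeled here.
            raise NotImplementedError("only None and 0 are modeled in this sketch")
        key = (name, label)
        if cache_ttl_seconds is None and key in self._cache:
            return self._cache[key]  # default cache control: serve the cached entry
        self.source_reads += 1       # 0, or a cold cache: read the source
        self._cache[key] = self._templates[key]
        return self._cache[key]


def fixture_force_fresh() -> None:
    templates = {("greet", "production"): "Hello"}

    default = CountingCachingBackend(templates)
    default.fetch("greet")
    default.fetch("greet")
    assert default.source_reads == 1  # second fetch served from cache

    fresh = CountingCachingBackend(templates)
    fresh.fetch("greet", cache_ttl_seconds=0)
    fresh.fetch("greet", cache_ttl_seconds=0)
    assert fresh.source_reads == 2    # each fetch bypassed the cache


fixture_force_fresh()
```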
This requires a caching prompt-backend harness primitive (conformance-adapter §6): an in-memory
backend that caches by (name, label), counts source reads, and honors cache_ttl_seconds
(0 bypasses; None serves cached). The N > 0 max-age path additionally needs a controllable
clock and is exercised by a clock-controlled fixture variant (a served entry reused within N,
re-read past it); the 0-vs-default read-count path is the core asserted behavior. Non-caching
backends' no-op handling is covered by the existing filesystem / in-memory fetch fixtures
continuing to pass with the parameter supplied.
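Below is a sketch of the clock-controlled variant for N > 0, in which every name is hypothetical. FakeClock advances only when the fixture tells it to, and ClockedCachingBackend records the store time of each entry, so the max-age rule can be checked without sleeping. As in the core fixture, the assertion is on source reads.

```python
from typing import Callable, Dict, Optional, Tuple


class FakeClock:
    """Controllable clock: time moves only when the fixture advances it."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class ClockedCachingBackend:
    """Hypothetical harness primitive with an injectable clock for the N > 0 path."""

    def __init__(self, templates: Dict[Tuple[str, str], str],
                 clock: Callable[[], float]) -> None:
        self._templates = templates
        self._clock = clock
        self._cache: Dict[Tuple[str, str], Tuple[str, float]] = {}  # key -> (template, stored_at)
        self.source_reads = 0

    def fetch(self, name: str, label: str = "production",
              cache_ttl_seconds: Optional[int] = None) -> str:
        if cache_ttl_seconds is not None and cache_ttl_seconds < 0:
            raise ValueError("cache_ttl_seconds must be non-negative")
        key = (name, label)
        entry = self._cache.get(key)
        if entry is not None and cache_ttl_seconds != 0:
            template, stored_at = entry
            # None serves the cached entry (as in the harness); N bounds its age.
            if cache_ttl_seconds is None or self._clock() - stored_at <= cache_ttl_seconds:
                return template
        self.source_reads += 1
        template = self._templates[key]
        self._cache[key] = (template, self._clock())
        return template


clock = FakeClock()
backend = ClockedCachingBackend({("greet", "production"): "Hello"}, clock)
backend.fetch("greet", cache_ttl_seconds=30)  # cold cache: source read
clock.advance(10)
backend.fetch("greet", cache_ttl_seconds=30)  # entry is 10 s old: reused
clock.advance(25)
backend.fetch("greet", cache_ttl_seconds=30)  # entry is 35 s old: re-read
```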
Versioning
MINOR bump (pre-1.0): §5 / §6 gain an optional parameter; behavior is additive and the default preserves current semantics exactly. No existing fetch / render / fallback behavior changes. The concrete version is the maintainer's call at acceptance.
Out of scope
- Cache eviction, sizing, and cross-process invalidation: cache_ttl_seconds is a per-fetch staleness bound, not a cache-management API. How a backend evicts, sizes, or invalidates across processes remains implementation-defined (§5).
- User-level result-cache invalidation: the caller's own cache of rendered results keyed by template_hash / rendered_hash stays out of scope per §15. This proposal concerns only the backend's template-fetch cache.
- A manager-level or construction-time default TTL: the need is per-fetch granularity, so that a specific refresh call forces a fresh read while ordinary calls use the default. A global default is a possible future convenience but is not required here.
- A cache-clear / purge method on the manager or backend: bypassing on the fetch path (cache_ttl_seconds=0) covers the on-demand-refresh need. It does so without a separate mutating operation, which would not compose with the §9 fallback fetch path.
Alternatives considered
- A force_refresh: bool flag instead of cache_ttl_seconds. Rejected: strictly less expressive. A flag can bypass the cache but cannot shorten staleness (the "I want ≤ N-second staleness without re-fetching on every call" case). cache_ttl_seconds covers both cases with one parameter (0 = bypass, N = bounded staleness) and reads naturally for backends whose client-side cache is already TTL-based.
- A cache-clear method (e.g., manager.invalidate(name)). Rejected: it is a mutating side operation that does not compose with the fallback fetch path, and it forces callers to sequence an explicit invalidate before each refreshing fetch. The per-fetch bound is simpler and atomic.
- Construction-time TTL only (configure the backend's TTL at build time). Rejected: it cannot express "this one refresh call must be fresh" without rebuilding the backend; the need is per-fetch.
- Leave it implementation-defined (status quo: backends cache, and callers cannot influence it). Rejected: it leaves on-demand refresh impossible through the PromptManager surface, which is the whole point of the request.