# ADR-0004: Async job surface, job IDs, and queued execution
**Status:** Accepted — 2026-05-23

## Context

The transparent passthrough (ADR-0003) is synchronous: the caller holds an HTTP
connection until the completion returns. That is fine for interactive-length work
and for go-llm, but two needs aren't served by it:
- Long-running jobs held open through Traefik risk idle-connection timeouts.
- Orchestration callers (mort/ratchet/werk-style) want fire-and-forget: submit,
  get an ID back immediately, and be told asynchronously when the work is done.

## Decision

Add a distinct async surface: `POST /jobs`.
- The body carries a chat payload (native-Ollama-shaped, mirroring `/api/chat`)
  plus optional extension fields, notably `state_webhook_url` (ADR-0005).
- foreman enqueues the job, assigns it a **ULID** (sortable, timestamped), and
  immediately returns `202 Accepted` with `{ "job_id": "<ulid>" }`.
- The caller correlates later webhook callbacks to its request via `job_id`.
- `GET /jobs/{id}` returns current state, result, and artifact references for
  polling-style callers or for recovery after a missed webhook.

Every unit of work is a row in the queue (ADR-0008) regardless of which surface
created it; the synchronous passthrough is simply a `/jobs` submission whose
handler blocks on the job's completion instead of returning the ID.
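The two entry points described above can be sketched in Go as follows. This is a minimal illustration, not foreman's actual code: the in-memory `queue`, the `job-%06d` placeholder IDs (the ADR assigns a ULID), the handler names, and the fake execution engine are all assumptions made for the example. It shows the async handler answering `202 Accepted` with a `job_id`, and the sync handler submitting to the same queue but blocking until the job completes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// job is one unit of work in a hypothetical in-memory queue.
type job struct {
	id     string
	result string
	done   chan struct{} // closed when the job reaches a terminal state
}

// queue is the single execution path shared by both surfaces.
type queue struct {
	mu   sync.Mutex
	seq  int
	work chan *job
}

func newQueue() *queue {
	q := &queue{work: make(chan *job, 16)}
	go func() { // stand-in for the execution engine
		for j := range q.work {
			j.result = "completed " + j.id
			close(j.done)
		}
	}()
	return q
}

// submit validates the JSON body, enqueues a job, and returns it immediately.
func (q *queue) submit(r *http.Request) (*job, error) {
	var payload json.RawMessage
	if err := json.NewDecoder(r.Body).Decode(&payload); err != nil {
		return nil, err
	}
	q.mu.Lock()
	q.seq++
	// Placeholder ID for the sketch; the ADR assigns a ULID here.
	j := &job{id: fmt.Sprintf("job-%06d", q.seq), done: make(chan struct{})}
	q.mu.Unlock()
	q.work <- j
	return j, nil
}

// handleJobs is the async surface: enqueue, then answer 202 with the ID.
func (q *queue) handleJobs(w http.ResponseWriter, r *http.Request) {
	j, err := q.submit(r)
	if err != nil {
		http.Error(w, "invalid JSON body", http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusAccepted)
	json.NewEncoder(w).Encode(map[string]string{"job_id": j.id})
}

// handleSync is the passthrough: the same submission, but the handler blocks
// on completion (or caller disconnect) instead of returning the ID.
func (q *queue) handleSync(w http.ResponseWriter, r *http.Request) {
	j, err := q.submit(r)
	if err != nil {
		http.Error(w, "invalid JSON body", http.StatusBadRequest)
		return
	}
	select {
	case <-j.done:
		fmt.Fprint(w, j.result)
	case <-r.Context().Done():
		// Caller went away; the job itself stays in the queue.
	}
}

// post drives a handler in-process, without opening a socket.
func post(h http.HandlerFunc, path, body string) *httptest.ResponseRecorder {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodPost, path, strings.NewReader(body)))
	return rec
}

func main() {
	q := newQueue()
	body := `{"model":"example","messages":[]}`
	async := post(q.handleJobs, "/jobs", body)
	fmt.Println(async.Code, strings.TrimSpace(async.Body.String()))
	blocking := post(q.handleSync, "/api/chat", body)
	fmt.Println(blocking.Code, blocking.Body.String())
}
```

Both handlers call the same `submit`, so the only difference between the surfaces is whether the handler waits on `done`. A real implementation would presumably persist the row (ADR-0008) before answering.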

### Job lifecycle

`queued → loading → working → done`, plus terminal `failed`. A job whose target
is unreachable re-enters `queued` with a backoff: a connection error is always
retryable and never auto-fails the job, because the target is a laptop
(ADR-0002). A bounded retry count guards against poison jobs; exceeding it moves
the job to `failed` with the last error recorded.
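The lifecycle above can be sketched as a small state machine. The retry limit, the base backoff, the exponential schedule, and the `errUnreachable` sentinel are illustrative assumptions; the ADR only states that retries are bounded and that a backoff applies. The ADR also does not say how non-connection errors are handled, so this sketch fails them immediately.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type State string

const (
	Queued  State = "queued"
	Loading State = "loading"
	Working State = "working"
	Done    State = "done"
	Failed  State = "failed"
)

// Illustrative values; the ADR does not fix the limit or the schedule.
const (
	maxRetries  = 5
	baseBackoff = 2 * time.Second
)

// errUnreachable is a hypothetical sentinel for "could not connect to target".
var errUnreachable = errors.New("target unreachable")

type Job struct {
	State     State
	Retries   int
	NotBefore time.Time // earliest time the job may be picked up again
	LastError string
}

// next lists the forward transitions on the happy path.
var next = map[State]State{Queued: Loading, Loading: Working, Working: Done}

// Advance moves the job one step forward; terminal states are left untouched.
func (j *Job) Advance() {
	if n, ok := next[j.State]; ok {
		j.State = n
	}
}

// Fail records the error. An unreachable target requeues the job with
// exponential backoff until the retry budget is spent; any other error
// fails the job immediately (an assumption of this sketch).
func (j *Job) Fail(err error, now time.Time) {
	j.LastError = err.Error()
	if errors.Is(err, errUnreachable) && j.Retries < maxRetries {
		j.NotBefore = now.Add(baseBackoff << j.Retries)
		j.Retries++
		j.State = Queued
		return
	}
	j.State = Failed
}

func main() {
	j := &Job{State: Queued}
	now := time.Now()
	for j.State != Failed {
		j.Advance() // queued -> loading
		j.Fail(errUnreachable, now)
		fmt.Printf("state=%s retries=%d wait=%s\n", j.State, j.Retries, j.NotBefore.Sub(now))
	}
	fmt.Println("last error:", j.LastError)
}
```

Keeping `NotBefore` on the job lets a worker skip rows that are still backing off without a separate delay queue, which is one possible design among several.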

## Consequences

- One queue, one execution engine, two entry points (sync passthrough, async
  `/jobs`).
- Job IDs are stable and sortable, and callers use them to correlate webhook
  callbacks with their requests.
- `GET /jobs/{id}` gives at-least-once webhook delivery a recovery path.

## Alternatives considered

- **Reuse the OpenAI response `id` field instead of a separate `/jobs` surface.**
  Workable for sync, but doesn't give async callers an immediate handle before
  completion. The explicit `/jobs` surface is clearer.
- **UUIDv4 for IDs.** Rejected in favor of ULID for natural time-ordering in the
  queue and logs.
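To make the time-ordering argument concrete, here is a hand-rolled sketch of the ULID layout: a 48-bit millisecond timestamp followed by 80 random bits, encoded as 26 Crockford base32 characters. The function names are hypothetical, and a real service would more likely use an existing ULID library. The sketch does not implement the spec's optional monotonic ordering within a single millisecond.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"time"
)

// crockford is the Crockford base32 alphabet (no I, L, O, or U). Its
// characters are in ascending ASCII order, so string order matches numeric order.
const crockford = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// newULID builds a ULID for the given Unix time in milliseconds.
func newULID(ms uint64) (string, error) {
	var raw [16]byte
	if _, err := rand.Read(raw[:]); err != nil {
		return "", err
	}
	// Overwrite bytes 0..5 with the big-endian 48-bit timestamp;
	// bytes 6..15 keep their 80 random bits.
	for i := 5; i >= 0; i-- {
		raw[i] = byte(ms)
		ms >>= 8
	}
	return encode(raw), nil
}

// encode renders 128 bits as 26 base32 characters, least significant first.
// 26 * 5 = 130 bits, so the leading character only ever carries 3 bits.
func encode(raw [16]byte) string {
	hi := binary.BigEndian.Uint64(raw[:8])
	lo := binary.BigEndian.Uint64(raw[8:])
	var out [26]byte
	for i := 25; i >= 0; i-- {
		out[i] = crockford[lo&31]
		lo = lo>>5 | hi<<59 // shift the 128-bit value right by 5
		hi >>= 5
	}
	return string(out[:])
}

func main() {
	now := uint64(time.Now().UnixMilli())
	first, err := newULID(now)
	if err != nil {
		panic(err)
	}
	second, err := newULID(now + 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(first)
	fmt.Println(second)
	fmt.Println("sorted by creation time:", first < second)
}
```

Because the timestamp occupies the leading characters, plain string comparison orders IDs by creation time at millisecond granularity, which a random UUIDv4 cannot offer.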