daf07fd759
Add the async job submission API, webhook state notifications, and
artifact serving endpoints on top of the Phase 3 queue infrastructure.
Key changes:
- POST /jobs: async job submission with 202 + job_id ULID; optional
state_webhook_url for push notifications on state transitions
- GET /jobs/{id}: job status polling with result, error, and artifact
metadata; artifacts <= 256KB inlined, larger ones by URL reference
- GET /jobs/{id}/artifacts/{name}: raw artifact data serving
- Webhook dispatcher: at-least-once delivery with exponential backoff
(5 retries); optional HMAC-SHA256 signing (X-Foreman-Signature)
- ADR-0014: state_webhook_url only honored on POST /jobs, not sync
/api/chat (caller already blocks for result)
- Comprehensive tests for /jobs lifecycle, webhook delivery, HMAC
verification, artifact inline/URL threshold, and TTL pruning
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# foreman — Architecture Decision Records

`foreman` is a small daemon that fronts **one** Ollama target. It turns a single
Ollama instance into a queued, observable job endpoint: it polls the target's
installed models, serializes jobs through the target (managing model swaps),
assigns every job an ID, and reports progress + artifacts via webhooks. It also
ships a Go client so the target is trivial to use from `go-llm`.

It is the deliberately pared-down successor to `peon-overseer`. One daemon, one
worker, one queue. No distributed dispatch, no leases, no fair queueing.

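
The webhook reporting mentioned in the overview can optionally be signed with HMAC-SHA256 and carried in an `X-Foreman-Signature` header. A receiver might check such a signature roughly as follows. This is a minimal sketch, not foreman's actual code: the hex encoding of the header value and the function names `Sign` and `Verify` are assumptions made for illustration.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
)

// Sign returns the hex-encoded HMAC-SHA256 of body under secret.
// Assumption: the header value is plain hex. The source names only the
// header (X-Foreman-Signature) and the algorithm, not the encoding.
func Sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// Verify reports whether header is a valid signature for body.
// hmac.Equal compares in constant time, which avoids leaking how many
// leading bytes of a forged signature were correct.
func Verify(secret, body []byte, header string) bool {
	got, err := hex.DecodeString(header)
	if err != nil {
		return false // not valid hex, so it cannot match
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}
```

A receiver would call `Verify` on the raw request body before decoding it, since re-serializing the JSON can change the bytes and invalidate the signature.
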
## Index

| ADR | Title | Status |
|-----|-------|--------|
| 0001 | One daemon per Ollama target | Accepted |
| 0002 | Daemon placement and remote target configuration | Accepted |
| 0003 | API surface: native Ollama passthrough vs OpenAI-compat | Accepted |
| 0004 | Async job surface, job IDs, and queued execution | Accepted |
| 0005 | Webhook state-update protocol | Accepted |
| 0006 | Artifact handling and transport | Accepted |
| 0007 | Model inventory polling and discovery | Accepted |
| 0008 | Durable SQLite-backed queue | Accepted |
| 0009 | Single-worker serialization and drain-by-model scheduling | Accepted |
| 0010 | Authentication and security boundary | Accepted |
| 0011 | Go client library and go-llm integration | Accepted |
| 0012 | Streaming support | Accepted |
| 0013 | Two-slot residency and embedding bypass | Accepted |
| 0014 | No webhooks on synchronous /api/chat | Accepted |

ADR-0003 was resolved in favor of **native Ollama** as the v1 surface: foreman is,
on the wire, a private authenticated Ollama deployment, so `go-llm` integrates via
a thin `llm.Foreman(baseURL, token)` constructor that delegates to the existing
ollama provider (ADR-0011). OpenAI-compat `/v1` is deferred.

These ADRs refine the API/integration sections of the project `CLAUDE.md`. The
queue, single-worker, drain-by-model, and security guardrails carry forward
unchanged.

## Format

Each ADR: Status, Context, Decision, Consequences, and Alternatives where useful.
One decision per file. Append new ADRs; supersede rather than rewrite.