Allow configuring how long the worker model stays resident on the Ollama
target after a request via FOREMAN_KEEP_ALIVE env var. Accepts Ollama
duration strings ("-1" forever, "0" unload, "15m", "1h", etc). Defaults
to "-1" (pin forever). The embedder warm-up is unaffected and always
uses keep_alive=-1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The test assumed webhook events arrive in wall-clock order (queued first,
done last), but dispatcher.Fire spawns a goroutine per event with no ordering
guarantee. On a single-core CI runner the "queued" goroutine was routinely
preempted before making its HTTP POST, letting "loading"/"working"/"done"
goroutines land first.
Fix: wait until a "done" event appears in the received set (proving all
prior transitions have been dispatched by the worker), then assert that
"queued" and "done" each appear exactly once rather than checking
positional order.
Reproduced with: GOMAXPROCS=1 go test -race -count=100 -run TestWebhook_LifecycleEvents ./internal/server/
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add the async job submission API, webhook state notifications, and
artifact serving endpoints on top of the Phase 3 queue infrastructure.
Key changes:
- POST /jobs: async job submission with 202 + job_id ULID; optional
state_webhook_url for push notifications on state transitions
- GET /jobs/{id}: job status polling with result, error, and artifact
metadata; artifacts <= 256KB inlined, larger ones by URL reference
- GET /jobs/{id}/artifacts/{name}: raw artifact data serving
- Webhook dispatcher: at-least-once delivery with exponential backoff
(5 retries); optional HMAC-SHA256 signing (X-Foreman-Signature)
- ADR-0014: state_webhook_url only honored on POST /jobs, not sync
/api/chat (caller already blocks for result)
- Comprehensive tests for /jobs lifecycle, webhook delivery, HMAC
verification, artifact inline/URL threshold, and TTL pruning
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>