Allow configuring how long the worker model stays resident on the Ollama
target after a request via the FOREMAN_KEEP_ALIVE environment variable.
Accepts Ollama duration strings ("-1" forever, "0" unload, "15m", "1h",
etc.). Defaults to "-1" (pin forever). The embedder warm-up is unaffected
and always uses keep_alive=-1.
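The default handling described above could look roughly like the sketch below. The helper name keepAliveFrom is hypothetical and not taken from the foreman source. Validation of the duration string, and how the value is encoded into the Ollama request, are deliberately left out.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaultKeepAlive pins the worker model forever, matching the documented default.
const defaultKeepAlive = "-1"

// keepAliveFrom (hypothetical helper) returns the trimmed raw value,
// or the default when the value is empty or whitespace only.
func keepAliveFrom(raw string) string {
	if v := strings.TrimSpace(raw); v != "" {
		return v
	}
	return defaultKeepAlive
}

func main() {
	// An unset FOREMAN_KEEP_ALIVE yields "" and therefore the default.
	fmt.Println("worker keep_alive:", keepAliveFrom(os.Getenv("FOREMAN_KEEP_ALIVE")))
}
```

Taking the raw string as a parameter, rather than reading the environment inside the helper, keeps the defaulting logic testable without touching process state.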
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The test assumed webhook events arrive in wall-clock order (queued first,
done last), but dispatcher.Fire spawns a goroutine per event with no ordering
guarantee. On a single-core CI runner the "queued" goroutine was routinely
preempted before making its HTTP POST, letting "loading"/"working"/"done"
goroutines land first.
Fix: wait until a "done" event appears in the received set (proving all
prior transitions have been dispatched by the worker), then assert that
"queued" and "done" each appear exactly once rather than checking
positional order.
Reproduced with: GOMAXPROCS=1 go test -race -count=100 -run TestWebhook_LifecycleEvents ./internal/server/
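The order-independent assertion described above can be sketched as follows. The recorder type and its methods are hypothetical stand-ins for whatever the real test uses to capture webhook POSTs. The sketch only shows the pattern: wait for a state to appear, then count occurrences instead of checking positions.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// recorder (hypothetical) collects webhook states in arrival order,
// which may differ from the order in which they were dispatched.
type recorder struct {
	mu     sync.Mutex
	states []string
}

func (r *recorder) add(state string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.states = append(r.states, state)
}

// count reports how many times state has been received so far.
func (r *recorder) count(state string) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	n := 0
	for _, s := range r.states {
		if s == state {
			n++
		}
	}
	return n
}

// waitFor polls until state has been seen at least once or the timeout expires.
func (r *recorder) waitFor(state string, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if r.count(state) > 0 {
			return true
		}
		time.Sleep(5 * time.Millisecond)
	}
	return r.count(state) > 0
}

func main() {
	r := &recorder{}
	// Simulate out-of-order arrival: "queued" lands last.
	for _, s := range []string{"loading", "working", "done", "queued"} {
		r.add(s)
	}
	fmt.Println(r.waitFor("done", time.Second), r.count("queued"), r.count("done"))
}
```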
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add the async job submission API, webhook state notifications, and
artifact serving endpoints on top of the Phase 3 queue infrastructure.
Key changes:
- POST /jobs: async job submission with 202 + job_id ULID; optional
  state_webhook_url for push notifications on state transitions
- GET /jobs/{id}: job status polling with result, error, and artifact
  metadata; artifacts <= 256KB inlined, larger ones by URL reference
- GET /jobs/{id}/artifacts/{name}: raw artifact data serving
- Webhook dispatcher: at-least-once delivery with exponential backoff
  (5 retries); optional HMAC-SHA256 signing (X-Foreman-Signature)
- ADR-0014: state_webhook_url only honored on POST /jobs, not sync
  /api/chat (caller already blocks for result)
- Comprehensive tests for /jobs lifecycle, webhook delivery, HMAC
  verification, artifact inline/URL threshold, and TTL pruning
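For the optional HMAC-SHA256 signing mentioned above, a minimal sketch of signing and verifying a webhook body is shown here. The commit does not say how the X-Foreman-Signature value is encoded, so the hex encoding and the function names sign and verify are assumptions made for illustration only.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns the hex-encoded HMAC-SHA256 of body under secret.
// Hex encoding is an assumption; the real header format is not documented here.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the MAC and compares it in constant time.
func verify(secret, body []byte, signature string) bool {
	got, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret")
	body := []byte(`{"job_id":"example","state":"done"}`)
	sig := sign(secret, body)
	fmt.Println("signature verifies:", verify(secret, body, sig))
}
```

A receiver would compute the MAC over the raw request body exactly as received, before any JSON decoding, since re-serialized JSON may differ byte for byte.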
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the Phase 2 in-flight chat gate (buffered channel) with a real
SQLite-backed job queue and single worker loop. Every /api/chat request
now creates a job row, blocks until the worker completes it, and returns
the result transparently.
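The enqueue-and-wait flow described above can be illustrated with a small completion notifier. This is only a sketch: the notifier type, its methods, and the string result are hypothetical, and the real Notifier in internal/worker may be shaped quite differently.

```go
package main

import (
	"fmt"
	"sync"
)

// notifier (hypothetical) lets a sync handler block until the worker
// reports that a specific job has finished.
type notifier struct {
	mu      sync.Mutex
	waiters map[string]chan string
}

func newNotifier() *notifier {
	return &notifier{waiters: make(map[string]chan string)}
}

// register returns a channel that receives the job's result exactly once.
// The channel is buffered so the worker never blocks on a slow handler.
func (n *notifier) register(jobID string) <-chan string {
	ch := make(chan string, 1)
	n.mu.Lock()
	n.waiters[jobID] = ch
	n.mu.Unlock()
	return ch
}

// complete delivers the result to the waiter, if any, and forgets the job.
func (n *notifier) complete(jobID, result string) {
	n.mu.Lock()
	ch, ok := n.waiters[jobID]
	delete(n.waiters, jobID)
	n.mu.Unlock()
	if ok {
		ch <- result
	}
}

func main() {
	n := newNotifier()
	done := n.register("job-1")     // handler side: enqueue, then wait
	go n.complete("job-1", "hello") // worker side: finish the job
	fmt.Println(<-done)
}
```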
Key changes:
- internal/store: NextJob (drain-by-model ordering), IncrementAttempt,
  ResetInterruptedJobs, DeleteTerminalJobsBefore; busy_timeout pragma
- internal/worker: single-threaded worker loop with Notifier for sync
  handler completion signaling; retry on ConnectionError, terminal fail
  on HTTPError; crash recovery resets interrupted jobs on startup
- internal/webhook: dispatcher infrastructure for async webhook delivery
- internal/server: chat handler rewritten to enqueue+wait; old chatGate
  removed; embeddings remain direct concurrent proxies (ADR-0013)
- internal/config: FOREMAN_MAX_ATTEMPTS, FOREMAN_JOB_TTL
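The retry policy in the worker bullet above (retry on ConnectionError, terminal fail on HTTPError) might be expressed along these lines. The error type definitions and shouldRetry are hypothetical reconstructions. In particular, treating FOREMAN_MAX_ATTEMPTS as an inclusive cap on the attempt counter is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// ConnectionError and HTTPError are illustrative stand-ins for the custom
// error types in internal/ollama; their real fields are not shown in the log.
type ConnectionError struct{ Err error }

func (e *ConnectionError) Error() string { return "connection error: " + e.Err.Error() }
func (e *ConnectionError) Unwrap() error { return e.Err }

type HTTPError struct{ Status int }

func (e *HTTPError) Error() string { return fmt.Sprintf("http error: status %d", e.Status) }

// shouldRetry reports whether a failed attempt should be requeued.
// Only connection-level failures are retried, and only while the
// attempt counter is still below maxAttempts (assumed semantics).
func shouldRetry(err error, attempt, maxAttempts int) bool {
	var connErr *ConnectionError
	if errors.As(err, &connErr) {
		return attempt < maxAttempts
	}
	return false
}

func main() {
	dial := &ConnectionError{Err: errors.New("dial tcp: connection refused")}
	fmt.Println(shouldRetry(dial, 1, 3))                    // connection failure, attempts left
	fmt.Println(shouldRetry(dial, 3, 3))                    // connection failure, attempts exhausted
	fmt.Println(shouldRetry(&HTTPError{Status: 400}, 1, 3)) // HTTP failure is terminal
}
```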
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 of foreman: the daemon now acts as a transparent Ollama proxy.
- internal/ollama: Client interface and HTTP implementation for chat
  (streaming + non-streaming), embed, tags, ps with auth forwarding,
  NDJSON streaming via bufio.Scanner, and connection vs HTTP error
  classification via custom error types.
- internal/ollama: ModelInventory with background poller for /api/tags
  and /api/ps, degraded mode on target unreachable with model retention,
  automatic recovery on reconnect.
- internal/server: Passthrough routes (/api/chat, /api/tags, /api/ps,
  /api/embed, /api/embeddings) with model validation, chat serialization
  gate (capacity-1 channel), concurrent embedding bypass (ADR-0013),
  NDJSON streaming with per-chunk flush, and degraded health reporting.
- cmd/foreman: Full serve wiring with Ollama client, poller goroutine,
  embedder warmup (keep_alive:-1), and signal-based shutdown.
The Mac is now usable as a go-llm target through foreman.
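As an illustration of the NDJSON streaming via bufio.Scanner listed above, the sketch below reads one JSON object per line and hands each chunk to a callback. The chunk struct covers only two fields of an Ollama chat stream line. The names readNDJSON and collect, and the 1 MiB line limit, are assumptions for this example and not foreman's actual code.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// chunk models a small subset of one streamed chat response line.
type chunk struct {
	Done    bool `json:"done"`
	Message struct {
		Content string `json:"content"`
	} `json:"message"`
}

// readNDJSON decodes one JSON object per line and passes it to handle.
func readNDJSON(r io.Reader, handle func(chunk) error) error {
	sc := bufio.NewScanner(r)
	// Raise the default 64 KiB token limit; 1 MiB is an arbitrary choice here.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		line := sc.Bytes()
		if len(line) == 0 {
			continue // tolerate blank lines between objects
		}
		var c chunk
		if err := json.Unmarshal(line, &c); err != nil {
			return fmt.Errorf("decode chunk: %w", err)
		}
		if err := handle(c); err != nil {
			return err
		}
	}
	return sc.Err()
}

// collect concatenates the content of every chunk in stream.
func collect(stream string) (string, error) {
	var sb strings.Builder
	err := readNDJSON(strings.NewReader(stream), func(c chunk) error {
		sb.WriteString(c.Message.Content)
		return nil
	})
	return sb.String(), err
}

func main() {
	out, err := collect("{\"message\":{\"content\":\"Hel\"},\"done\":false}\n" +
		"{\"message\":{\"content\":\"lo\"},\"done\":true}\n")
	fmt.Println(out, err)
}
```

In a proxy, the callback is where each chunk would be written to the client and flushed, which is what per-chunk flush refers to.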
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 of foreman: initialize the Go module, project layout, and core
infrastructure. Includes env-based configuration (FOREMAN_* namespace),
SQLite-backed durable job queue with WAL mode via modernc.org/sqlite,
stdlib HTTP server with /healthz and optional bearer-token auth middleware,
subcommand dispatch (serve + stubs), Gitea CI workflow, multi-stage
distroless Dockerfile, and comprehensive tests for all packages.
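The optional bearer-token auth middleware mentioned above could be built on the standard library roughly as follows. This is a sketch under assumptions: bearerAuth and statusFor are hypothetical names, an empty token is taken to mean that auth is disabled, and the /example path is illustrative.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// bearerAuth wraps next with a bearer-token check.
// An empty token disables the check (assumed meaning of "optional").
func bearerAuth(token string, next http.Handler) http.Handler {
	if token == "" {
		return next
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got, ok := strings.CutPrefix(r.Header.Get("Authorization"), "Bearer ")
		// Constant-time comparison avoids leaking the token through timing.
		if !ok || subtle.ConstantTimeCompare([]byte(got), []byte(token)) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one in-memory request through the middleware and
// returns the resulting status code.
func statusFor(token, header string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodGet, "/example", nil)
	if header != "" {
		req.Header.Set("Authorization", header)
	}
	rec := httptest.NewRecorder()
	bearerAuth(token, ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("s3cret", "Bearer s3cret"))
	fmt.Println(statusFor("s3cret", "Bearer wrong"))
	fmt.Println(statusFor("", ""))
}
```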
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>