2026-05-23 16:51:19 -04:00

# phase-4.md — Async /jobs, job IDs, state webhooks, artifacts
Re-ground: `CLAUDE.md` + ADR-0004 (async surface), ADR-0005 (webhook protocol),
ADR-0006 (artifacts). Plan, get approval, implement. **This phase delivers the
headline "queue & webhooks" capability, so get it genuinely working end-to-end.**
## Objective
A fire-and-forget async surface over the Phase 3 engine: submit a job, get an ID
immediately, receive state updates and the final result/artifacts by webhook.
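
A minimal sketch of the submit path, under stated assumptions: the `job` type and the buffered `queue` channel are hypothetical stand-ins for the Phase 3 engine, the request body is a placeholder rather than a real native-chat payload, and `newJobID` returns random hex instead of a ULID to keep the example stdlib-only.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// job is a hypothetical stand-in for whatever the Phase 3 engine accepts.
type job struct {
	ID      string
	Payload []byte
}

// newJobID returns a random 128-bit hex ID. The plan calls for a ULID;
// this placeholder only avoids a third-party import in the sketch.
func newJobID() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b[:])
}

// submitHandler enqueues the request body and answers 202 right away,
// without waiting for the job to run.
func submitHandler(queue chan<- job) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, 1<<20))
		if err != nil {
			http.Error(w, "bad request body", http.StatusBadRequest)
			return
		}
		j := job{ID: newJobID(), Payload: body}
		select {
		case queue <- j: // accepted
		default: // queue full: reject instead of blocking the caller
			http.Error(w, "queue full", http.StatusServiceUnavailable)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusAccepted)
		json.NewEncoder(w).Encode(map[string]string{"job_id": j.ID})
	}
}

func main() {
	queue := make(chan job, 1)
	req := httptest.NewRequest(http.MethodPost, "/jobs",
		strings.NewReader(`{"placeholder":"native-chat payload"}`))
	rec := httptest.NewRecorder()
	submitHandler(queue)(rec, req)
	fmt.Println(rec.Code, len(queue)) // status code and number of queued jobs
}
```

Rejecting with `503` when the queue is full is one possible choice; the plan does not specify back-pressure behavior.
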
## Tasks
- `POST /jobs`: body is a native-chat payload plus an optional `state_webhook_url`
(webhooks are HMAC-signed when a secret is configured). Enqueue (reusing the
Phase 3 engine) and return `202` with `{ "job_id": "<ulid>" }` immediately.
- `GET /jobs/{id}`: current state, result, error, and artifact metadata (the
recovery/poll path for missed webhooks).
- `internal/webhook`: on each state transition (`queued→loading→working→done`,
plus `failed`), POST the event JSON from ADR-0005 to `state_webhook_url`.
At-least-once with bounded retry + backoff; never let a flaky receiver block or
fail the job. If a webhook secret is configured, sign the body with
HMAC-SHA256 in `X-Foreman-Signature`.
- Artifacts (ADR-0006): persist the completion as artifact `completion`; deliver
artifacts inline in the `done` event under a size threshold (default ~256KB),
otherwise send metadata + a URL and serve bytes at
`GET /jobs/{id}/artifacts/{name}`. Add a TTL prune sweep for jobs+artifacts.
- Decide and record (new ADR if needed) whether `state_webhook_url` is also
honored when present on the sync `/api/chat` path, or only on `/jobs`.
- Tests: spin up an in-test webhook receiver; assert the full lifecycle fires in
order with correct payloads, idempotency keys (`job_id`+`state`) are present,
a 500-ing receiver triggers retries without affecting job state, large
artifacts go by URL and small ones inline, and HMAC signatures verify.
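
The signing and retry behavior in the `internal/webhook` task could look roughly like the sketch below. It is not the ADR-0005 protocol: hex encoding of the signature, the `sign`/`verify`/`deliver` names, the doubling backoff, and the `job_id`/`state` event fields are all assumptions made for illustration.

```go
package main

import (
	"bytes"
	"context"
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// sign returns the hex-encoded HMAC-SHA256 of body. Hex is an assumption;
// ADR-0005 defines the real X-Foreman-Signature format.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the signature and compares it in constant time.
func verify(secret, body []byte, signature string) bool {
	return hmac.Equal([]byte(sign(secret, body)), []byte(signature))
}

// deliver POSTs body to url at most maxAttempts times, doubling the wait
// after each failure. It returns how many attempts were made and the last
// error; a caller would log that error rather than fail the job.
func deliver(ctx context.Context, c *http.Client, url string, secret, body []byte,
	maxAttempts int, backoff time.Duration) (int, error) {
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
		if err != nil {
			return attempt, err
		}
		req.Header.Set("Content-Type", "application/json")
		if len(secret) > 0 {
			req.Header.Set("X-Foreman-Signature", sign(secret, body))
		}
		resp, err := c.Do(req)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode >= 200 && resp.StatusCode < 300 {
				return attempt, nil
			}
			err = fmt.Errorf("receiver returned %s", resp.Status)
		}
		lastErr = err
		if attempt == maxAttempts {
			break
		}
		select {
		case <-ctx.Done():
			return attempt, ctx.Err()
		case <-time.After(backoff):
			backoff *= 2
		}
	}
	return maxAttempts, lastErr
}

func main() {
	secret := []byte("demo-secret")
	var calls atomic.Int32
	// Receiver that fails twice, then checks the signature and accepts.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if calls.Add(1) < 3 {
			http.Error(w, "flaky", http.StatusInternalServerError)
			return
		}
		got, _ := io.ReadAll(r.Body)
		if !verify(secret, got, r.Header.Get("X-Foreman-Signature")) {
			http.Error(w, "bad signature", http.StatusUnauthorized)
			return
		}
		w.WriteHeader(http.StatusNoContent)
	}))
	defer srv.Close()

	body := []byte(`{"job_id":"example","state":"done"}`) // field names assumed
	attempts, err := deliver(context.Background(), srv.Client(), srv.URL, secret, body, 5, 10*time.Millisecond)
	fmt.Println(attempts, err)
}
```

Running `deliver` from its own goroutine, and only logging its error, is one way to keep a flaky receiver from blocking or failing the job.
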
## Definition of done
- `go build`, `go vet`, and `go test -race` are all green.
- End-to-end demo: `POST /jobs` with a `state_webhook_url` pointed at a tiny local
listener; observe `queued→loading→working→done` plus the completion artifact;
confirm `GET /jobs/{id}` reconciles after a deliberately dropped webhook.
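
For the end-to-end demo, the tiny local listener might look like this sketch. The `event` struct carries only the two fields the plan names as the idempotency key (the JSON names are assumed, the full schema is in ADR-0005), and the `receiver` type and `replay` helper are hypothetical. Here `main` replays events in-process, including a duplicate, to show the dedupe.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// event holds only the idempotency-key fields; JSON names are assumptions.
type event struct {
	JobID string `json:"job_id"`
	State string `json:"state"`
}

// receiver records each (job_id, state) pair once, so redelivered
// webhooks (at-least-once delivery) do not show up twice.
type receiver struct {
	mu   sync.Mutex
	seen map[string]bool
	log  []string
}

func (rc *receiver) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	var ev event
	if err := json.NewDecoder(r.Body).Decode(&ev); err != nil {
		http.Error(w, "bad event", http.StatusBadRequest)
		return
	}
	rc.mu.Lock()
	defer rc.mu.Unlock()
	key := ev.JobID + "/" + ev.State
	if !rc.seen[key] {
		rc.seen[key] = true
		rc.log = append(rc.log, ev.State)
	}
	w.WriteHeader(http.StatusNoContent)
}

// replay feeds one raw event body through the handler in-process and
// returns the HTTP status the handler wrote.
func (rc *receiver) replay(body string) int {
	rec := httptest.NewRecorder()
	rc.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, "/", strings.NewReader(body)))
	return rec.Code
}

func main() {
	rc := &receiver{seen: map[string]bool{}}
	// "loading" is sent twice to simulate a retried delivery.
	for _, s := range []string{"queued", "loading", "loading", "working", "done"} {
		rc.replay(fmt.Sprintf(`{"job_id":"demo","state":%q}`, s))
	}
	fmt.Println(strings.Join(rc.log, " -> "))
}
```

To receive real webhooks, the same `receiver` can be passed to `http.ListenAndServe` on a local address of your choosing and that address used as `state_webhook_url`.
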
Wrap up: update `progress.md` and commit on `phase-4-async-webhooks`. M1 core is complete.