ADR-0006: Artifact handling and transport

Status: Accepted — 2026-05-23

Context

Jobs must "transmit artifacts when done." For a chat completion the obvious artifact is the assistant's text/tool-call output, but the term is deliberately broader: a job may produce structured data, multiple named outputs, or content too large to embed comfortably in a webhook body.

Decision

An artifact is a named, typed blob attached to a completed job:

{ "name": "completion", "content_type": "application/json", "size": 1234,
  "inline": { ... }, "url": null }

The primary completion is always emitted as an artifact named completion (the native-Ollama response shape), so there is one consistent access pattern.
Additional artifacts use distinct names.

Transport: inline vs fetch

Small artifacts (under a configurable threshold, default ~256 KB) are delivered inline in the done webhook (inline populated, url null) and in GET /jobs/{id}.
Large artifacts exceed the threshold: the webhook/GET carries metadata plus a url (GET /jobs/{id}/artifacts/{name}), and the bytes are fetched on demand. This keeps webhook payloads bounded and avoids shipping megabytes through a callback POST.

Retention

Artifacts are stored alongside the job in SQLite (ADR-0008) and pruned with the job after a configurable TTL. No separate blob store in v1; revisit only if artifact sizes outgrow SQLite comfort (single-digit MB).

Consequences

One uniform way to read output (completion artifact), extensible to richer jobs later without protocol changes.
Webhook bodies stay small; large outputs don't bloat or break delivery.
A pull endpoint for artifacts means a missed/oversized webhook never loses data.

Alternatives considered

Always inline. Simple but risks huge webhook bodies and SQLite row bloat in the hot path. Rejected.
External object store (S3/MinIO) from day one. Over-engineered for the expected sizes; deferred behind the TTL/threshold knobs.

2.0 KiB Raw Permalink Blame History