Commit Graph

24 Commits

Author SHA1 Message Date
steve a3e9982d49 refactor(v2/ollama): drop openaicompat shim, use native provider
The Ollama provider now targets /api/chat directly via the native provider
introduced in the previous commits. Public API is unchanged for callers
that go through llm.Ollama() (and is extended by Task 5's OllamaCloud()
constructor).

DefaultBaseURL was renamed to DefaultLocalBaseURL (without the trailing
/v1 segment used by the OpenAI-compat path). registry.go is updated
correspondingly; no other callers referenced the old name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-01 18:29:59 +00:00
steve f70c7c0842 feat(v2/ollama): implement native Stream() with NDJSON parsing
Reads Ollama's NDJSON stream (one JSON object per line) and emits
provider.StreamEvent values for text, thinking, tool-call start/delta/end,
and a final Done event carrying assembled Response and Usage. Uses
bufio.Scanner with a 4 MiB max-line buffer so multi-KB tool-call deltas
parse cleanly, and accepts tool-call arguments delivered either as
escaped string fragments (delta-style) or a complete JSON object
(one-shot).
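
For reference, a minimal sketch of that scan loop (the chunk shape is an
assumption based on Ollama's documented /api/chat fields, and event
emission is reduced to a text callback):

    package ollama

    import (
        "bufio"
        "bytes"
        "encoding/json"
        "io"
    )

    // scanChat walks the NDJSON body line by line; the 4 MiB cap keeps
    // large tool-call deltas from overflowing the default scanner buffer.
    func scanChat(body io.Reader, onText func(string)) error {
        sc := bufio.NewScanner(body)
        sc.Buffer(make([]byte, 0, 64*1024), 4*1024*1024) // 4 MiB max line
        for sc.Scan() {
            line := bytes.TrimSpace(sc.Bytes())
            if len(line) == 0 {
                continue
            }
            var chunk struct {
                Message struct {
                    Content string `json:"content"`
                } `json:"message"`
                Done bool `json:"done"`
            }
            if err := json.Unmarshal(line, &chunk); err != nil {
                return err
            }
            if chunk.Message.Content != "" {
                onText(chunk.Message.Content)
            }
            if chunk.Done {
                return nil // final object carries usage counters (omitted here)
            }
        }
        return sc.Err()
    }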

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-01 18:29:04 +00:00
steve 583f8724b2 feat(v2/ollama): implement native Complete() with tools, vision, thinking
Non-streaming /api/chat support including:
- Vision via images: []base64
- Tool calls on assistant + tool-role response messages
- think field accepting string reasoning levels (or "true"/"false")
- Authorization header when apiKey is non-empty (cloud mode)

Tool-call arguments are sent as JSON objects on the wire and surfaced
as JSON-string Arguments on provider.ToolCall. Tool calls are assigned
synthetic IDs (tc_<index>) when Ollama omits one, so the round trip
back (an assistant tool_calls message plus tool-role result) stays
correlated.
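
The ID fallback amounts to roughly the following (the wire and
provider-side shapes are assumptions; only the tc_<index> scheme and
the object-in/string-out argument handling come from this message):

    package ollama

    import (
        "encoding/json"
        "fmt"
    )

    // wireToolCall approximates Ollama's tool_calls entry shape.
    type wireToolCall struct {
        ID       string `json:"id"`
        Function struct {
            Name      string         `json:"name"`
            Arguments map[string]any `json:"arguments"`
        } `json:"function"`
    }

    // toolCall stands in for provider.ToolCall's documented surface:
    // Arguments carries the JSON-string form per this message.
    type toolCall struct {
        ID, Name, Arguments string
    }

    // convertToolCalls synthesizes tc_<index> IDs when Ollama omits one,
    // keeping the assistant tool_calls / tool-role round trip paired.
    func convertToolCalls(wire []wireToolCall) ([]toolCall, error) {
        out := make([]toolCall, 0, len(wire))
        for i, tc := range wire {
            id := tc.ID
            if id == "" {
                id = fmt.Sprintf("tc_%d", i)
            }
            args, err := json.Marshal(tc.Function.Arguments) // object -> JSON string
            if err != nil {
                return nil, err
            }
            out = append(out, toolCall{ID: id, Name: tc.Function.Name, Arguments: string(args)})
        }
        return out, nil
    }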

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-01 18:24:02 +00:00
steve 0e358148eb feat(v2/ollama): scaffold native /api/chat provider
Adds wire types and a Provider struct that will replace the
openaicompat-based Ollama shim with a native /api/chat implementation.
Complete and Stream methods are stubs; subsequent commits implement them.

Adjusts the existing ollama.go to drop the type alias on
openaicompat.Provider (renaming the legacy shim to a temporary internal
helper) so the new native Provider type does not collide. Public New()
still returns the openaicompat-backed provider until Task 4 swaps it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-01 18:22:11 +00:00
steve 5c5d861915 fix(v2): coerce string-encoded numbers/bools in tool arguments
LLMs occasionally return numeric or boolean tool-call fields as JSON
strings (e.g. "3" instead of 3, "true" instead of true), which Go's
strict json.Unmarshal rejects. The strict unmarshal stays as the happy
path; on failure we retry with a coercion pass that walks the target
struct (recursing into nested structs, slices, maps, and pointer fields)
and converts strings to the appropriate kind. Returns the original error
if coercion can't recover.
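
The real pass is reflective; as a toy illustration of the
strict-then-retry shape, using Go's `,string` tag option as a stand-in
for the coercion walk:

    package llm

    import "encoding/json"

    type moveArgs struct {
        Count int  `json:"count"`
        Force bool `json:"force"`
    }

    // decodeMoveArgs: strict decode first; on failure, retry reading the
    // fields as string-encoded values ("3", "true") and copy them over.
    // The actual pass coerces per field reflectively, so it also handles
    // mixed cases this toy cannot.
    func decodeMoveArgs(raw []byte) (moveArgs, error) {
        var a moveArgs
        if err := json.Unmarshal(raw, &a); err != nil {
            var loose struct {
                Count int  `json:"count,string"`
                Force bool `json:"force,string"`
            }
            if json.Unmarshal(raw, &loose) != nil {
                return a, err // can't recover; surface the original error
            }
            a.Count, a.Force = loose.Count, loose.Force
        }
        return a, nil
    }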

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 22:12:56 +00:00
steve cbaf41f50c feat(v2): add ReasoningLevel option; thinking/reasoning across providers
Introduces an opt-in level-based reasoning toggle (low/medium/high) that
each provider translates to its native parameter:

- Anthropic: thinking.budget_tokens (1024/8000/24000), with temperature
  forced to default and MaxTokens auto-grown above the budget.
- OpenAI/xAI/Groq via openaicompat: reasoning_effort string, gated by a
  new Rules.SupportsReasoning predicate so non-reasoning models don't
  receive the parameter. xAI uses Rules.MapReasoningEffort to remap
  "medium" to "high" since its API only accepts low|high.
- Google: thinking_config.thinking_budget + include_thoughts:true.
- DeepSeek: SupportsReasoning=false (reasoner is always-on; the
  reasoning_content trace was already extracted via openaicompat).

Reasoning content is surfaced as Response.Thinking on Complete and as
StreamEventThinking deltas during streaming. Provider-side: extracted
from Anthropic thinking content blocks, Google's part.Thought=true
parts, and the non-standard reasoning_content field that DeepSeek and
Groq emit (parsed out of raw JSON since openai-go doesn't type it).

Public API:
  - llm.ReasoningLevel + ReasoningLow/Medium/High constants
  - llm.WithReasoning(level) request option
  - Model.WithReasoning(level) for baked-in defaults
  - provider.Request.Reasoning, provider.Response.Thinking
  - provider.StreamEventThinking
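
In caller terms (how a Model is constructed, and where request options
are passed, is not shown in this log; only the names below come from
the list above):

    func exampleReasoning(model llm.Model) {
        m := model.WithReasoning(llm.ReasoningMedium) // baked-in default
        opt := llm.WithReasoning(llm.ReasoningHigh)   // per-request override
        _, _ = m, opt
    }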

Tests cover Rules-based gating, MapReasoningEffort, reasoning_content
extraction (Complete + Stream), Anthropic budget mapping, and
temperature suppression when thinking is enabled. Existing behavior is
unchanged when Reasoning is the empty string.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 03:58:42 +00:00
steve 34119e5a00 feat: add DeepSeek, Moonshot, xAI, Groq, Ollama; drop v1; migrate TUI to v2
Five OpenAI-compatible providers join the library as first-class constructors
(llm.DeepSeek, llm.Moonshot, llm.XAI, llm.Groq, llm.Ollama). Their wire-level
implementation is shared via a new v2/openaicompat package which is the
extracted guts of the old v2/openai provider; each provider supplies its own
Rules value to declare per-model constraints (e.g., DeepSeek Reasoner rejects
tools and temperature, Moonshot/xAI accept images only on *-vision* models,
Groq rejects audio input). v2/openai itself becomes a thin wrapper that sets
RestrictTemperature for o-series and gpt-5 models.
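
Shape-wise, a Rules value reads something like this (only the
SupportsReasoning and MapReasoningEffort members are named anywhere in
this log, in the reasoning commit above; everything else here,
including the function signatures and model prefix, is illustrative):

    var xaiRules = openaicompat.Rules{
        // Gate the reasoning_effort parameter to models that accept it;
        // the prefix check is a placeholder, not the real predicate.
        SupportsReasoning: func(model string) bool {
            return strings.HasPrefix(model, "grok-")
        },
        // Remap "medium" since xAI accepts only low|high.
        MapReasoningEffort: func(effort string) string {
            if effort == "medium" {
                return "high"
            }
            return effort
        },
    }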

A new provider registry (v2/registry.go) exposes llm.Providers() and drives
the TUI's provider picker so adding a provider in future is a single-file
change.

The TUI at cmd/llm was migrated from v1 to v2 and moved to v2/cmd/llm. With
nothing else depending on v1, the v1 code at the repo root (all .go files,
schema/, internal/, provider/, root go.mod/go.sum) is deleted.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 13:34:39 +00:00
steve 9b91b2f794 test(v2): end-to-end cache-hint propagation through Chat.Send
Verifies that WithPromptCaching() on a Chat results in CacheHints being
set on the provider.Request that reaches the provider layer, and that
omitting the option leaves CacheHints nil (no behavior change for
existing callers).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 19:40:39 +00:00
steve 34b2e29019 feat(v2/anthropic): apply cache_control markers from CacheHints
buildRequest now tracks a source-index → built-message-index mapping
during the role-merge pass, then uses the mapping to attach
cache_control: {type: ephemeral} markers at the positions indicated by
Request.CacheHints. The last tool, the last system part, and the last
non-system message each get a marker when the corresponding hint is set.

Covers the merge-induced index drift that would otherwise cause the
breakpoint to land on the wrong content block when consecutive same-role
source messages are combined into a single Anthropic message with
multiple content blocks.
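
The bookkeeping reduces to something like the following (illustrative,
not the provider's actual code):

    // mergeIndexMap maps each source-message index to the built
    // Anthropic message it lands in after same-role merging, so a cache
    // hint at source index k attaches cache_control {type: ephemeral}
    // to the last content block of built message m[k].
    func mergeIndexMap(roles []string) map[int]int {
        m := make(map[int]int, len(roles))
        built := 0
        for i, r := range roles {
            if i > 0 && r != roles[i-1] {
                built++ // role changed: a new built message begins
            }
            m[i] = built // same-role neighbors share one built message
        }
        return m
    }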

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 19:33:25 +00:00
steve 4c6dfb9058 test(v2/anthropic): drop placeholder import sentinel from cache_test.go
Removes the blank-assign workaround that was only needed because the
anth import was being kept alive for Task 5's use. Task 5 will bring
the import back when it actually references anth.CacheControlTypeEphemeral.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 19:29:14 +00:00
steve a6b5544674 refactor(v2/anthropic): use MultiSystem for system prompts
Switches buildRequest to emit anthReq.MultiSystem instead of anthReq.System
whenever a system message is present. Upstream's MarshalJSON prefers
MultiSystem when non-empty, so the wire format is unchanged for requests
without cache_control. This refactor is a prerequisite for attaching
cache_control markers to system parts in the next commit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 19:26:55 +00:00
steve 01b18dcf32 test(v2): cover empty-messages and disabled-but-non-nil cacheConfig edges
Adds two boundary tests suggested by code review:
- TestBuildProviderRequest_CachingEnabled_EmptyMessages: verifies
  that caching with an empty message list still emits a CacheHints
  with LastCacheableMessageIndex=-1, not a spurious breakpoint.
- TestBuildProviderRequest_CachingNonNilButDisabled: verifies that
  an explicitly-disabled cacheConfig (non-nil, enabled=false)
  produces nil CacheHints, exercising the &&-guard branch that
  the previous "disabled" test left untested.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 19:24:55 +00:00
steve 4b401fcc0d feat(v2): populate CacheHints on provider.Request when caching enabled
buildProviderRequest now computes cache-breakpoint positions automatically
when the WithPromptCaching() option is set. It places up to 3 hints:
tools, system, and the index of the last non-system message. Providers
that don't support caching (OpenAI, Google) ignore the field.
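
The placement logic is approximately as follows
(LastCacheableMessageIndex is the field named by the boundary tests
above; the two boolean hints are paraphrased with invented names):

    // cacheHints stands in for provider.CacheHints; only
    // LastCacheableMessageIndex is a field name this log confirms.
    type cacheHints struct {
        CacheTools, CacheSystem   bool
        LastCacheableMessageIndex int
    }

    func computeHints(hasTools, hasSystem bool, roles []string) *cacheHints {
        last := -1 // stays -1 for an empty or all-system message list
        for i, r := range roles {
            if r != "system" {
                last = i
            }
        }
        return &cacheHints{
            CacheTools:                hasTools,
            CacheSystem:               hasSystem,
            LastCacheableMessageIndex: last,
        }
    }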

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 19:22:00 +00:00
steve c4fe0026a2 feat(v2): add WithPromptCaching() request option
Introduces an opt-in RequestOption that callers can pass to enable
automatic prompt-caching markers. The option populates a cacheConfig
on requestConfig but has no effect yet — plumbing through to
provider.Request and on to the Anthropic provider lands in subsequent
commits.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 19:17:55 +00:00
steve b4bf73136a feat(v2/provider): add CacheHints to Request for prompt caching
Adds an optional CacheHints field on provider.Request that carries
cache-breakpoint placement directives from the public llm package down
to individual provider implementations. Anthropic will consume these in
a follow-up commit; OpenAI and Google ignore them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 19:14:44 +00:00
steve 5b687839b2 feat: comprehensive token usage tracking for V2
Add provider-specific usage details, fix streaming usage, and return
usage from all high-level APIs (Chat.Send, Generate[T], Agent.Run).

Breaking changes:
- Chat.Send/SendMessage/SendWithImages now return (string, *Usage, error)
- Generate[T]/GenerateWith[T] now return (T, *Usage, error)
- Agent.Run/RunMessages now return (string, *Usage, error)
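
Call sites migrate along these lines (the Chat type's shape and the
Usage field names other than Details are assumptions):

    func example(ctx context.Context, chat *llm.Chat) error {
        reply, usage, err := chat.Send(ctx, "Summarize the diff.")
        if err != nil {
            return err
        }
        fmt.Println(reply)
        if usage != nil {
            // Details is the new provider-specific breakdown map; the
            // "cached" key is a guess at one such entry.
            fmt.Println(usage.Details["cached"])
        }
        return nil
    }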

New features:
- Usage.Details map for provider-specific token breakdowns
  (reasoning, cached, audio, thoughts tokens)
- OpenAI streaming now captures usage via StreamOptions.IncludeUsage
- Google streaming now captures UsageMetadata from final chunk
- UsageTracker.Details() for accumulated detail totals
- ModelPricing and PricingRegistry for cost computation

Closes #2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 04:33:18 +00:00
steve 7e1705c385 feat: add audio input support to v2 providers
Add Audio struct alongside Image for sending audio attachments to
multimodal LLMs. OpenAI uses input_audio content parts (wav/mp3),
Google Gemini uses genai.NewPartFromBytes, and Anthropic skips
audio gracefully since it's not supported.
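
Construction is presumably along these lines (the Audio field names are
guesses; only the wav/mp3 support is stated above):

    clip := llm.Audio{Data: wavBytes, Format: "wav"} // wavBytes read elsewhere
    _ = clip // attached to a message the same way an Image is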

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 21:00:56 -05:00
steve fc2218b5fe Add comprehensive test suite for sandbox package (78 tests)
Expanded from 22 basic tests to 78 tests covering error injection,
task polling, IP discovery, context cancellation, HTTP error codes,
concurrent access, SSH lifecycle, and request verification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 01:10:59 -05:00
steve 23c9068022 Add sandbox package for isolated Linux containers via Proxmox LXC
Provides a complete lifecycle manager for ephemeral sandbox environments:
- ProxmoxClient: thin REST wrapper for container CRUD, IP discovery, internet toggle
- SSHExecutor: persistent SSH/SFTP for command execution and file transfer
- Manager/Sandbox: high-level orchestrator tying Proxmox + SSH together
- 22 unit tests with mock Proxmox HTTP server
- Proxmox setup & hardening guide (docs/sandbox-setup.md)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 00:47:45 -05:00
steve 87ec56a2be Add agent sub-package for composable LLM agents
Introduces v2/agent with a minimal API: Agent, New(), Run(), and AsTool().
Agents wrap a model + system prompt + tools. AsTool() turns an agent into
an llm.Tool, enabling parent agents to delegate to sub-agents through the
normal tool-call loop — no channels, pools, or orchestration needed.

Also exports NewClient(provider.Provider) for custom provider integration.
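
Delegation then composes like this (argument shapes for agent.New are
guesses; Agent, New, Run, and AsTool are the names given above, and
ctx, model, and searchTool are assumed in scope):

    researcher := agent.New(model, "You research facts.", searchTool)
    writer := agent.New(model, "You write prose.",
        researcher.AsTool()) // sub-agent exposed as an ordinary llm.Tool
    answer, err := writer.Run(ctx, "Draft a paragraph on garbage collection.")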

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 23:17:19 -05:00
steve be572a76f4 Add structured output support with Generate[T] and GenerateWith[T]
Generic functions that use the "hidden tool" technique to force models
to return structured JSON matching a Go struct's schema, replacing the
verbose "tool as structured output" pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 22:36:33 -05:00
steve 6a7eeef619 Add comprehensive test suite for v2 module with mock provider
Cover all core library logic (Client, Model, Chat, middleware, streaming,
message conversion, request building) using a configurable mock provider
that avoids real API calls. ~50 tests across 7 files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 22:00:49 -05:00
steve 9e288954f2 Add transcription API to v2 module
Migrate speech-to-text transcription types and OpenAI transcriber
implementation from v1. Types are defined in provider/ to avoid
import cycles and re-exported via type aliases from the root package.
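
Schematically, the alias re-export pattern (type and field names
invented):

    // In provider/: the concrete type, defined low in the import graph.
    type Transcription struct{ Text string }

    // In the v2 root package: a pure type alias, so callers use
    // llm.Transcription without importing provider/ and no cycle forms.
    type Transcription = provider.Transcription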

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 20:24:20 -05:00
steve a4cb4baab5 Add go-llm v2: redesigned API for simpler LLM abstraction
v2 is a new Go module (v2/) with a dramatically simpler API:
- Unified Message type (no more Input marker interface)
- Define[T] for ergonomic tool creation with standard context.Context (see sketch below)
- Chat session with automatic tool-call loop (agent loop)
- Streaming via pull-based StreamReader
- MCP one-call connect (MCPStdioServer, MCPHTTPServer, MCPSSEServer)
- Middleware support (logging, retry, timeout, usage tracking)
- Decoupled JSON Schema (map[string]any, no provider coupling)
- Sample tools: WebSearch, Browser, Exec, ReadFile, WriteFile, HTTP
- Providers: OpenAI, Anthropic, Google (all with streaming)
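
For instance, tool definition via Define[T] plausibly looks like this
(the exact signature is an assumption; the generic form and the
standard context.Context handler are from the list above):

    type weatherArgs struct {
        City string `json:"city"`
    }

    // Hypothetical shape: name, description, typed handler.
    weather := llm.Define[weatherArgs]("get_weather", "Current weather for a city",
        func(ctx context.Context, a weatherArgs) (string, error) {
            return "sunny in " + a.City, nil // stub result
        })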

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 20:00:08 -05:00