feat(v2): add ReasoningLevel option; thinking/reasoning across providers
Introduces an opt-in level-based reasoning toggle (low/medium/high) that
each provider translates to its native parameter:

- Anthropic: thinking.budget_tokens (1024/8000/24000), with temperature
  forced to default and MaxTokens auto-grown above the budget.
- OpenAI/xAI/Groq via openaicompat: reasoning_effort string, gated by a
  new Rules.SupportsReasoning predicate so non-reasoning models don't
  receive the parameter. xAI uses Rules.MapReasoningEffort to remap
  "medium" to "high" since its API only accepts low|high.
- Google: thinking_config.thinking_budget + include_thoughts:true.
- DeepSeek: SupportsReasoning=false (reasoner is always-on; the
  reasoning_content trace was already extracted via openaicompat).

Reasoning content is surfaced as Response.Thinking on Complete and as
StreamEventThinking deltas during streaming. Provider-side: extracted
from Anthropic thinking content blocks, Google's part.Thought=true
parts, and the non-standard reasoning_content field that DeepSeek and
Groq emit (parsed out of raw JSON since openai-go doesn't type it).

Public API:
- llm.ReasoningLevel + ReasoningLow/Medium/High constants
- llm.WithReasoning(level) request option
- Model.WithReasoning(level) for baked-in defaults
- provider.Request.Reasoning, provider.Response.Thinking
- provider.StreamEventThinking

Tests cover Rules-based gating, MapReasoningEffort, reasoning_content
extraction (Complete + Stream), Anthropic budget mapping, and
temperature suppression when thinking is enabled. Existing behavior is
unchanged when Reasoning is the empty string.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
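For illustration, a minimal sketch of the openaicompat-style gating described above: the level is forwarded as reasoning_effort only when Rules.SupportsReasoning reports that the model supports it, and Rules.MapReasoningEffort can remap the value (xAI remaps "medium" to "high" since its API only accepts low|high). The Rules struct and function shapes below are stand-ins for this sketch, not the repository's actual definitions.

package main

import "fmt"

// Stand-in for the Rules described in the commit message; the real type
// and field shapes in the repository may differ.
type Rules struct {
    SupportsReasoning  bool
    MapReasoningEffort func(string) string
}

// reasoningEffort returns the reasoning_effort value to send, or ok=false
// when the parameter should be omitted entirely (empty level, or a model
// whose Rules say it does not support reasoning).
func reasoningEffort(rules Rules, level string) (effort string, ok bool) {
    if level == "" || !rules.SupportsReasoning {
        return "", false
    }
    if rules.MapReasoningEffort != nil {
        level = rules.MapReasoningEffort(level)
    }
    return level, true
}

func main() {
    xai := Rules{
        SupportsReasoning: true,
        // xAI only accepts low|high, so "medium" is remapped to "high".
        MapReasoningEffort: func(s string) string {
            if s == "medium" {
                return "high"
            }
            return s
        },
    }
    if effort, ok := reasoningEffort(xai, "medium"); ok {
        fmt.Println("reasoning_effort:", effort) // prints: reasoning_effort: high
    }
}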
@@ -80,6 +80,15 @@ type Request struct {
 	// CacheHints requests prompt-cache breakpoints at specified positions
 	// on providers that support it (currently Anthropic). nil = no caching.
 	CacheHints *CacheHints
+
+	// Reasoning, when non-empty, asks the model to spend extra inference
+	// budget reasoning before answering. Each provider translates this to
+	// its native parameter (Anthropic thinking.budget_tokens, OpenAI/xAI
+	// reasoning_effort, Google thinking_config, etc.). Models that do not
+	// support reasoning silently ignore it.
+	//
+	// Allowed values: "" (no reasoning, default), "low", "medium", "high".
+	Reasoning string
 }
 
 // Response is a completion response at the provider level.
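For illustration, a self-contained sketch of the Anthropic-side translation described in the commit message: a non-empty Reasoning value becomes thinking.budget_tokens (1024/8000/24000 for low/medium/high) and MaxTokens is auto-grown above the budget so the visible answer still has room. The helper name and the exact growth rule are assumptions for this sketch, not the provider's actual code.

package main

import "fmt"

// thinkingBudget maps the request-level Reasoning string onto the
// Anthropic thinking.budget_tokens values named in the commit message.
func thinkingBudget(reasoning string) (budget int, ok bool) {
    switch reasoning {
    case "low":
        return 1024, true
    case "medium":
        return 8000, true
    case "high":
        return 24000, true
    default: // "" (or anything unrecognized): thinking stays disabled
        return 0, false
    }
}

func main() {
    maxTokens := 4096
    if budget, ok := thinkingBudget("medium"); ok {
        if maxTokens <= budget {
            // Assumed growth rule: keep MaxTokens strictly above the
            // thinking budget; the real increment may differ.
            maxTokens = budget + 4096
        }
        fmt.Println("budget_tokens:", budget, "max_tokens:", maxTokens)
    }
}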
@@ -87,6 +96,11 @@ type Response struct {
 	Text      string
 	ToolCalls []ToolCall
 	Usage     *Usage
+
+	// Thinking holds the model's reasoning/thinking trace, when one was
+	// requested and the provider exposed it. Empty for providers/models
+	// that do not surface a thinking trace.
+	Thinking string
 }
 
 // Usage captures token consumption.
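For illustration, a sketch of how the non-standard reasoning_content field that DeepSeek and Groq emit can be pulled out of the raw message JSON to populate Thinking; as the commit message notes, openai-go does not type this field, so it has to come from the raw payload. The JSON shape and helper below are illustrative.

package main

import (
    "encoding/json"
    "fmt"
)

// extractReasoningContent pulls reasoning_content out of a raw assistant
// message. An absent field simply yields an empty string, matching the
// "empty when not surfaced" contract of Response.Thinking.
func extractReasoningContent(rawMessage []byte) string {
    var m struct {
        ReasoningContent string `json:"reasoning_content"`
    }
    if err := json.Unmarshal(rawMessage, &m); err != nil {
        return ""
    }
    return m.ReasoningContent
}

func main() {
    raw := []byte(`{"role":"assistant","content":"42","reasoning_content":"Check both cases..."}`)
    fmt.Println(extractReasoningContent(raw)) // would populate Response.Thinking
}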
@@ -117,6 +131,7 @@ const (
 	StreamEventToolEnd  // Tool call complete
 	StreamEventDone     // Stream complete
 	StreamEventError    // Error occurred
+	StreamEventThinking // Reasoning/thinking content delta
 )
 
 // StreamEvent represents a single event in a streaming response.
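For illustration, a consumer-side sketch of routing StreamEventThinking deltas separately from ordinary text deltas during streaming. The event and channel types below are simplified stand-ins for provider.StreamEvent, which this excerpt does not show in full.

package main

import "fmt"

// Simplified stand-ins for the stream event kinds; the real constants
// live alongside StreamEventThinking in the provider package.
type eventType int

const (
    eventText eventType = iota
    eventThinking
    eventDone
)

type event struct {
    Type eventType
    Text string
}

// consume accumulates thinking deltas and answer deltas separately, which
// is the point of giving reasoning content its own event kind.
func consume(events <-chan event) (answer, thinking string) {
    for ev := range events {
        switch ev.Type {
        case eventThinking:
            thinking += ev.Text
        case eventText:
            answer += ev.Text
        case eventDone:
            return answer, thinking
        }
    }
    return answer, thinking
}

func main() {
    ch := make(chan event, 3)
    ch <- event{eventThinking, "Weigh the options... "}
    ch <- event{eventText, "The answer is 42."}
    ch <- event{eventDone, ""}
    close(ch)
    answer, thinking := consume(ch)
    fmt.Println("thinking:", thinking)
    fmt.Println("answer:", answer)
}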