feat(v2): add ReasoningLevel option; thinking/reasoning across providers
CI / Root Module (push) Failing after 1m30s
CI / Lint (push) Failing after 1m1s
CI / V2 Module (push) Successful in 3m41s

Introduces an opt-in level-based reasoning toggle (low/medium/high) that
each provider translates to its native parameter:

- Anthropic: thinking.budget_tokens (1024/8000/24000), with temperature
  forced to default and MaxTokens auto-grown above the budget.
- OpenAI/xAI/Groq via openaicompat: reasoning_effort string, gated by a
  new Rules.SupportsReasoning predicate so non-reasoning models don't
  receive the parameter. xAI uses Rules.MapReasoningEffort to remap
  "medium" to "high" since its API only accepts low|high.
- Google: thinking_config.thinking_budget + include_thoughts:true.
- DeepSeek: SupportsReasoning=false (reasoner is always-on; the
  reasoning_content trace was already extracted via openaicompat).

Reasoning content is surfaced as Response.Thinking on Complete and as
StreamEventThinking deltas during streaming. On the provider side it is
extracted from Anthropic thinking content blocks, Google's
part.Thought=true parts, and the non-standard reasoning_content field
that DeepSeek and Groq emit (parsed out of the raw JSON, since
openai-go doesn't type it).

Public API:
  - llm.ReasoningLevel + ReasoningLow/Medium/High constants
  - llm.WithReasoning(level) request option
  - Model.WithReasoning(level) for baked-in defaults
  - provider.Request.Reasoning, provider.Response.Thinking
  - provider.StreamEventThinking

Tests cover Rules-based gating, MapReasoningEffort, reasoning_content
extraction (Complete + Stream), Anthropic budget mapping, and
temperature suppression when thinking is enabled. Existing behavior is
unchanged when Reasoning is the empty string.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 03:58:42 +00:00
parent 34119e5a00
commit cbaf41f50c
16 changed files with 602 additions and 32 deletions
@@ -10,8 +10,32 @@ type requestConfig struct {
	topP        *float64
	stop        []string
	cacheConfig *cacheConfig
	reasoning   ReasoningLevel
}

// ReasoningLevel selects how much reasoning effort/budget the provider should
// spend before answering. Empty string is the default (no reasoning, identical
// to historical behavior). Each provider translates this to its native
// parameter; models that don't support reasoning silently ignore it.
type ReasoningLevel string

const (
	// ReasoningLow asks for a small amount of extra reasoning. Maps to
	// reasoning_effort="low" on OpenAI/xAI, ~1k thinking budget on
	// Anthropic/Google.
	ReasoningLow ReasoningLevel = "low"

	// ReasoningMedium asks for a moderate amount. Maps to
	// reasoning_effort="medium" on OpenAI, ~8k thinking budget on
	// Anthropic/Google. xAI remaps medium to its only other option, "high".
	ReasoningMedium ReasoningLevel = "medium"

	// ReasoningHigh asks for the most reasoning the provider exposes.
	// Maps to reasoning_effort="high" on OpenAI/xAI, ~24k thinking budget
	// on Anthropic/Google.
	ReasoningHigh ReasoningLevel = "high"
)

// cacheConfig holds prompt-caching settings. nil = disabled.
type cacheConfig struct {
	enabled bool
@@ -42,6 +66,21 @@ func WithStop(sequences ...string) RequestOption {
	return func(c *requestConfig) { c.stop = sequences }
}

// WithReasoning asks the model to spend extra reasoning budget on the
// response. Each provider maps the level to its native shape:
//
//   - Anthropic: thinking.budget_tokens (low ~ 1024, medium ~ 8000, high ~ 24000)
//   - OpenAI / xAI / Groq: reasoning_effort string (xAI remaps medium to high)
//   - Google: thinking_config.thinking_budget (same budget as Anthropic)
//   - DeepSeek (reasoner): always-on regardless; this option is a no-op
//   - Models without reasoning support: silently ignored
//
// Reasoning content (when surfaced by the provider) appears on
// Response.Thinking, and is also streamed as StreamEventThinking events.
func WithReasoning(level ReasoningLevel) RequestOption {
	return func(c *requestConfig) { c.reasoning = level }
}

// WithPromptCaching enables automatic prompt-caching markers on providers
// that support it (currently Anthropic). On providers that don't support
// explicit cache markers (OpenAI, Google), this is a no-op.