Add sites/recipe package with ExtractRecipe() that works on any recipe
URL. Parses JSON-LD structured data (@type: Recipe) first, with DOM
fallback. Handles @graph containers, arrays, HowToStep objects, ISO
8601 durations, and various author/yield/image formats.
Closes #29
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
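A minimal sketch of the ISO 8601 duration handling mentioned in the recipe commit, assuming recipe times arrive as strings such as "PT1H30M". The name parseISODuration and the supported subset (days, hours, minutes, seconds) are assumptions for illustration, not the package's actual API.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"time"
)

// isoDuration matches only the day/hour/minute/second subset of ISO 8601
// durations (for example "PT1H30M"). Years, months, weeks, and fractional
// values are deliberately not handled in this sketch.
var isoDuration = regexp.MustCompile(`^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$`)

// parseISODuration is a hypothetical helper that converts a duration string
// into a time.Duration.
func parseISODuration(s string) (time.Duration, error) {
	m := isoDuration.FindStringSubmatch(s)
	if m == nil {
		return 0, fmt.Errorf("unsupported ISO 8601 duration %q", s)
	}
	units := []time.Duration{24 * time.Hour, time.Hour, time.Minute, time.Second}
	var total time.Duration
	found := false
	for i, unit := range units {
		if m[i+1] == "" {
			continue
		}
		n, err := strconv.Atoi(m[i+1])
		if err != nil {
			return 0, fmt.Errorf("parsing %q: %w", s, err)
		}
		total += time.Duration(n) * unit
		found = true
	}
	if !found {
		// Strings such as "P" or "PT" carry no components at all.
		return 0, fmt.Errorf("empty ISO 8601 duration %q", s)
	}
	return total, nil
}

func main() {
	d, err := parseISODuration("PT1H30M")
	fmt.Println(d, err)
}
```

Rejecting unsupported forms with an error, rather than returning zero, keeps a malformed cookTime or prepTime from being mistaken for "no time required".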
Add sites/steam package with GetGamePrice() and SearchGames() methods.
Handles regular prices, discounted games, and free-to-play titles.
Includes age gate bypass logic and currency detection.
Closes #28
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add sites/coingecko package with GetPrice() method that extracts
structured crypto price data (name, symbol, price, 24h/7d change,
market cap, volume, 24h high/low) from CoinGecko coin pages.
Includes mock-based tests and parseLargeNumber helper for T/B/M suffixes.
Closes #27
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
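A sketch of what a helper for T/B/M suffixes, like the parseLargeNumber mentioned in the CoinGecko commit, could look like. The signature, the handling of "$" and thousands separators, and the error behavior are assumptions; the real helper may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLargeNumber converts strings such as "$1.5B" or "2T" to a float64.
// This is an illustrative version; the input formats are assumed.
func parseLargeNumber(s string) (float64, error) {
	s = strings.TrimSpace(s)
	s = strings.TrimPrefix(s, "$")
	s = strings.ReplaceAll(s, ",", "")
	if s == "" {
		return 0, fmt.Errorf("empty number")
	}

	// Pick a multiplier from the final character, if it is a known suffix.
	mult := 1.0
	switch s[len(s)-1] {
	case 'T', 't':
		mult = 1e12
	case 'B', 'b':
		mult = 1e9
	case 'M', 'm':
		mult = 1e6
	}
	if mult != 1 {
		s = strings.TrimSpace(s[:len(s)-1])
	}

	v, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, fmt.Errorf("parsing %q: %w", s, err)
	}
	return v * mult, nil
}

func main() {
	for _, in := range []string{"$1.5B", "2T", "12,345", "n/a"} {
		v, err := parseLargeNumber(in)
		fmt.Println(in, v, err)
	}
}
```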
Add weather.go with GetWeather() for extracting structured weather data
(location, temp, conditions, forecast) and stock.go with GetStockQuote()
and GetStockChart() for stock data extraction and chart screenshots.
Both include mock-based tests. CSS selectors may need tuning against
the live site since DuckDuckGo's React-rendered widgets use dynamic
class names.
Closes #25, #26
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Create exported extractortest package with MockBrowser, MockDocument,
and MockNode that support selector-based responses for testing site
extractors without a real browser.
Add extraction tests for DuckDuckGo (result parsing, empty results, no
links, full search flow) and Powerball (drawing parsing, next drawing
parsing with billion/million, error cases, full GetCurrent flow).
Closes #21
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Wrap staticCookieJar in struct with sync.RWMutex for thread safety
- Add SameSite field to Cookie struct with Strict/Lax/None constants
- Update Playwright cookie conversion functions for SameSite
- Replace hardcoded 4-country switch with dynamic country code generation
Closes #20, #22, #23
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
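The cookie jar change above can be illustrated with a small sketch of a jar guarded by sync.RWMutex, together with a SameSite field. The type and method names here (lockedJar, Set, Get, Len) and the map-based storage are hypothetical and only show the locking pattern, not the project's actual staticCookieJar.

```go
package main

import (
	"fmt"
	"sync"
)

// SameSite mirrors the three standard cookie SameSite values.
type SameSite string

const (
	SameSiteStrict SameSite = "Strict"
	SameSiteLax    SameSite = "Lax"
	SameSiteNone   SameSite = "None"
)

// Cookie is a reduced, illustrative cookie type.
type Cookie struct {
	Name     string
	Value    string
	SameSite SameSite
}

// lockedJar wraps the cookie storage with a read/write mutex so that
// concurrent readers do not block each other while writers get exclusive access.
type lockedJar struct {
	mu      sync.RWMutex
	cookies map[string]Cookie
}

func newLockedJar() *lockedJar {
	return &lockedJar{cookies: make(map[string]Cookie)}
}

// Set stores or replaces a cookie under the write lock.
func (j *lockedJar) Set(c Cookie) {
	j.mu.Lock()
	defer j.mu.Unlock()
	j.cookies[c.Name] = c
}

// Get looks up a cookie under the read lock.
func (j *lockedJar) Get(name string) (Cookie, bool) {
	j.mu.RLock()
	defer j.mu.RUnlock()
	c, ok := j.cookies[name]
	return c, ok
}

// Len reports the number of stored cookies under the read lock.
func (j *lockedJar) Len() int {
	j.mu.RLock()
	defer j.mu.RUnlock()
	return len(j.cookies)
}

func main() {
	jar := newLockedJar()
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			jar.Set(Cookie{Name: fmt.Sprintf("c%d", i), Value: "v", SameSite: SameSiteLax})
		}(i)
	}
	wg.Wait()
	fmt.Println(jar.Len())
}
```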
Return errors for required fields (ID, price) and log warnings for
optional fields (title, description, unit price) across all site
extractors instead of silently discarding them with _ =.
Closes #24
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Extract identical numericOnly inline functions from powerball and
megamillions into shared sites/internal/parse.NumericOnly with tests
- Extract duplicated DuckDuckGo result parsing from Search() and
GetResults() into shared extractResults() helper
Closes #13, #14
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
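For reference, a shared digit-filtering helper such as the NumericOnly mentioned above might look like the following. The exact behavior of the real function is not stated in the commit, so this version, which keeps only ASCII digits, is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// NumericOnly returns s with every character except ASCII digits removed.
// Assumed behavior; the project's helper may treat signs or decimals differently.
func NumericOnly(s string) string {
	return strings.Map(func(r rune) rune {
		if r >= '0' && r <= '9' {
			return r
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, s)
}

func main() {
	fmt.Println(NumericOnly("$1,234 Million"))
}
```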
Define DefaultUserAgent (Firefox/147.0) in playwright.go and reference
it from NewBrowser, NewInteractiveBrowser, and CLI flags. Previously
three different UA strings existed (two at 142.0, one outdated at 133.0).
Closes #17
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Change ShowBrowser from bool to *bool so nil means "don't override"
in mergeOptions(), fixing the bug where it always overwrote the base
- Add Bool() helper for convenient *bool construction
- Align NewInteractiveBrowser default from Chromium to Firefox to match
NewBrowser
- Update README example and CLI flags for the *bool change
Closes #15, #16
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
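The *bool change above follows a common Go pattern for tri-state options. A minimal sketch, assuming a reduced Options struct with only the ShowBrowser field and a variadic mergeOptions (the real signatures are not shown in the commit):

```go
package main

import "fmt"

// Options is a reduced, illustrative options struct.
// A nil ShowBrowser means "not specified, keep the base value".
type Options struct {
	ShowBrowser *bool
}

// Bool returns a pointer to v, which makes literals like Bool(true) convenient.
func Bool(v bool) *bool { return &v }

// mergeOptions applies overrides on top of base, skipping unset fields.
func mergeOptions(base Options, overrides ...Options) Options {
	for _, o := range overrides {
		if o.ShowBrowser != nil {
			base.ShowBrowser = o.ShowBrowser
		}
	}
	return base
}

func main() {
	base := Options{ShowBrowser: Bool(true)}

	// An empty override no longer resets ShowBrowser to false.
	kept := mergeOptions(base, Options{})
	fmt.Println(*kept.ShowBrowser)

	// An explicit override still wins.
	changed := mergeOptions(base, Options{ShowBrowser: Bool(false)})
	fmt.Println(*changed.ShowBrowser)
}
```

With a plain bool, the zero value false is indistinguishable from "not set", which is why a merge over plain bools always overwrote the base.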
- playwright.go: check error from page.Context().Cookies() before
iterating over results, preventing silent failures
- archive.go: replace time.Sleep(5s) with context-aware select using
time.After, allowing the operation to be cancelled promptly
Closes #7, #18
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
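The context-aware wait described for archive.go can be sketched as below. The helper name sleepCtx and the demo function are illustrative assumptions; only the select over time.After and ctx.Done() comes from the commit description.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// sleepCtx waits for d to elapse, or returns early with the context's error
// if ctx is cancelled or its deadline passes first.
func sleepCtx(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demo starts a 5-second wait under a context that expires after about 10ms
// and returns whatever error the wait produced.
func demo() error {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()
	return sleepCtx(ctx, 5*time.Second)
}

func main() {
	fmt.Println(demo()) // prints "context deadline exceeded"
}
```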
- Fix archive cmd passing only archive-specific Flags instead of the
merged flags variable that includes browser flags (#8)
- Move defer DeferClose() after error checks in 6 locations to prevent
calling Close on nil values (#19):
- sites/duckduckgo/cmd/duckduckgo/main.go
- sites/duckduckgo/duckduckgo.go
- sites/google/cmd/google/main.go
- sites/wegmans/cmd/wegmans/main.go
- sites/wegmans/wegmans.go
- sites/aislegopher/aislegopher.go
Closes #8, #19
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
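The defer ordering fix above can be shown with a small stand-in. The browser type, newBrowser constructor, and run function are hypothetical; the point is only that the deferred Close is registered after the error check, so it never runs against a nil value.

```go
package main

import (
	"errors"
	"fmt"
)

// browser is a stand-in for a resource whose Close must not be called on nil.
type browser struct {
	closed bool
}

// Close writes to a field, so calling it on a nil *browser would panic.
func (b *browser) Close() {
	b.closed = true
}

// newBrowser is a hypothetical constructor that returns (nil, err) on failure.
func newBrowser(fail bool) (*browser, error) {
	if fail {
		return nil, errors.New("could not start browser")
	}
	return &browser{}, nil
}

// run registers the deferred Close only after the constructor has succeeded.
func run(fail bool) error {
	b, err := newBrowser(fail)
	if err != nil {
		return fmt.Errorf("creating browser: %w", err)
	}
	defer b.Close() // safe: b is known to be non-nil here

	// ... use b ...
	return nil
}

func main() {
	fmt.Println(run(false))
	fmt.Println(run(true))
}
```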
- Change SearchPage.GetResults() to return ([]Result, error) so ForEach
errors are no longer silently discarded
- Fix Search() to return the ForEach error instead of nil
- Update cmd caller to check GetResults() errors
Closes #5, #6
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add length check before slicing article.Content[:32], matching the
safe truncation pattern already used in cmd/browser/main.go.
Closes #9
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
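A sketch of the length check described above. The helper name truncate is an assumption; the commit only states that a length check was added before slicing article.Content[:32].

```go
package main

import (
	"fmt"
	"strings"
)

// truncate returns at most n bytes of s. Slicing s[:n] directly panics when
// len(s) < n, which is the failure the length check guards against.
// Note that this cuts by bytes, so it can split a multi-byte UTF-8 character.
func truncate(s string, n int) string {
	if len(s) <= n {
		return s
	}
	return s[:n]
}

func main() {
	fmt.Println(truncate("short", 32))
	fmt.Println(len(truncate(strings.Repeat("x", 100), 32)))
}
```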
- document.go: check if resp is nil before calling resp.Status() in
Refresh(), since Playwright's Reload() can return a nil response
- archive.go: check SelectFirst() results for nil before calling
Type() and Click(), preventing panics when DOM elements are missing
Closes #10, #11
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace string interpolation in SetAttribute with Playwright's Evaluate
argument passing mechanism. This structurally eliminates the injection
surface: arbitrary name/value strings are passed as JavaScript
arguments rather than interpolated into the expression string.
The vulnerable escapeJavaScript helper (which only escaped \ and ') is
removed since it is no longer needed.
Closes #12
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add project documentation:
- README.md with installation, usage examples, API reference, and project structure
- CLAUDE.md with developer guide, architecture overview, conventions, and issue label docs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The go.sum file was not tracked, causing CI to fail with
"missing go.sum entry" errors during build/test/vet.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The gitea.com/actions/setup-go mirror only has tags up to v3.
v3 supports go-version-file which is all we need.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GitHub is returning 500 errors for actions/checkout and actions/setup-go.
Switch to Gitea's own mirrors at gitea.com/actions/ to avoid the dependency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix Nodes.First() panic on empty slice (return nil)
- Fix ticker leak in archive.go (create once, defer Stop)
- Fix cookie path matching for empty and root paths
- Fix lost query params in google.go (u.Query().Set was discarded)
- Fix type assertion panic in useragents.go
- Fix dropped date parse error in powerball.go
- Remove unreachable dead code in megamillions.go and powerball.go
- Simplify document.go WaitForNetworkIdle, remove unused root field
- Remove debug fmt.Println calls across codebase
- Replace panic(err) with stderr+exit in all cmd/ programs
- Fix duckduckgo cmd: remove useless defer, return error on bad safesearch
- Fix archive cmd: ToConfig returns error instead of panicking
- Add 39+ unit tests across 6 new test files
- Add Gitea Actions CI workflow (build, test, vet in parallel)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
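The "lost query params" item in the list above is a common net/url pitfall: u.Query() returns a parsed copy of the query string, so calling Set on that copy has no effect on the URL unless it is encoded back into RawQuery. A minimal sketch, using a hypothetical withQuery helper rather than the actual google.go code:

```go
package main

import (
	"fmt"
	"net/url"
)

// withQuery returns raw with the query parameter key set to value.
func withQuery(raw, key, value string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("parsing %q: %w", raw, err)
	}

	// Broken form: u.Query().Set(key, value) modifies a temporary copy
	// and the change is discarded.

	q := u.Query()          // parse the existing query into a url.Values copy
	q.Set(key, value)       // modify the copy
	u.RawQuery = q.Encode() // write it back so the URL actually changes
	return u.String(), nil
}

func main() {
	s, err := withQuery("https://www.google.com/search?q=go", "hl", "en")
	fmt.Println(s, err)
}
```

Encode sorts parameters by key, so the order in the resulting URL can differ from the order in the input.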
Exposes Playwright's Keyboard.InsertText() which dispatches only an
input event (no keydown/keyup). This is essential for pasting text
into password fields and custom input components that don't handle
rapid-fire synthetic key events from Type().
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Exposes low-level mouse, keyboard, screenshot, navigation, and cookie
extraction APIs via a new InteractiveBrowser interface. Designed for
interactive browser proxy sessions where direct page control is needed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Refactored Playwright initialization to ensure context propagation. Updated `NewPlayWrightBrowser` and related methods to accept `context.Context` for better cancellation and timeout handling. Improved error resilience and concurrency during browser setup.
Refined price parsing logic to strip trailing periods from units (e.g., "lb." -> "lb") so that unit strings are normalized. Added logging of extracted response data for debugging.
Introduced the `UseLocalOnly` option to prevent connections to a remote Playwright server and enforce usage of the local server. Updated relevant connection logic to respect this new option.