Add 7 new init scripts to cover WebGL fingerprinting, missing Chrome
APIs, permissions behavior, CDP artifacts, and HeadlessChrome UA string.
Enable Chromium's new headless mode (Channel: "chromium") when stealth
is active to use the full UI layer that is harder to detect.
Closes#58
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add anti-bot-detection evasion support to reduce blocking by sites like
archive.ph. Stealth mode is enabled by default for all browsers and applies
common evasions: navigator.webdriver override, plugin/mimeType spoofing,
window.chrome stub, and outerWidth/outerHeight fixes. For Chromium,
--disable-blink-features=AutomationControlled is also added.
New BrowserOptions fields:
- Stealth *bool: toggle stealth presets (default true)
- LaunchArgs []string: custom browser launch arguments
- InitScripts []string: JavaScript injected before page scripts
Closes#56
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DuckDuckGo's weather widget uses randomized CSS module class names that
don't match the BEM-style selectors the extractor was using. Replace all
class-based selectors with structural and attribute-based selectors:
- Identify widget via article:has(img[src*='weatherkit'])
- Use positional selectors (div:first-child, p:first-of-type, etc.)
- Extract icon hints from img[alt] attributes
- Parse precipitation from span > span structure
- Derive CurrentTemp from first hourly entry (no standalone element)
- Derive HighTemp/LowTemp from first daily forecast entry
- Use text-matching for Humidity/Wind labels
Fixes#53
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add HourlyForecast struct and Hourly field to WeatherData for hourly
temperature/condition data. Add Precipitation (int, -1 if unavailable)
and IconHint (from aria-label/title/alt attributes) to both DayForecast
and HourlyForecast. This enables downstream consumers like mort to
replace inline DuckDuckGo scraping with a single GetWeather() call.
Closes#51
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract firmware information from Bambu Lab's firmware download pages
by parsing the __NEXT_DATA__ JSON blob embedded in the page. Supports
all printer models (X1, P1, A1, A1 mini, H2D, H2S, P2S, X1E, H2D Pro).
Provides GetLatestFirmware() and GetAllFirmware() methods that return
version, release date, release notes, download URL, and MD5 checksum.
Closes#45
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add sites/recipe package with ExtractRecipe() that works on any recipe
URL. Parses JSON-LD structured data (@type: Recipe) first, with DOM
fallback. Handles @graph containers, arrays, HowToStep objects, ISO
8601 durations, and various author/yield/image formats.
Closes#29
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add sites/steam package with GetGamePrice() and SearchGames() methods.
Handles regular prices, discounted games, and free-to-play titles.
Includes age gate bypass logic and currency detection.
Closes#28
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add sites/coingecko package with GetPrice() method that extracts
structured crypto price data (name, symbol, price, 24h/7d change,
market cap, volume, 24h high/low) from CoinGecko coin pages.
Includes mock-based tests and parseLargeNumber helper for T/B/M suffixes.
Closes#27
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add weather.go with GetWeather() for extracting structured weather data
(location, temp, conditions, forecast) and stock.go with GetStockQuote()
and GetStockChart() for stock data extraction and chart screenshots.
Both include mock-based tests. CSS selectors may need tuning against
the live site since DuckDuckGo's React-rendered widgets use dynamic
class names.
Closes#25, #26
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Create exported extractortest package with MockBrowser, MockDocument,
and MockNode that support selector-based responses for testing site
extractors without a real browser.
Add extraction tests for DuckDuckGo (result parsing, empty results, no
links, full search flow) and Powerball (drawing parsing, next drawing
parsing with billion/million, error cases, full GetCurrent flow).
Closes#21
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Wrap staticCookieJar in struct with sync.RWMutex for thread safety
- Add SameSite field to Cookie struct with Strict/Lax/None constants
- Update Playwright cookie conversion functions for SameSite
- Replace hardcoded 4-country switch with dynamic country code generation
Closes#20, #22, #23
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Return errors for required fields (ID, price) and log warnings for
optional fields (title, description, unit price) across all site
extractors instead of silently discarding them with _ =.
Closes#24
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Extract identical numericOnly inline functions from powerball and
megamillions into shared sites/internal/parse.NumericOnly with tests
- Extract duplicated DuckDuckGo result parsing from Search() and
GetResults() into shared extractResults() helper
Closes#13, #14
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Define DefaultUserAgent (Firefox/147.0) in playwright.go and reference
it from NewBrowser, NewInteractiveBrowser, and CLI flags. Previously
three different UA strings existed (two at 142.0, one outdated at 133.0).
Closes#17
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Change ShowBrowser from bool to *bool so nil means "don't override"
in mergeOptions(), fixing the bug where it always overwrote the base
- Add Bool() helper for convenient *bool construction
- Align NewInteractiveBrowser default from Chromium to Firefox to match
NewBrowser
- Update README example and CLI flags for the *bool change
Closes#15, #16
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- playwright.go: check error from page.Context().Cookies() before
iterating over results, preventing silent failures
- archive.go: replace time.Sleep(5s) with context-aware select using
time.After, allowing the operation to be cancelled promptly
Closes#7, #18
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix archive cmd passing only archive-specific Flags instead of the
merged flags variable that includes browser flags (#8)
- Move defer DeferClose() after error checks in 6 locations to prevent
calling Close on nil values (#19):
- sites/duckduckgo/cmd/duckduckgo/main.go
- sites/duckduckgo/duckduckgo.go
- sites/google/cmd/google/main.go
- sites/wegmans/cmd/wegmans/main.go
- sites/wegmans/wegmans.go
- sites/aislegopher/aislegopher.go
Closes#8, #19
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Change SearchPage.GetResults() to return ([]Result, error) so ForEach
errors are no longer silently discarded
- Fix Search() to return the ForEach error instead of nil
- Update cmd caller to check GetResults() errors
Closes#5, #6
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add length check before slicing article.Content[:32], matching the
safe truncation pattern already used in cmd/browser/main.go.
Closes#9
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- document.go: check if resp is nil before calling resp.Status() in
Refresh(), since Playwright's Reload() can return a nil response
- archive.go: check SelectFirst() results for nil before calling
Type() and Click(), preventing panics when DOM elements are missing
Closes#10, #11
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace string interpolation in SetAttribute with Playwright's Evaluate
argument passing mechanism. This structurally eliminates the injection
surface — arbitrary name/value strings are safely passed as JavaScript
arguments rather than interpolated into the expression string.
The vulnerable escapeJavaScript helper (which only escaped \ and ') is
removed since it is no longer needed.
Closes#12
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add project documentation:
- README.md with installation, usage examples, API reference, and project structure
- CLAUDE.md with developer guide, architecture overview, conventions, and issue label docs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The go.sum file was not tracked, causing CI to fail with
"missing go.sum entry" errors during build/test/vet.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The gitea.com/actions/setup-go mirror only has tags up to v3.
v3 supports go-version-file which is all we need.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GitHub is returning 500 errors for actions/checkout and actions/setup-go.
Switch to Gitea's own mirrors at gitea.com/actions/ to avoid the dependency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix Nodes.First() panic on empty slice (return nil)
- Fix ticker leak in archive.go (create once, defer Stop)
- Fix cookie path matching for empty and root paths
- Fix lost query params in google.go (u.Query().Set was discarded)
- Fix type assertion panic in useragents.go
- Fix dropped date parse error in powerball.go
- Remove unreachable dead code in megamillions.go and powerball.go
- Simplify document.go WaitForNetworkIdle, remove unused root field
- Remove debug fmt.Println calls across codebase
- Replace panic(err) with stderr+exit in all cmd/ programs
- Fix duckduckgo cmd: remove useless defer, return error on bad safesearch
- Fix archive cmd: ToConfig returns error instead of panicking
- Add 39+ unit tests across 6 new test files
- Add Gitea Actions CI workflow (build, test, vet in parallel)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>