The weather extractor used positional CSS selectors (div:first-child,
div:nth-child(2)) to locate the header and hourly container within the
widget section. When DuckDuckGo inserts advisory banners (e.g. wind
advisory), the extra div shifts positions and breaks extraction of
current temp, hourly data, humidity, and wind.
Replace with structural selectors:
- div:not(:has(ul)) for the header (first div without a list)
- div:has(> ul) for the hourly container (div with direct ul child)
These match elements by their content structure rather than position,
so advisory banners no longer break extraction.
Fixes #64
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DuckDuckGo's weather widget uses randomized CSS module class names that
don't match the BEM-style selectors the extractor was using. Replace all
class-based selectors with structural and attribute-based selectors:
- Identify widget via article:has(img[src*='weatherkit'])
- Use positional selectors (div:first-child, p:first-of-type, etc.)
- Extract icon hints from img[alt] attributes
- Parse precipitation from span > span structure
- Derive CurrentTemp from first hourly entry (no standalone element)
- Derive HighTemp/LowTemp from first daily forecast entry
- Use text-matching for Humidity/Wind labels
Fixes #53
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add HourlyForecast struct and Hourly field to WeatherData for hourly
temperature/condition data. Add Precipitation (int, -1 if unavailable)
and IconHint (from aria-label/title/alt attributes) to both DayForecast
and HourlyForecast. This enables downstream consumers like mort to
replace inline DuckDuckGo scraping with a single GetWeather() call.
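For illustration, the new types might look roughly like the sketch below. The names HourlyForecast, DayForecast, WeatherData, Hourly, Precipitation, and IconHint come from this change; every other field (Time, Day, Temp, Condition, Forecast), the field types, and the formatPrecip helper are assumptions made for the example, not the actual definitions.

```go
package main

import "fmt"

// HourlyForecast is one hourly entry. Time, Temp, and Condition are
// assumed field names; Precipitation and IconHint are from this change.
type HourlyForecast struct {
	Time          string
	Temp          float64
	Condition     string
	Precipitation int    // percent chance; -1 if unavailable
	IconHint      string // taken from aria-label/title/alt attributes
}

// DayForecast gains the same two fields. Day is an assumed field name.
type DayForecast struct {
	Day           string
	Precipitation int
	IconHint      string
}

// WeatherData carries the new Hourly slice. Forecast is an assumed name
// for the existing daily data.
type WeatherData struct {
	Forecast []DayForecast
	Hourly   []HourlyForecast
}

// formatPrecip is a hypothetical consumer-side helper showing how the
// -1 sentinel can be handled.
func formatPrecip(p int) string {
	if p < 0 {
		return "n/a"
	}
	return fmt.Sprintf("%d%%", p)
}

func main() {
	w := WeatherData{Hourly: []HourlyForecast{
		{Time: "1 PM", Temp: 61, Condition: "Cloudy", Precipitation: -1, IconHint: "Cloudy"},
		{Time: "2 PM", Temp: 60, Condition: "Rain", Precipitation: 40, IconHint: "Rain"},
	}}
	for _, h := range w.Hourly {
		fmt.Println(h.Time, h.Condition, formatPrecip(h.Precipitation))
	}
}
```

Consumers should check for the -1 sentinel before displaying or averaging Precipitation.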
Closes #51
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add weather.go with GetWeather() for extracting structured weather data
(location, temp, conditions, forecast) and stock.go with GetStockQuote()
and GetStockChart() for stock data extraction and chart screenshots.
Both include mock-based tests. CSS selectors may need tuning against
the live site since DuckDuckGo's React-rendered widgets use dynamic
class names.
Closes #25, #26
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Create exported extractortest package with MockBrowser, MockDocument,
and MockNode that support selector-based responses for testing site
extractors without a real browser.
Add extraction tests for DuckDuckGo (result parsing, empty results, no
links, full search flow) and Powerball (drawing parsing, next drawing
parsing with billion/million, error cases, full GetCurrent flow).
Closes #21
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Return errors for required fields (ID, price) and log warnings for
optional fields (title, description, unit price) across all site
extractors instead of silently discarding them with _ =.
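The required/optional split can be sketched as follows. This is a minimal example under stated assumptions: the Item type, the extractItem function, and the map-based field lookup are hypothetical stand-ins for the real DOM-backed extractors.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"strings"
)

// Item is a hypothetical extracted record.
type Item struct {
	ID    string
	Price string
	Title string
}

// extractItem reads fields from a map that stands in for DOM lookups.
// Missing required fields (ID, price) produce an error; a missing
// optional field (title) only produces a logged warning.
func extractItem(fields map[string]string) (Item, error) {
	id := strings.TrimSpace(fields["id"])
	if id == "" {
		return Item{}, errors.New("extract item: missing required field: id")
	}
	price := strings.TrimSpace(fields["price"])
	if price == "" {
		return Item{}, errors.New("extract item: missing required field: price")
	}
	title := strings.TrimSpace(fields["title"])
	if title == "" {
		log.Printf("warning: item %s: missing optional field: title", id)
	}
	return Item{ID: id, Price: price, Title: title}, nil
}

func main() {
	it, err := extractItem(map[string]string{"id": "42", "price": "3.99"})
	fmt.Println(it, err)
}
```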
Closes #24
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Extract identical numericOnly inline functions from powerball and
  megamillions into shared sites/internal/parse.NumericOnly with tests
- Extract duplicated DuckDuckGo result parsing from Search() and
  GetResults() into shared extractResults() helper
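As an illustration of what a shared helper like this might do, here is a minimal sketch. The signature and the exact behavior (keeping only ASCII digits and dropping everything else) are assumptions; the real parse.NumericOnly may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// NumericOnly returns s with every character except the ASCII digits
// 0-9 removed. Assumed behavior, for illustration only.
func NumericOnly(s string) string {
	return strings.Map(func(r rune) rune {
		if r >= '0' && r <= '9' {
			return r
		}
		return -1 // a negative return value tells strings.Map to drop the rune
	}, s)
}

func main() {
	fmt.Println(NumericOnly("$1,234 Million"))
}
```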
Closes #13, #14
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix archive cmd passing only archive-specific Flags instead of the
  merged flags variable that includes browser flags (#8)
- Move defer DeferClose() after error checks in 6 locations to prevent
  calling Close on nil values (#19):
  - sites/duckduckgo/cmd/duckduckgo/main.go
  - sites/duckduckgo/duckduckgo.go
  - sites/google/cmd/google/main.go
  - sites/wegmans/cmd/wegmans/main.go
  - sites/wegmans/wegmans.go
  - sites/aislegopher/aislegopher.go
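The defer ordering issue can be sketched as below. The resource type and the open and use functions are hypothetical, and a plain defer r.Close() stands in for the repository's DeferClose helper.

```go
package main

import (
	"errors"
	"fmt"
)

// resource is a hypothetical closable value.
type resource struct{ closed bool }

// Close dereferences r, so calling it on a nil *resource would panic.
func (r *resource) Close() error {
	r.closed = true
	return nil
}

// open returns (nil, err) on failure, as most constructors do.
func open(fail bool) (*resource, error) {
	if fail {
		return nil, errors.New("open failed")
	}
	return &resource{}, nil
}

// use registers the deferred Close only after the error check, so a
// failed open never leads to Close being called on a nil pointer.
func use(fail bool) error {
	r, err := open(fail)
	if err != nil {
		return err // r is nil here; there is nothing to close
	}
	defer r.Close()
	return nil
}

func main() {
	fmt.Println(use(true))
	fmt.Println(use(false))
}
```

Had the defer been placed before the error check, the failing path would have run Close on a nil pointer when the function returned.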
Closes #8, #19
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Change SearchPage.GetResults() to return ([]Result, error) so ForEach
  errors are no longer silently discarded
- Fix Search() to return the ForEach error instead of nil
- Update cmd caller to check GetResults() errors
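A minimal sketch of the error propagation pattern described above. The forEach helper, the string-based nodes, and the Result fields are stand-ins invented for this example; only the ([]Result, error) return shape comes from this change.

```go
package main

import (
	"errors"
	"fmt"
)

// Result is a simplified stand-in for a search result.
type Result struct{ Title string }

// forEach is a hypothetical iteration helper that stops at the first
// error returned by fn and hands it back to the caller.
func forEach(nodes []string, fn func(string) error) error {
	for _, n := range nodes {
		if err := fn(n); err != nil {
			return err
		}
	}
	return nil
}

// getResults returns the iteration error instead of discarding it.
func getResults(nodes []string) ([]Result, error) {
	var results []Result
	err := forEach(nodes, func(n string) error {
		if n == "" {
			return errors.New("result node has no title")
		}
		results = append(results, Result{Title: n})
		return nil
	})
	if err != nil {
		return nil, err
	}
	return results, nil
}

func main() {
	results, err := getResults([]string{"first", "second"})
	fmt.Println(results, err)
}
```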
Closes #5, #6
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix Nodes.First() panic on empty slice (return nil)
- Fix ticker leak in archive.go (create once, defer Stop)
- Fix cookie path matching for empty and root paths
- Fix lost query params in google.go (u.Query().Set was discarded)
- Fix type assertion panic in useragents.go
- Fix dropped date parse error in powerball.go
- Remove unreachable dead code in megamillions.go and powerball.go
- Simplify document.go WaitForNetworkIdle, remove unused root field
- Remove debug fmt.Println calls across codebase
- Replace panic(err) with stderr+exit in all cmd/ programs
- Fix duckduckgo cmd: remove useless defer, return error on bad safesearch
- Fix archive cmd: ToConfig returns error instead of panicking
- Add 39+ unit tests across 6 new test files
- Add Gitea Actions CI workflow (build, test, vet in parallel)
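The lost query params item in the list above is a common net/url pitfall: u.Query() parses RawQuery into a new url.Values on every call, so u.Query().Set(...) modifies a temporary that is then thrown away. A minimal sketch of the corrected pattern follows; the withQuery helper and the example URL are hypothetical and not taken from google.go.

```go
package main

import (
	"fmt"
	"net/url"
)

// withQuery sets one query parameter on raw and returns the new URL.
func withQuery(raw, key, value string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	// Buggy form: u.Query().Set(key, value) changes a throwaway copy.
	q := u.Query()          // take the parsed copy
	q.Set(key, value)       // modify it
	u.RawQuery = q.Encode() // write it back so the change sticks
	return u.String(), nil
}

func main() {
	s, err := withQuery("https://example.com/search?q=go", "hl", "en")
	fmt.Println(s, err)
}
```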
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduced the `OpenSearch` method and `SearchPage` interface to streamline search operations and allow for loading more results dynamically. Updated dependencies and modified the DuckDuckGo CLI to utilize these enhancements.
Introduced a new package and command for extracting data from aislegopher.com, including URL parsing and item retrieval. Updated dependencies in go.mod to support the new functionality. Additionally, refined import structure in the DuckDuckGo integration.
Replaced the overly complex CSS selector with a simplified "h2" selector for extracting titles. This change improves maintainability and ensures accurate title extraction from the updated DOM structure.
Implemented a DuckDuckGo search module with configurable SafeSearch and regional settings. Added a CLI tool to perform searches via DuckDuckGo using browser automation, supporting flags for customization.