Commit Graph

41 Commits

Author SHA1 Message Date
c1c1acdb00 feature: add PizzINT (Pentagon Pizza Index) site extractor
All checks were successful
CI / vet (pull_request) Successful in 1m7s
CI / build (pull_request) Successful in 1m9s
CI / test (pull_request) Successful in 1m9s
Adds a new site extractor for pizzint.watch, which tracks pizza shop
activity near the Pentagon as an OSINT indicator. The extractor fetches
the dashboard API and exposes DOUGHCON levels, restaurant activity, and
spike events.

Includes a CLI tool with an HTTP server mode (--serve) for embedding
the pizza status in dashboards or status displays.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 05:45:55 +00:00
a32f57ec92 fix: update weather extractor selectors to match DuckDuckGo's actual DOM
All checks were successful
CI / build (pull_request) Successful in 30s
CI / vet (pull_request) Successful in 45s
CI / test (pull_request) Successful in 48s
DuckDuckGo's weather widget uses randomized CSS module class names that
don't match the BEM-style selectors the extractor was using. Replace all
class-based selectors with structural and attribute-based selectors:

- Identify widget via article:has(img[src*='weatherkit'])
- Use positional selectors (div:first-child, p:first-of-type, etc.)
- Extract icon hints from img[alt] attributes
- Parse precipitation from span > span structure
- Derive CurrentTemp from first hourly entry (no standalone element)
- Derive HighTemp/LowTemp from first daily forecast entry
- Use text-matching for Humidity/Wind labels

Fixes #53

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 23:00:44 +00:00
469171da9c feature: add hourly forecast, precipitation, and icon hints to weather extractor
All checks were successful
CI / build (pull_request) Successful in 30s
CI / vet (pull_request) Successful in 46s
CI / test (pull_request) Successful in 49s
Add HourlyForecast struct and Hourly field to WeatherData for hourly
temperature/condition data. Add Precipitation (int, -1 if unavailable)
and IconHint (from aria-label/title/alt attributes) to both DayForecast
and HourlyForecast. This enables downstream consumers like mort to
replace inline DuckDuckGo scraping with a single GetWeather() call.

Closes #51

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 21:22:04 +00:00
df934a0521 feature: add Bambu Lab firmware version extractor
All checks were successful
CI / build (pull_request) Successful in 30s
CI / vet (pull_request) Successful in 48s
CI / test (pull_request) Successful in 49s
Extract firmware information from Bambu Lab's firmware download pages
by parsing the __NEXT_DATA__ JSON blob embedded in the page. Supports
all printer models (X1, P1, A1, A1 mini, H2D, H2S, P2S, X1E, H2D Pro).

Provides GetLatestFirmware() and GetAllFirmware() methods that return
version, release date, release notes, download URL, and MD5 checksum.

Closes #45

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 20:21:06 +00:00
c2768e2b05 feature: add IMDB movie/TV extractor
All checks were successful
CI / test (pull_request) Successful in 46s
CI / vet (pull_request) Successful in 47s
CI / build (pull_request) Successful in 1m18s
Add sites/imdb package with GetMovie() and Search() methods. Extracts
title, year, rating, votes, runtime, genres, director, cast, plot,
poster, and box office data. Uses JSON-LD parsing with DOM fallback.
Supports Movie, TVSeries, and TVMiniSeries types.

Closes #30

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 16:54:30 +00:00
de0a065923 feature: add recipe extractor with JSON-LD and DOM parsing
All checks were successful
CI / build (pull_request) Successful in 57s
CI / vet (pull_request) Successful in 1m2s
CI / test (pull_request) Successful in 1m5s
Add sites/recipe package with ExtractRecipe() that works on any recipe
URL. Parses JSON-LD structured data (@type: Recipe) first, with DOM
fallback. Handles @graph containers, arrays, HowToStep objects, ISO
8601 durations, and various author/yield/image formats.

Closes #29

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 16:52:28 +00:00
b1137f2ebc feature: add Steam Store game price extractor
All checks were successful
CI / vet (pull_request) Successful in 1m24s
CI / build (pull_request) Successful in 1m24s
CI / test (pull_request) Successful in 1m28s
Add sites/steam package with GetGamePrice() and SearchGames() methods.
Handles regular prices, discounted games, and free-to-play titles.
Includes age gate bypass logic and currency detection.

Closes #28

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 16:50:27 +00:00
349b1b9c6b feature: add CoinGecko cryptocurrency price extractor
All checks were successful
CI / build (pull_request) Successful in 46s
CI / vet (pull_request) Successful in 1m20s
CI / test (pull_request) Successful in 1m23s
Add sites/coingecko package with GetPrice() method that extracts
structured crypto price data (name, symbol, price, 24h/7d change,
market cap, volume, 24h high/low) from CoinGecko coin pages.

Includes mock-based tests and parseLargeNumber helper for T/B/M suffixes.

Closes #27

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 16:47:53 +00:00
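Note on 349b1b9c6b: the commit mentions a parseLargeNumber helper for T/B/M suffixes. A minimal sketch of that idea follows. The name parseSuffixedNumber, the handling of "$" and thousands separators, and the float64 return type are assumptions; the real helper may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSuffixedNumber converts strings like "$2.5B" or "1,234" to a float64.
// Hypothetical illustration of a T/B/M suffix parser.
func parseSuffixedNumber(s string) (float64, error) {
	s = strings.TrimSpace(s)
	s = strings.TrimPrefix(s, "$")
	s = strings.ReplaceAll(s, ",", "")

	mult := 1.0
	switch {
	case strings.HasSuffix(s, "T"):
		mult = 1e12
	case strings.HasSuffix(s, "B"):
		mult = 1e9
	case strings.HasSuffix(s, "M"):
		mult = 1e6
	}
	if mult != 1 {
		s = s[:len(s)-1] // drop the one-byte suffix
	}

	v, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, fmt.Errorf("parse number %q: %w", s, err)
	}
	return v * mult, nil
}

func main() {
	for _, in := range []string{"$2.5B", "3T", "1,234", "n/a"} {
		v, err := parseSuffixedNumber(in)
		fmt.Println(in, v, err)
	}
}
```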
461b704792 feature: add DuckDuckGo weather and stock widget extractors
All checks were successful
CI / vet (pull_request) Successful in 29s
CI / build (pull_request) Successful in 46s
CI / test (pull_request) Successful in 48s
Add weather.go with GetWeather() for extracting structured weather data
(location, temp, conditions, forecast) and stock.go with GetStockQuote()
and GetStockChart() for stock data extraction and chart screenshots.

Both include mock-based tests. CSS selectors may need tuning against
the live site since DuckDuckGo's React-rendered widgets use dynamic
class names.

Closes #25, #26
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 16:40:53 +00:00
198906946b test: add mock-based site extractor test infrastructure
All checks were successful
CI / vet (pull_request) Successful in 1m5s
CI / build (pull_request) Successful in 1m6s
CI / test (pull_request) Successful in 1m6s
Create exported extractortest package with MockBrowser, MockDocument,
and MockNode that support selector-based responses for testing site
extractors without a real browser.

Add extraction tests for DuckDuckGo (result parsing, empty results, no
links, full search flow) and Powerball (drawing parsing, next drawing
parsing with billion/million, error cases, full GetCurrent flow).

Closes #21
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 16:37:58 +00:00
963696cd62 enhance: thread-safe CookieJar, SameSite cookie attr, dynamic Google countries
All checks were successful
CI / vet (pull_request) Successful in 40s
CI / build (pull_request) Successful in 1m22s
CI / test (pull_request) Successful in 1m28s
- Wrap staticCookieJar in struct with sync.RWMutex for thread safety
- Add SameSite field to Cookie struct with Strict/Lax/None constants
- Update Playwright cookie conversion functions for SameSite
- Replace hardcoded 4-country switch with dynamic country code generation

Closes #20, #22, #23
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 16:34:54 +00:00
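Note on 963696cd62: wrapping a cookie slice in a struct with sync.RWMutex could look roughly like the sketch below. The type safeJar, its methods, and the two-field Cookie are stand-ins chosen for illustration; the library's staticCookieJar and Cookie carry more fields and a different API.

```go
package main

import (
	"fmt"
	"sync"
)

// Cookie is a stripped-down stand-in for the library's Cookie struct.
type Cookie struct {
	Name  string
	Value string
}

// safeJar guards a cookie slice with a sync.RWMutex (hypothetical type).
type safeJar struct {
	mu      sync.RWMutex
	cookies []Cookie
}

// Set replaces a cookie with the same name or appends a new one.
func (j *safeJar) Set(c Cookie) {
	j.mu.Lock()
	defer j.mu.Unlock()
	for i := range j.cookies {
		if j.cookies[i].Name == c.Name {
			j.cookies[i] = c
			return
		}
	}
	j.cookies = append(j.cookies, c)
}

// GetAll returns a copy so callers cannot mutate the jar without the lock.
func (j *safeJar) GetAll() []Cookie {
	j.mu.RLock()
	defer j.mu.RUnlock()
	out := make([]Cookie, len(j.cookies))
	copy(out, j.cookies)
	return out
}

func main() {
	jar := &safeJar{}
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			jar.Set(Cookie{Name: fmt.Sprintf("c%d", n), Value: "v"})
		}(i)
	}
	wg.Wait()
	fmt.Println(len(jar.GetAll())) // 10
}
```

Readers take the shared lock and writers the exclusive one, so concurrent GetAll calls do not block each other.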
a9711ce904 fix: surface parsing errors instead of silently discarding them
All checks were successful
CI / vet (pull_request) Successful in 1m10s
CI / build (pull_request) Successful in 1m21s
CI / test (pull_request) Successful in 1m28s
Return errors for required fields (ID, price) and log warnings for
optional fields (title, description, unit price) across all site
extractors instead of silently discarding them with _ =.

Closes #24
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 16:31:56 +00:00
132817144e refactor: deduplicate numericOnly and DuckDuckGo result extraction
All checks were successful
CI / build (pull_request) Successful in 29s
CI / vet (pull_request) Successful in 1m1s
CI / test (pull_request) Successful in 1m4s
- Extract identical numericOnly inline functions from powerball and
  megamillions into shared sites/internal/parse.NumericOnly with tests
- Extract duplicated DuckDuckGo result parsing from Search() and
  GetResults() into shared extractResults() helper

Closes #13, #14

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 16:26:54 +00:00
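Note on 132817144e: the commit does not show the body of NumericOnly. Judging by the name alone, it presumably strips everything except digits; the sketch below implements that assumed behavior with strings.Map and is not copied from sites/internal/parse.

```go
package main

import (
	"fmt"
	"strings"
)

// numericOnly keeps ASCII digits and drops every other character.
// Assumed behavior; the shared helper in the repository may differ.
func numericOnly(s string) string {
	return strings.Map(func(r rune) rune {
		if r >= '0' && r <= '9' {
			return r
		}
		return -1 // a negative rune tells strings.Map to drop the character
	}, s)
}

func main() {
	fmt.Println(numericOnly(" 07 ")) // 07
}
```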
769b870a17 fix: check Cookies() error and use context-aware sleep
All checks were successful
CI / build (pull_request) Successful in 46s
CI / vet (pull_request) Successful in 47s
CI / test (pull_request) Successful in 1m22s
- playwright.go: check error from page.Context().Cookies() before
  iterating over results, preventing silent failures
- archive.go: replace time.Sleep(5s) with context-aware select using
  time.After, allowing the operation to be cancelled promptly

Closes #7, #18

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 16:19:49 +00:00
8b136b9dda Merge pull request 'Fix cmd flags and defer-before-error-check (#8, #19)' (#36) from fix/cmd-flags-and-defer-ordering into main
All checks were successful
CI / vet (push) Successful in 37s
CI / test (push) Successful in 51s
CI / build (push) Successful in 52s
2026-02-15 16:18:54 +00:00
fca50a47c3 Merge pull request 'Fix DuckDuckGo error handling (#5, #6)' (#35) from fix/duckduckgo-error-handling into main
Some checks failed
CI / build (push) Has been cancelled
CI / test (push) Has been cancelled
CI / vet (push) Has been cancelled
2026-02-15 16:18:50 +00:00
991c43d020 Merge pull request 'Fix archive cmd panic on short content (#9)' (#34) from fix/archive-cmd-short-content into main
Some checks failed
CI / test (push) Has been cancelled
CI / build (push) Has been cancelled
CI / vet (push) Has been cancelled
2026-02-15 16:18:46 +00:00
e5e0db85e8 fix: use merged flags in archive cmd and move defer after error checks
All checks were successful
CI / vet (pull_request) Successful in 29s
CI / build (pull_request) Successful in 32s
CI / test (pull_request) Successful in 57s
- Fix archive cmd passing only archive-specific Flags instead of the
  merged flags variable that includes browser flags (#8)
- Move defer DeferClose() after error checks in 6 locations to prevent
  calling Close on nil values (#19):
  - sites/duckduckgo/cmd/duckduckgo/main.go
  - sites/duckduckgo/duckduckgo.go
  - sites/google/cmd/google/main.go
  - sites/wegmans/cmd/wegmans/main.go
  - sites/wegmans/wegmans.go
  - sites/aislegopher/aislegopher.go

Closes #8, #19

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 16:17:38 +00:00
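Note on e5e0db85e8: the defer ordering fix can be illustrated with a toy resource. The types conn and dial below are invented for the example and only stand in for the browser values that DeferClose handles in the repository.

```go
package main

import (
	"errors"
	"fmt"
)

// conn is a toy resource whose Close dereferences its receiver.
type conn struct{ open bool }

func (c *conn) Close() error {
	c.open = false // would panic with a nil receiver
	return nil
}

// dial returns (nil, err) on failure, as many constructors do.
func dial(fail bool) (*conn, error) {
	if fail {
		return nil, errors.New("dial failed")
	}
	return &conn{open: true}, nil
}

// run registers the deferred Close only after the error check,
// so Close never runs against a nil *conn.
func run(fail bool) error {
	c, err := dial(fail)
	if err != nil {
		return err
	}
	defer c.Close()

	// ... use c here ...
	return nil
}

func main() {
	fmt.Println(run(true))
	fmt.Println(run(false))
}
```

If the defer sat directly after the dial call, the failure path would return and then run Close on a nil pointer.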
a12c9f7cb6 fix: propagate errors from DuckDuckGo search and GetResults
Some checks failed
CI / test (pull_request) Failing after 6m12s
CI / vet (pull_request) Failing after 6m12s
CI / build (pull_request) Failing after 6m15s
- Change SearchPage.GetResults() to return ([]Result, error) so ForEach
  errors are no longer silently discarded
- Fix Search() to return the ForEach error instead of nil
- Update cmd caller to check GetResults() errors

Closes #5, #6

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 16:16:04 +00:00
b4e462a6b4 fix: prevent panic on short article content in archive cmd
All checks were successful
CI / vet (pull_request) Successful in 1m6s
CI / build (pull_request) Successful in 1m7s
CI / test (pull_request) Successful in 1m8s
Add length check before slicing article.Content[:32], matching the
safe truncation pattern already used in cmd/browser/main.go.

Closes #9

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 16:14:32 +00:00
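Note on b4e462a6b4: a length check before slicing is all that is needed to avoid the out-of-range panic. The helper name preview is hypothetical; the limit of 32 comes from the commit message.

```go
package main

import "fmt"

// preview returns at most n bytes of s and never slices past the end.
// Hypothetical helper; it cuts on byte boundaries, so a multi-byte
// UTF-8 character at the cut point may be split.
func preview(s string, n int) string {
	if n < 0 {
		n = 0
	}
	if len(s) <= n {
		return s
	}
	return s[:n]
}

func main() {
	fmt.Println(preview("short", 32)) // short
	fmt.Println(len(preview("0123456789012345678901234567890123456789", 32))) // 32
}
```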
6c68062e56 fix: add nil guards to prevent nil-pointer panics
All checks were successful
CI / test (pull_request) Successful in 46s
CI / build (pull_request) Successful in 47s
CI / vet (pull_request) Successful in 59s
- document.go: check if resp is nil before calling resp.Status() in
  Refresh(), since Playwright's Reload() can return a nil response
- archive.go: check SelectFirst() results for nil before calling
  Type() and Click(), preventing panics when DOM elements are missing

Closes #10, #11

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 16:13:43 +00:00
cb2ed10cfd refactor: restructure API, deduplicate code, expand test coverage
Some checks failed
CI / build (push) Failing after 2m4s
CI / test (push) Failing after 2m6s
CI / vet (push) Failing after 2m19s
- Extract shared DeferClose helper, removing 14 duplicate copies
- Rename PlayWright-prefixed types to cleaner names (BrowserOptions,
  BrowserSelection, NewBrowser, etc.)
- Rename fields: ServerAddress, RequireServer (was DontLaunchOnConnectFailure)
- Extract shared initBrowser/mergeOptions into browser_init.go,
  deduplicating ~120 lines between NewBrowser and NewInteractiveBrowser
- Remove unused locator field from document struct
- Add tests for all previously untested packages (archive, aislegopher,
  wegmans, useragents, powerball) and expand existing test suites
- Add MIGRATION.md documenting all breaking API changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 13:59:47 -05:00
e7b7e78796 fix: bug fixes, test coverage, and CI workflow
Some checks failed
CI / vet (push) Failing after 15s
CI / build (push) Failing after 30s
CI / test (push) Failing after 36s
- Fix Nodes.First() panic on empty slice (return nil)
- Fix ticker leak in archive.go (create once, defer Stop)
- Fix cookie path matching for empty and root paths
- Fix lost query params in google.go (u.Query().Set was discarded)
- Fix type assertion panic in useragents.go
- Fix dropped date parse error in powerball.go
- Remove unreachable dead code in megamillions.go and powerball.go
- Simplify document.go WaitForNetworkIdle, remove unused root field
- Remove debug fmt.Println calls across codebase
- Replace panic(err) with stderr+exit in all cmd/ programs
- Fix duckduckgo cmd: remove useless defer, return error on bad safesearch
- Fix archive cmd: ToConfig returns error instead of panicking
- Add 39+ unit tests across 6 new test files
- Add Gitea Actions CI workflow (build, test, vet in parallel)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 11:14:19 -05:00
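Note on e7b7e78796: the "lost query params" item is a well-known net/url pitfall. url.URL.Query() parses RawQuery into a fresh url.Values on every call, so u.Query().Set(...) modifies a temporary copy and changes nothing on u. A sketch of the usual fix follows; the helper name setParam and the example URL are assumptions, not code from google.go.

```go
package main

import (
	"fmt"
	"net/url"
)

// setParam sets one query parameter and writes the result back to the URL.
// Hypothetical helper illustrating the Query/Encode round trip.
func setParam(raw, key, value string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	q := u.Query()          // copy of the parsed query
	q.Set(key, value)       // modify the copy
	u.RawQuery = q.Encode() // write it back, otherwise the change is lost
	return u.String(), nil
}

func main() {
	out, err := setParam("https://www.google.com/search?q=go", "hl", "en")
	fmt.Println(out, err) // https://www.google.com/search?hl=en&q=go <nil>
}
```

Encode sorts parameters by key, which is why hl appears before q in the output.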
82fce5a200 Handle unit suffix in price parsing and add logging
Refined price parsing logic to strip trailing periods from units (e.g., "lb." -> "lb") for better handling. Added logging for debugging extracted response data.
2025-10-20 22:36:20 -04:00
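Note on 82fce5a200: stripping trailing periods from a unit string ("lb." to "lb") can be done with strings.TrimRight. The helper name normalizeUnit and the extra whitespace trimming are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeUnit trims surrounding whitespace and any trailing periods.
// Hypothetical helper; the parser in the repository may do this inline.
func normalizeUnit(u string) string {
	return strings.TrimRight(strings.TrimSpace(u), ".")
}

func main() {
	fmt.Println(normalizeUnit("lb.")) // lb
}
```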
afa0238758 Restrict Price assignment to unit price with "lb" only
2025-10-11 23:48:09 -04:00
9ae8619f93 Enhance price parsing to handle non-zero unit price
Updated price extraction logic to set `Price` from `UnitPrice` when it is non-zero, ensuring more accurate parsing.
2025-10-11 23:34:41 -04:00
9947cae947 Refine selectors and enhance price parsing with logging
Adjusted HTML selectors for improved compatibility and updated price parsing logic to handle additional formats. Added logging to provide better debugging insights during price extraction.
2025-10-10 14:42:06 -04:00
Steve Dudenhoeffer
dc43d1626a Parse drawing date from Powerball numbers page
2025-09-16 11:17:04 -04:00
Steve Dudenhoeffer
2d60940001 Refactored jackpot handling and updated dependencies
Replaced `currency.Amount` with `int` for jackpot values to simplify representation. Adjusted parsing logic accordingly. Updated Go version to 1.24.0 and refreshed dependencies in go.mod for compatibility.
2025-09-16 10:52:49 -04:00
39453288ce Add OpenSearch and SearchPage functionality for DuckDuckGo
Introduced the `OpenSearch` method and `SearchPage` interface to streamline search operations and allow for loading more results dynamically. Updated dependencies and modified the DuckDuckGo CLI to utilize these enhancements.
2025-03-18 02:42:50 -04:00
964a98a5a8 Handle commands without automatic reaction responses
Introduce `ErrCommandNoReactions` to allow commands to opt out of success reactions. Adjust bot behavior to respect this error and prevent reactions when applicable, ensuring cleaner and more controlled responses. Add error handling and safeguard workers against panics.
2025-01-22 21:06:07 -05:00
81ea656332 Add unit price and unit parsing for items
This update enhances the `Item` structure to include `UnitPrice` and `Unit` fields. Additional logic is implemented to extract and parse unit pricing details from the HTML, improving data accuracy and granularity.
2025-01-21 19:42:25 -05:00
6de455b1bd Add price extraction and validate URL structure in parsers
Added price field to Item struct in AisleGopher and implemented logic to extract price data. Updated Wegmans parser to validate URL structure by ensuring the second segment is "product". These changes improve data accuracy and error handling.
2025-01-20 13:00:59 -05:00
f37e60dddc Add Wegmans module to fetch item details and prices
Introduce functionality to retrieve item details, including name and price, from Wegmans using a browser-based scraper. This includes a CLI tool to execute searches and robust error handling for URL validation and browser interactions.
2025-01-20 12:28:29 -05:00
Steve Dudenhoeffer
654976de82 Add AisleGopher integration for data extraction
Introduced a new package and command for extracting data from aislegopher.com, including URL parsing and item retrieval. Updated dependencies in go.mod to support the new functionality. Additionally, refined import structure in the DuckDuckGo integration.
2025-01-20 02:16:32 -05:00
e8de488d2b Update CSS selector for extracting titles in DuckDuckGo parser
Replaced the overly complex CSS selector with a simplified "h2" selector for extracting titles. This change improves maintainability and ensures accurate title extraction from the updated DOM structure.
2025-01-16 21:37:38 -05:00
67a3552747 Add DuckDuckGo integration for search functionality
Implemented a DuckDuckGo search module with configurable SafeSearch and regional settings. Added a CLI tool to perform searches via DuckDuckGo using browser automation, supporting flags for customization.
2025-01-16 20:45:37 -05:00
eec94ec708 Reorder imports in main.go for better organization.
Moved the local package import to align with standard Go import grouping conventions. This improves code readability and maintains a consistent structure.
2025-01-16 20:45:23 -05:00
691ae400d1 Add Google search integration with CLI support
Introduce a Google search integration, including a Go package for performing searches with configurable parameters (e.g., language, region) and a CLI tool for executing search queries. Refactor archive CLI import ordering for consistency.
2025-01-16 16:56:05 -05:00
36707dec17 added useragents to go-extractor
2024-12-24 12:15:48 -05:00
567a9f9212 added archive, megamillions, and powerball site logic
2024-12-23 03:18:50 -05:00