Commit Graph

7 Commits

Author SHA1 Message Date
steve e0da88b9b0 feat: add PromoteToInteractive and DemoteToDocument for mid-session page transfer
CI / build (pull_request) Successful in 29s
CI / test (pull_request) Successful in 36s
CI / vet (pull_request) Failing after 6m18s
Allow transferring ownership of a Playwright page between Document and
InteractiveBrowser modes without tearing down the browser. This enables
handing a live page to a human (e.g. for captcha solving) and resuming
scraping on the same page afterward.
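A minimal sketch of the ownership handoff, assuming each mode simply holds a pointer to the live page. `Page`, the struct fields, the mutexes, `ErrNoPage`, and the free-function signatures are hypothetical stand-ins, not the library's actual Playwright-backed API. The point is that the page moves between holders and is never closed, so the browser stays up.

```go
package main

import (
	"errors"
	"sync"
)

// Page is a hypothetical stand-in for a live Playwright page.
type Page struct{ URL string }

// Document and InteractiveBrowser are hypothetical holders; at most one
// of them owns a given page at any time.
type Document struct {
	mu   sync.Mutex
	page *Page
}

type InteractiveBrowser struct {
	mu   sync.Mutex
	page *Page
}

// ErrNoPage is a hypothetical error for a holder that has already given its page away.
var ErrNoPage = errors.New("no page to transfer")

// PromoteToInteractive moves the page out of doc into an interactive
// session. The page, and the browser behind it, is not closed.
func PromoteToInteractive(doc *Document) (*InteractiveBrowser, error) {
	doc.mu.Lock()
	defer doc.mu.Unlock()
	if doc.page == nil {
		return nil, ErrNoPage
	}
	ib := &InteractiveBrowser{page: doc.page}
	doc.page = nil // doc no longer owns the page
	return ib, nil
}

// DemoteToDocument hands the same page back so scraping can resume.
func DemoteToDocument(ib *InteractiveBrowser) (*Document, error) {
	ib.mu.Lock()
	defer ib.mu.Unlock()
	if ib.page == nil {
		return nil, ErrNoPage
	}
	doc := &Document{page: ib.page}
	ib.page = nil // the interactive session no longer owns the page
	return doc, nil
}
```

Clearing the source holder's pointer makes a second transfer from the same holder fail fast instead of letting two modes drive one page.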

Closes #76

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 02:27:42 +00:00
steve 65cf6b027f feat: add RemoveHidden option to strip display:none elements before extraction
CI / vet (pull_request) Successful in 34s
CI / test (pull_request) Successful in 1m1s
CI / build (pull_request) Successful in 1m5s
When RemoveHidden is true, JavaScript is evaluated on the live page to
remove all elements with computed display:none before readability
extraction. This defends against anti-scraping honeypots that embed
prompt injections in hidden DOM elements.

The implementation uses an optional pageEvaluator interface so that the
concrete document (backed by Playwright) supports it while the Document
interface remains unchanged.

Closes #62

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 14:06:17 +00:00
steve 6c68062e56 fix: add nil guards to prevent nil-pointer panics
CI / test (pull_request) Successful in 46s
CI / build (pull_request) Successful in 47s
CI / vet (pull_request) Successful in 59s
- document.go: check if resp is nil before calling resp.Status() in
  Refresh(), since Playwright's Reload() can return a nil response
- archive.go: check SelectFirst() results for nil before calling
  Type() and Click(), preventing panics when DOM elements are missing

Closes #10, #11

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 16:13:43 +00:00
steve cb2ed10cfd refactor: restructure API, deduplicate code, expand test coverage
CI / build (push) Failing after 2m4s
CI / test (push) Failing after 2m6s
CI / vet (push) Failing after 2m19s
- Extract shared DeferClose helper, removing 14 duplicate copies
- Rename PlayWright-prefixed types to cleaner names (BrowserOptions,
  BrowserSelection, NewBrowser, etc.)
- Rename fields: ServerAddress, RequireServer (was DontLaunchOnConnectFailure)
- Extract shared initBrowser/mergeOptions into browser_init.go,
  deduplicating ~120 lines between NewBrowser and NewInteractiveBrowser
- Remove unused locator field from document struct
- Add tests for all previously untested packages (archive, aislegopher,
  wegmans, useragents, powerball) and expand existing test suites
- Add MIGRATION.md documenting all breaking API changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 13:59:47 -05:00
steve e7b7e78796 fix: bug fixes, test coverage, and CI workflow
CI / vet (push) Failing after 15s
CI / build (push) Failing after 30s
CI / test (push) Failing after 36s
- Fix Nodes.First() panic on empty slice (return nil)
- Fix ticker leak in archive.go (create once, defer Stop)
- Fix cookie path matching for empty and root paths
- Fix lost query params in google.go (changes made via u.Query().Set were discarded)
- Fix type assertion panic in useragents.go
- Fix dropped date parse error in powerball.go
- Remove unreachable dead code in megamillions.go and powerball.go
- Simplify document.go WaitForNetworkIdle, remove unused root field
- Remove debug fmt.Println calls across codebase
- Replace panic(err) with stderr+exit in all cmd/ programs
- Fix duckduckgo cmd: remove useless defer, return error on bad safesearch
- Fix archive cmd: ToConfig returns error instead of panicking
- Add 39+ unit tests across 6 new test files
- Add Gitea Actions CI workflow (build, test, vet in parallel)
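Three of the fixes above follow common Go idioms. The sketch below illustrates them under stated assumptions: `withQuery` and `asString` are hypothetical helpers, and `Node`/`Nodes` are hypothetical stand-ins for the library's types, so this is not the actual google.go or useragents.go code. `url.URL.Query()` returns a freshly parsed copy, so the modified values must be re-encoded into `RawQuery`.

```go
package main

import (
	"fmt"
	"net/url"
)

// withQuery sets key=value on rawURL. Calling u.Query().Set alone changes
// only a copy, so the result is written back into u.RawQuery.
func withQuery(rawURL, key, value string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set(key, value)
	u.RawQuery = q.Encode() // Encode sorts keys and escapes values
	return u.String(), nil
}

// Node and Nodes are hypothetical stand-ins for the library's node types.
type Node struct{ Text string }
type Nodes []*Node

// First returns nil instead of panicking on an empty slice.
func (n Nodes) First() *Node {
	if len(n) == 0 {
		return nil
	}
	return n[0]
}

// asString uses the comma-ok form so an unexpected type becomes an error
// rather than a runtime panic.
func asString(v any) (string, error) {
	s, ok := v.(string)
	if !ok {
		return "", fmt.Errorf("expected string, got %T", v)
	}
	return s, nil
}
```

For example, `withQuery("https://www.google.com/search?hl=en", "q", "go vet")` yields `https://www.google.com/search?hl=en&q=go+vet`.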

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 11:14:19 -05:00
steve 567a9f9212 added archive, megamillions, and powerball site logic
2024-12-23 03:18:50 -05:00
steve 5e924eb3f9 changed browser api to return pages that can be acted on, not strictly contents
2024-12-17 23:16:13 -05:00