Commit Graph

7 Commits

Author SHA1 Message Date
e0da88b9b0 feat: add PromoteToInteractive and DemoteToDocument for mid-session page transfer
Some checks failed
CI / build (pull_request) Successful in 29s
CI / test (pull_request) Successful in 36s
CI / vet (pull_request) Failing after 6m18s
Allow transferring ownership of a Playwright page between Document and
InteractiveBrowser modes without tearing down the browser. This enables
handing a live page to a human (e.g. for captcha solving) and resuming
scraping on the same page afterward.

Closes #76

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 02:27:42 +00:00
65cf6b027f feat: add RemoveHidden option to strip display:none elements before extraction
All checks were successful
CI / vet (pull_request) Successful in 34s
CI / test (pull_request) Successful in 1m1s
CI / build (pull_request) Successful in 1m5s
When RemoveHidden is true, JavaScript is evaluated on the live page to
remove all elements with computed display:none before readability
extraction. This defends against anti-scraping honeypots that embed
prompt injections in hidden DOM elements.

The implementation uses an optional pageEvaluator interface so that the
concrete document (backed by Playwright) supports it while the Document
interface remains unchanged.

Closes #62

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 14:06:17 +00:00
6c68062e56 fix: add nil guards to prevent nil-pointer panics
All checks were successful
CI / test (pull_request) Successful in 46s
CI / build (pull_request) Successful in 47s
CI / vet (pull_request) Successful in 59s
- document.go: check if resp is nil before calling resp.Status() in
  Refresh(), since Playwright's Reload() can return a nil response
- archive.go: check SelectFirst() results for nil before calling
  Type() and Click(), preventing panics when DOM elements are missing

Closes #10, #11

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 16:13:43 +00:00
cb2ed10cfd refactor: restructure API, deduplicate code, expand test coverage
Some checks failed
CI / build (push) Failing after 2m4s
CI / test (push) Failing after 2m6s
CI / vet (push) Failing after 2m19s
- Extract shared DeferClose helper, removing 14 duplicate copies
- Rename PlayWright-prefixed types to cleaner names (BrowserOptions,
  BrowserSelection, NewBrowser, etc.)
- Rename fields: ServerAddress, RequireServer (was DontLaunchOnConnectFailure)
- Extract shared initBrowser/mergeOptions into browser_init.go,
  deduplicating ~120 lines between NewBrowser and NewInteractiveBrowser
- Remove unused locator field from document struct
- Add tests for all previously untested packages (archive, aislegopher,
  wegmans, useragents, powerball) and expand existing test suites
- Add MIGRATION.md documenting all breaking API changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 13:59:47 -05:00
e7b7e78796 fix: bug fixes, test coverage, and CI workflow
Some checks failed
CI / vet (push) Failing after 15s
CI / build (push) Failing after 30s
CI / test (push) Failing after 36s
- Fix Nodes.First() panic on empty slice (return nil)
- Fix ticker leak in archive.go (create once, defer Stop)
- Fix cookie path matching for empty and root paths
- Fix lost query params in google.go (u.Query().Set was discarded)
- Fix type assertion panic in useragents.go
- Fix dropped date parse error in powerball.go
- Remove unreachable dead code in megamillions.go and powerball.go
- Simplify document.go WaitForNetworkIdle, remove unused root field
- Remove debug fmt.Println calls across codebase
- Replace panic(err) with stderr+exit in all cmd/ programs
- Fix duckduckgo cmd: remove useless defer, return error on bad safesearch
- Fix archive cmd: ToConfig returns error instead of panicking
- Add 39+ unit tests across 6 new test files
- Add Gitea Actions CI workflow (build, test, vet in parallel)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 11:14:19 -05:00
567a9f9212 added archive, megamillions, and powerball site logic 2024-12-23 03:18:50 -05:00
5e924eb3f9 changed browser api to return pages that can be acted on, not strictly contents 2024-12-17 23:16:13 -05:00