3 Commits

Author SHA1 Message Date
65cf6b027f feat: add RemoveHidden option to strip display:none elements before extraction
CI (pull_request): all checks successful (vet 34s, test 1m1s, build 1m5s)
When RemoveHidden is true, JavaScript is evaluated on the live page to
remove all elements with computed display:none before readability
extraction. This defends against anti-scraping honeypots that embed
prompt injections in hidden DOM elements.

The implementation uses an optional pageEvaluator interface so that the
concrete document (backed by Playwright) supports it while the Document
interface remains unchanged.

Closes #62

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 14:06:17 +00:00
c1a5814732 feat: add ReadabilityWithOptions for DOM cleanup before extraction
CI (pull_request): all checks successful (build 46s, test 48s, vet 1m50s)
Sites with infinite scroll (e.g. The Verge) load additional articles
into the DOM, which get included in readability extraction. Add
ReadabilityOptions.RemoveSelectors to strip elements by CSS selector
before parsing, avoiding the need to reimplement the readability
pipeline downstream.
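
One way selector-based cleanup could work is to turn the selectors into a small script that runs on the page before parsing. The sketch below assumes `RemoveSelectors` is a `[]string` and that removal happens through page-side JavaScript; neither is confirmed by the commit message, and `removalScript` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ReadabilityOptions mirrors the option named in the commit message.
// The field type is an assumption.
type ReadabilityOptions struct {
	RemoveSelectors []string
}

// removalScript builds a JavaScript function that removes every element
// matching any of the configured selectors. The selectors are JSON-encoded
// so that quotes and backslashes cannot break out of the script.
func removalScript(opts ReadabilityOptions) (string, error) {
	if len(opts.RemoveSelectors) == 0 {
		return "", nil // nothing to strip
	}
	encoded, err := json.Marshal(opts.RemoveSelectors)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf(
		"() => { for (const sel of %s) { document.querySelectorAll(sel).forEach(el => el.remove()); } }",
		encoded,
	), nil
}

func main() {
	script, err := removalScript(ReadabilityOptions{
		RemoveSelectors: []string{".infinite-scroll", "aside"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(script)
}
```

JSON-encoding the selector list, rather than concatenating strings, keeps caller-supplied selectors from altering the generated script.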

Closes #60

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 01:09:28 +00:00
cb2ed10cfd refactor: restructure API, deduplicate code, expand test coverage
CI (push): some checks failed (build after 2m4s, test after 2m6s, vet after 2m19s)
- Extract shared DeferClose helper, removing 14 duplicate copies
- Rename PlayWright-prefixed types to cleaner names (BrowserOptions,
  BrowserSelection, NewBrowser, etc.)
- Rename fields: ServerAddress, RequireServer (was DontLaunchOnConnectFailure)
- Extract shared initBrowser/mergeOptions into browser_init.go,
  deduplicating ~120 lines between NewBrowser and NewInteractiveBrowser
- Remove unused locator field from document struct
- Add tests for all previously untested packages (archive, aislegopher,
  wegmans, useragents, powerball) and expand existing test suites
- Add MIGRATION.md documenting all breaking API changes
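
A shared close helper of the kind named in the first bullet might look like the sketch below. Only the name `DeferClose` comes from the commit message; the `io.Closer` parameter and the choice to discard the close error are assumptions, and `resource` and `use` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// DeferClose closes c and discards the error, so call sites can write
// `defer DeferClose(x)` instead of repeating an inline closure.
// Signature and error handling are assumptions, not the commit's code.
func DeferClose(c io.Closer) {
	if c == nil {
		return
	}
	_ = c.Close()
}

// resource is a stand-in for a browser, page, or file handle.
type resource struct {
	name   string
	closed bool
}

// Close marks the resource closed and returns an error to show that
// DeferClose tolerates failing closers.
func (r *resource) Close() error {
	r.closed = true
	return errors.New("close failed: " + r.name)
}

// use shows the call-site pattern the helper is meant to replace.
func use(r *resource) string {
	defer DeferClose(r)
	return "used " + r.name
}

func main() {
	r := &resource{name: "page"}
	fmt.Println(use(r), r.closed)
}
```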

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 13:59:47 -05:00