feat: add RemoveHidden option to strip display:none elements before extraction

When RemoveHidden is true, JavaScript is evaluated on the live page to
remove all elements with computed display:none before readability
extraction. This defends against anti-scraping honeypots that embed
prompt injections in hidden DOM elements.

The implementation uses an optional pageEvaluator interface so that the
concrete document (backed by Playwright) supports it while the Document
interface remains unchanged.
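
The optional-interface pattern described above can be sketched roughly as
follows. This is a minimal illustration, not the code from this commit: the
Document method set (URL), the liveDoc and staticDoc types, the removeHidden
helper, and the JavaScript in removeHiddenScript are all assumptions made for
the example. Only the pageEvaluator name and the PageEvaluate signature come
from the change itself.

```go
package main

import "fmt"

// removeHiddenScript is an ASSUMED JavaScript expression for illustration;
// the actual script used by the commit is not shown in this excerpt.
const removeHiddenScript = `() => {
  let removed = 0;
  for (const el of Array.from(document.querySelectorAll('*'))) {
    // Skip nodes already detached because an ancestor was removed.
    if (el.isConnected && getComputedStyle(el).display === 'none') {
      el.remove();
      removed++;
    }
  }
  return removed;
}`

// Document stands in for the public interface, which stays unchanged.
// The URL method is hypothetical.
type Document interface {
	URL() string
}

// pageEvaluator is the optional capability; only some documents implement it.
type pageEvaluator interface {
	PageEvaluate(expression string) (interface{}, error)
}

// liveDoc mimics a browser-backed document (hypothetical stand-in that
// records expressions instead of running them).
type liveDoc struct {
	url       string
	evaluated []string
}

func (d *liveDoc) URL() string { return d.url }

func (d *liveDoc) PageEvaluate(expression string) (interface{}, error) {
	d.evaluated = append(d.evaluated, expression)
	return 0, nil
}

// staticDoc is a document without a live page (hypothetical).
type staticDoc struct{ url string }

func (d staticDoc) URL() string { return d.url }

// removeHidden runs the script only when doc supports evaluation.
// It reports whether the script was run.
func removeHidden(doc Document) (bool, error) {
	pe, ok := doc.(pageEvaluator)
	if !ok {
		return false, nil // no live page; nothing to strip
	}
	if _, err := pe.PageEvaluate(removeHiddenScript); err != nil {
		return false, fmt.Errorf("remove hidden elements: %w", err)
	}
	return true, nil
}

func main() {
	live := &liveDoc{url: "https://example.com"}
	ran, err := removeHidden(live)
	fmt.Println(ran, err, len(live.evaluated)) // true <nil> 1

	ran, err = removeHidden(staticDoc{url: "https://example.com"})
	fmt.Println(ran, err) // false <nil>
}
```

The type assertion keeps the capability check at the call site, so documents
that cannot evaluate JavaScript are skipped rather than forced to implement a
new method.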

Closes #62

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 14:06:17 +00:00
parent c1a5814732
commit 65cf6b027f
3 changed files with 164 additions and 0 deletions

@@ -68,6 +68,10 @@ func (d *document) Refresh() error {
 	return nil
 }
 
+func (d *document) PageEvaluate(expression string) (interface{}, error) {
+	return d.page.Evaluate(expression)
+}
+
 func (d *document) WaitForNetworkIdle(timeout *time.Duration) error {
 	if timeout == nil {
 		t := 30 * time.Second