feat: add ReadabilityWithOptions for DOM cleanup before extraction

Sites with infinite scroll (e.g. The Verge) load additional articles
into the DOM, which get included in readability extraction. Add
ReadabilityOptions.RemoveSelectors to strip elements by CSS selector
before parsing, avoiding the need to reimplement the readability
pipeline downstream.
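
To make the "strip, then extract" ordering concrete, here is a minimal,
self-contained sketch. It is not the code from this commit: the Node type,
the three-form selector matcher, and the names extractWithOptions and
samplePage are hypothetical stand-ins so the example runs with the standard
library only. Only the RemoveSelectors field name is taken from the message
above; a real implementation would presumably run proper CSS selectors
against the parsed HTML before handing it to readability.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a deliberately tiny stand-in for a DOM element (hypothetical).
type Node struct {
	Tag      string
	ID       string
	Classes  []string
	Text     string
	Children []*Node
}

// Options borrows the RemoveSelectors field name from the commit message;
// the rest of this type is an assumption made for illustration.
type Options struct {
	RemoveSelectors []string
}

// matches supports only three selector forms: "tag", ".class" and "#id".
// Real CSS selectors are far richer than this.
func matches(n *Node, sel string) bool {
	switch {
	case strings.HasPrefix(sel, "#"):
		return n.ID != "" && n.ID == sel[1:]
	case strings.HasPrefix(sel, "."):
		for _, c := range n.Classes {
			if c == sel[1:] {
				return true
			}
		}
		return false
	default:
		return sel != "" && n.Tag == sel
	}
}

// prune removes, in place, every descendant of n matching any selector.
func prune(n *Node, selectors []string) {
	kept := n.Children[:0]
	for _, child := range n.Children {
		drop := false
		for _, sel := range selectors {
			if matches(child, sel) {
				drop = true
				break
			}
		}
		if drop {
			continue
		}
		prune(child, selectors)
		kept = append(kept, child)
	}
	n.Children = kept
}

// collectText gathers text depth-first; it stands in for the real extractor.
func collectText(n *Node, out *[]string) {
	if n.Text != "" {
		*out = append(*out, n.Text)
	}
	for _, c := range n.Children {
		collectText(c, out)
	}
}

// extractWithOptions cleans the tree first, then extracts from what is left.
func extractWithOptions(root *Node, opts Options) string {
	prune(root, opts.RemoveSelectors)
	var parts []string
	collectText(root, &parts)
	return strings.Join(parts, " ")
}

// samplePage builds a page where infinite scroll appended a second article.
func samplePage() *Node {
	return &Node{Tag: "body", Children: []*Node{
		{Tag: "article", ID: "main", Text: "Main story"},
		{Tag: "article", Classes: []string{"infinite-scroll"}, Text: "Next story"},
		{Tag: "aside", Classes: []string{"ad"}, Text: "Sponsored"},
	}}
}

func main() {
	fmt.Println(extractWithOptions(samplePage(), Options{}))
	fmt.Println(extractWithOptions(samplePage(), Options{
		RemoveSelectors: []string{".infinite-scroll", "aside"},
	}))
}
```

Because the pruning happens before extraction, the extractor itself stays
unchanged, which is the point of the option: callers pass selectors instead
of reimplementing the pipeline. Note that this sketch mutates the tree it is
given, so each call above builds a fresh page.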

Closes #60

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 01:09:28 +00:00
parent 3357972246
commit c1a5814732
4 changed files with 188 additions and 5 deletions

go.mod

@@ -8,10 +8,11 @@ require (
github.com/go-shiori/go-readability v0.0.0-20250217085726-9f5bf5ca7612
github.com/playwright-community/playwright-go v0.5200.0
github.com/urfave/cli/v3 v3.0.0-beta1
-golang.org/x/text v0.29.0
+golang.org/x/text v0.31.0
)
require (
+github.com/PuerkitoBio/goquery v1.11.0 // indirect
github.com/andybalholm/cascadia v1.3.3 // indirect
github.com/araddon/dateparse v0.0.0-20210429162001-6b43995a97de // indirect
github.com/deckarep/golang-set/v2 v2.8.0 // indirect
@@ -19,5 +20,5 @@ require (
github.com/go-shiori/dom v0.0.0-20230515143342-73569d674e1c // indirect
github.com/go-stack/stack v1.8.1 // indirect
github.com/gogs/chardet v0.0.0-20211120154057-b7413eaefb8f // indirect
-golang.org/x/net v0.44.0 // indirect
+golang.org/x/net v0.47.0 // indirect
)