feat: add ReadabilityWithOptions for DOM cleanup before extraction
Sites with infinite scroll (e.g. The Verge) load additional articles into the DOM, which get included in readability extraction. Add ReadabilityOptions.RemoveSelectors to strip elements by CSS selector before parsing, avoiding the need to reimplement the readability pipeline downstream.

Closes #60

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
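The source change itself is not shown on this page, so the following is only a minimal sketch of the idea in the message: prune matching elements first, then run extraction on what remains. It assumes a toy `Node` tree in place of a real HTML DOM, and a matcher that understands only `tag`, `.class`, and `#id` selectors. `Node`, `Options`, `removeMatching`, and `ExtractWithOptions` are hypothetical names for illustration, not the API added by this commit.

```go
package main

import "strings"

// Node is a toy stand-in for an HTML element (hypothetical, not a real DOM type).
type Node struct {
	Tag      string
	ID       string
	Classes  []string
	Text     string
	Children []*Node
}

// Options loosely mirrors the RemoveSelectors idea from the commit message.
// The real ReadabilityOptions struct may look different.
type Options struct {
	RemoveSelectors []string
}

// matches supports only three simple selector forms: "tag", ".class", "#id".
func matches(n *Node, sel string) bool {
	switch {
	case strings.HasPrefix(sel, "#"):
		return n.ID != "" && n.ID == sel[1:]
	case strings.HasPrefix(sel, "."):
		for _, c := range n.Classes {
			if c == sel[1:] {
				return true
			}
		}
		return false
	default:
		return strings.EqualFold(n.Tag, sel)
	}
}

// removeMatching prunes every descendant of root that matches any selector.
// The root node itself is never removed.
func removeMatching(root *Node, selectors []string) {
	kept := root.Children[:0]
	for _, child := range root.Children {
		drop := false
		for _, sel := range selectors {
			if matches(child, sel) {
				drop = true
				break
			}
		}
		if !drop {
			removeMatching(child, selectors)
			kept = append(kept, child)
		}
	}
	root.Children = kept
}

// textContent concatenates node text in document order.
func textContent(n *Node) string {
	var b strings.Builder
	var walk func(*Node)
	walk = func(cur *Node) {
		b.WriteString(cur.Text)
		for _, c := range cur.Children {
			walk(c)
		}
	}
	walk(n)
	return b.String()
}

// ExtractWithOptions cleans the tree first, then "extracts" it.
// Extraction here is just text collection, standing in for readability parsing.
func ExtractWithOptions(root *Node, opts Options) string {
	removeMatching(root, opts.RemoveSelectors)
	return textContent(root)
}
```

With a tree holding a main article and a `.infinite-scroll` container, passing `Options{RemoveSelectors: []string{".infinite-scroll"}}` would leave only the main article's text. Doing the cleanup before extraction keeps the extraction step unchanged, which is the motivation the message gives.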
5
go.mod
@@ -8,10 +8,11 @@ require (
 	github.com/go-shiori/go-readability v0.0.0-20250217085726-9f5bf5ca7612
 	github.com/playwright-community/playwright-go v0.5200.0
 	github.com/urfave/cli/v3 v3.0.0-beta1
-	golang.org/x/text v0.29.0
+	golang.org/x/text v0.31.0
 )
 
 require (
+	github.com/PuerkitoBio/goquery v1.11.0 // indirect
 	github.com/andybalholm/cascadia v1.3.3 // indirect
 	github.com/araddon/dateparse v0.0.0-20210429162001-6b43995a97de // indirect
 	github.com/deckarep/golang-set/v2 v2.8.0 // indirect
@@ -19,5 +20,5 @@ require (
 	github.com/go-shiori/dom v0.0.0-20230515143342-73569d674e1c // indirect
 	github.com/go-stack/stack v1.8.1 // indirect
 	github.com/gogs/chardet v0.0.0-20211120154057-b7413eaefb8f // indirect
-	golang.org/x/net v0.44.0 // indirect
+	golang.org/x/net v0.47.0 // indirect
 )