Sites with infinite scroll (e.g. The Verge) load additional articles into the DOM, which get included in readability extraction. Add ReadabilityOptions.RemoveSelectors to strip elements by CSS selector before parsing, avoiding the need to reimplement the readability pipeline downstream.

Closes #60

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
module gitea.stevedudenhoeffer.com/steve/go-extractor

go 1.24.0

toolchain go1.24.1

require (
	github.com/go-shiori/go-readability v0.0.0-20250217085726-9f5bf5ca7612
	github.com/playwright-community/playwright-go v0.5200.0
	github.com/urfave/cli/v3 v3.0.0-beta1
	golang.org/x/text v0.31.0
)

require (
	github.com/PuerkitoBio/goquery v1.11.0 // indirect
	github.com/andybalholm/cascadia v1.3.3 // indirect
	github.com/araddon/dateparse v0.0.0-20210429162001-6b43995a97de // indirect
	github.com/deckarep/golang-set/v2 v2.8.0 // indirect
	github.com/go-jose/go-jose/v3 v3.0.4 // indirect
	github.com/go-shiori/dom v0.0.0-20230515143342-73569d674e1c // indirect
	github.com/go-stack/stack v1.8.1 // indirect
	github.com/gogs/chardet v0.0.0-20211120154057-b7413eaefb8f // indirect
	golang.org/x/net v0.47.0 // indirect
)