go-extractor/go.mod
Steve Dudenhoeffer c1a5814732
feat: add ReadabilityWithOptions for DOM cleanup before extraction
Sites with infinite scroll (e.g. The Verge) load additional articles
into the DOM, which get included in readability extraction. Add
ReadabilityOptions.RemoveSelectors to strip elements by CSS selector
before parsing, avoiding the need to reimplement the readability
pipeline downstream.
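
The idea of pruning the DOM before extraction can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this commit: `Node`, `stripSelectors`, and `textContent` are hypothetical names, the tree is a toy in-memory stand-in for a parsed HTML document, and only bare `tag`, `.class`, and `#id` selectors are handled, whereas a real implementation would presumably rely on a full CSS selector engine.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical, minimal stand-in for a parsed HTML element.
type Node struct {
	Tag      string
	ID       string
	Classes  []string
	Text     string
	Children []*Node
}

// matches reports whether n satisfies a simple selector of the form
// "tag", ".class", or "#id". Compound selectors are not supported.
func matches(n *Node, sel string) bool {
	switch {
	case strings.HasPrefix(sel, "."):
		for _, c := range n.Classes {
			if c == sel[1:] {
				return true
			}
		}
		return false
	case strings.HasPrefix(sel, "#"):
		return n.ID != "" && n.ID == sel[1:]
	default:
		return sel != "" && n.Tag == sel
	}
}

// stripSelectors removes every descendant of root that matches any of the
// selectors. A removed node takes its whole subtree with it.
func stripSelectors(root *Node, selectors []string) {
	kept := root.Children[:0] // filter in place, reusing the backing array
	for _, child := range root.Children {
		drop := false
		for _, sel := range selectors {
			if matches(child, sel) {
				drop = true
				break
			}
		}
		if drop {
			continue
		}
		stripSelectors(child, selectors)
		kept = append(kept, child)
	}
	root.Children = kept
}

// textContent concatenates node text in document order, standing in for
// the extraction step that would run after cleanup.
func textContent(n *Node) string {
	var b strings.Builder
	var walk func(*Node)
	walk = func(cur *Node) {
		b.WriteString(cur.Text)
		for _, c := range cur.Children {
			walk(c)
		}
	}
	walk(n)
	return b.String()
}

func main() {
	page := &Node{Tag: "body", Children: []*Node{
		{Tag: "article", ID: "main", Text: "Primary story."},
		{Tag: "div", Classes: []string{"infinite-scroll"}, Children: []*Node{
			{Tag: "article", Text: "Unrelated story."},
		}},
	}}

	// Strip the container that infinite scroll appended, then extract.
	stripSelectors(page, []string{".infinite-scroll"})
	fmt.Println(textContent(page)) // prints "Primary story."
}
```

Doing the cleanup as a separate pass ahead of extraction keeps the extraction step itself unchanged, which matches the motivation given above for not reimplementing the readability pipeline downstream.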

Closes #60

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 01:09:28 +00:00


module gitea.stevedudenhoeffer.com/steve/go-extractor

go 1.24.0

toolchain go1.24.1

require (
	github.com/go-shiori/go-readability v0.0.0-20250217085726-9f5bf5ca7612
	github.com/playwright-community/playwright-go v0.5200.0
	github.com/urfave/cli/v3 v3.0.0-beta1
	golang.org/x/text v0.31.0
)

require (
	github.com/PuerkitoBio/goquery v1.11.0 // indirect
	github.com/andybalholm/cascadia v1.3.3 // indirect
	github.com/araddon/dateparse v0.0.0-20210429162001-6b43995a97de // indirect
	github.com/deckarep/golang-set/v2 v2.8.0 // indirect
	github.com/go-jose/go-jose/v3 v3.0.4 // indirect
	github.com/go-shiori/dom v0.0.0-20230515143342-73569d674e1c // indirect
	github.com/go-stack/stack v1.8.1 // indirect
	github.com/gogs/chardet v0.0.0-20211120154057-b7413eaefb8f // indirect
	golang.org/x/net v0.47.0 // indirect
)