go-extractor/browser.go
steve 3b38637e56
feat(archive): keep page open on captcha-status errors so callers can promote

Adds OpenPageOptions.AllowNonOKStatus. When set, openPage no longer closes
the page on non-2xx (other than 404) and Open returns both a usable Document
and ErrInvalidStatusCode. archive.IsArchived and Archive opt in, so callers
can PromoteToInteractive the captcha page, hand it to a human solver, and
demote back to extract content from the same browser instance — avoiding
the cf_clearance fingerprint-binding issue that re-challenges any fresh
retry browser.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 00:29:39 +00:00

package extractor

import (
	"context"
	"io"
)

// OpenPageOptions configures a single Browser.Open call.
type OpenPageOptions struct {
	Referer string

	// AllowNonOKStatus, when true, keeps the page open and returns a usable
	// Document along with ErrInvalidStatusCode on non-2xx responses (other
	// than 404, which is treated as ErrPageNotFound and still closes the
	// page). This lets callers promote the page to an InteractiveBrowser,
	// for example so a human can solve a Cloudflare captcha that produced
	// a 403, and then resume extraction from the same browser instance.
	AllowNonOKStatus bool
}

// Browser opens pages as Documents. Callers must Close it when done.
type Browser interface {
	io.Closer
	Open(ctx context.Context, url string, opts OpenPageOptions) (Document, error)
}
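
The contract described in the AllowNonOKStatus comment can be sketched from the caller's side as follows. This is a minimal, standalone illustration and not code from this repository: the Document definition, the error message strings, fakeDoc, fakeBrowser, openForSolver and describe are all hypothetical stand-ins, and the fake Open only mimics the behavior documented above (2xx succeeds, 404 returns ErrPageNotFound with no page, any other status returns ErrInvalidStatusCode together with a Document only when AllowNonOKStatus is set).

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
)

// Hypothetical stand-ins: the real Document type and sentinel errors are
// defined elsewhere in the extractor package and are not shown in this file.
type Document interface{ io.Closer }

var (
	ErrInvalidStatusCode = errors.New("invalid status code")
	ErrPageNotFound      = errors.New("page not found")
)

type OpenPageOptions struct {
	Referer          string
	AllowNonOKStatus bool
}

type Browser interface {
	io.Closer
	Open(ctx context.Context, url string, opts OpenPageOptions) (Document, error)
}

// fakeDoc and fakeBrowser are test doubles that mimic the documented contract.
type fakeDoc struct{}

func (fakeDoc) Close() error { return nil }

type fakeBrowser struct{ status int }

func (fakeBrowser) Close() error { return nil }

func (b fakeBrowser) Open(_ context.Context, _ string, opts OpenPageOptions) (Document, error) {
	switch {
	case b.status >= 200 && b.status < 300:
		return fakeDoc{}, nil
	case b.status == 404:
		return nil, ErrPageNotFound // the page is closed, nothing to promote
	case opts.AllowNonOKStatus:
		return fakeDoc{}, ErrInvalidStatusCode // page stays open for the caller
	default:
		return nil, ErrInvalidStatusCode
	}
}

// openForSolver opts in to AllowNonOKStatus and reports whether the returned
// Document is a still-open non-2xx page that could be handed to a human solver.
func openForSolver(ctx context.Context, b Browser, url string) (doc Document, needsSolver bool, err error) {
	doc, err = b.Open(ctx, url, OpenPageOptions{AllowNonOKStatus: true})
	if errors.Is(err, ErrInvalidStatusCode) && doc != nil {
		return doc, true, nil
	}
	if err != nil {
		return nil, false, err
	}
	return doc, false, nil
}

// describe runs openForSolver against a fake browser that answers with status.
func describe(status int) string {
	doc, needsSolver, err := openForSolver(context.Background(), fakeBrowser{status: status}, "https://example.com/")
	if err != nil {
		return "error: " + err.Error()
	}
	defer doc.Close()
	if needsSolver {
		return "page kept open for solver"
	}
	return "ok"
}

func main() {
	for _, s := range []int{200, 403, 404} {
		fmt.Printf("%d: %s\n", s, describe(s))
	}
}
```

The point of the sketch is that a caller opting in must check the Document as well as the error: a non-nil Document alongside ErrInvalidStatusCode means the page is still open and the caller now owns closing it.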