feat(archive): keep page open on captcha-status errors so callers can promote

Adds OpenPageOptions.AllowNonOKStatus. When set, openPage no longer closes
the page on non-2xx (other than 404) and Open returns both a usable Document
and ErrInvalidStatusCode. archive.IsArchived and Archive opt in, so callers
can PromoteToInteractive the captcha page, hand it to a human solver, and
demote back to extract content from the same browser instance — avoiding
the cf_clearance fingerprint-binding issue that re-challenges any fresh
retry browser.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 00:29:39 +00:00
parent 841f1ec2bf
commit 3b38637e56
3 changed files with 53 additions and 15 deletions
@@ -7,6 +7,14 @@ import (
type OpenPageOptions struct {
	Referer string
	// AllowNonOKStatus, when true, keeps the page open and returns a usable
	// Document along with ErrInvalidStatusCode on non-2xx responses (other
	// than 404, which is treated as ErrPageNotFound and still closes the
	// page). This lets callers promote the page to an InteractiveBrowser
	// to e.g. let a human solve a Cloudflare captcha that produced a 403,
	// then resume extraction from the same browser instance.
	AllowNonOKStatus bool
}
type Browser interface {