3b38637e56
Adds OpenPageOptions.AllowNonOKStatus. When set, openPage no longer closes the page on non-2xx (other than 404) and Open returns both a usable Document and ErrInvalidStatusCode. archive.IsArchived and Archive opt in, so callers can PromoteToInteractive the captcha page, hand it to a human solver, and demote back to extract content from the same browser instance, avoiding the cf_clearance fingerprint-binding issue that re-challenges any fresh retry browser.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
25 lines
659 B
Go
package extractor

import (
	"context"
	"io"
)

type OpenPageOptions struct {
	Referer string

	// AllowNonOKStatus, when true, keeps the page open and returns a usable
	// Document along with ErrInvalidStatusCode on non-2xx responses (other
	// than 404, which is treated as ErrPageNotFound and still closes the
	// page). This lets callers promote the page to an InteractiveBrowser
	// to e.g. let a human solve a Cloudflare captcha that produced a 403,
	// then resume extraction from the same browser instance.
	AllowNonOKStatus bool
}

type Browser interface {
	io.Closer

	Open(ctx context.Context, url string, opts OpenPageOptions) (Document, error)
}