The archive.ph submission flow had several defects that caused Mort's
summary fallback to return placeholder "Working..." pages instead of
real archived content, or to hang for the full timeout:
- Context cancellation in the poll loop fell through to a final
WaitForNetworkIdle and returned the doc as success. The function now
returns a typed error (ErrArchiveIncomplete on deadline, wrapped
ctx.Err() on caller cancel).
- The poll only checked doc.URL() — if archive.ph's JS got wedged on
/wip/<id>, the loop spun until timeout. Completion now also requires
a DOM marker (#HEADER, [id^="SHARE"], .TEXT-BLOCK) so URL-only
transitions don't satisfy the check.
- The final URL is now validated against an alphanumeric ID pattern,
rejecting /wip/, /submit, /newest/ and the front page.
- The 5-second blind sleep before polling is replaced with a bounded
WaitForNetworkIdle that short-circuits when the page is already archived.
- Form selectors now use a cascade (input[name='url'] →
input[type='url'] → input.input-url → input[name='anyway'], and
similar for the submit button) so a single archive.ph markup change
doesn't kill the flow. Errors name which selectors were tried.
- Default timeout lowered from 1 hour to 5 minutes (still overridable
via context deadline). Exposed as DefaultTimeout.
- Poll progress is now logged at slog.Info every 30s so production logs
surface stuck flows.
- Front-page 5xx responses are now retried twice with 1s/4s backoff before failing.
- New exported sentinels: ErrArchiveIncomplete, ErrArchiveSelectorMissing.
- Tests cover URL validator (incl. /wip/, /newest/, short IDs, o-prefix),
selector cascade, DOM completion detector, transient status
classification, and ctx cancellation paths via a thread-safe mutating
mock document. Full integration with a live browser remains hand-tested.
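
The typed-error behaviour of the poll loop described in the first bullet could look roughly like the sketch below. It is not the code from archive.go: pollUntil, describe, the error text, and the intervals are hypothetical, and the done callback stands in for the combined URL and DOM-marker check.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ErrArchiveIncomplete is the sentinel named in the commit message; the
// message text used here is an assumption.
var ErrArchiveIncomplete = errors.New("archive: snapshot incomplete")

// pollUntil is a hypothetical helper. It calls done on every tick and never
// reports success just because the context ended.
func pollUntil(ctx context.Context, interval time.Duration, done func() bool) error {
	ticker := time.NewTicker(interval) // created once, stopped on return
	defer ticker.Stop()
	for {
		if done() {
			return nil
		}
		select {
		case <-ctx.Done():
			if errors.Is(ctx.Err(), context.DeadlineExceeded) {
				// Deadline: surface the typed sentinel.
				return fmt.Errorf("%w: %v", ErrArchiveIncomplete, ctx.Err())
			}
			// Caller cancel: wrap ctx.Err() so errors.Is still matches it.
			return fmt.Errorf("archive: cancelled while polling: %w", ctx.Err())
		case <-ticker.C:
		}
	}
}

// describe runs pollUntil against a check that never completes and reports
// how the loop ended. It exists only for this demonstration.
func describe(cancelEarly bool) string {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	if cancelEarly {
		cancel()
	}
	err := pollUntil(ctx, 5*time.Millisecond, func() bool { return false })
	switch {
	case errors.Is(err, ErrArchiveIncomplete):
		return "incomplete"
	case errors.Is(err, context.Canceled):
		return "cancelled"
	default:
		return "other"
	}
}

func main() {
	fmt.Println("deadline:", describe(false))
	fmt.Println("caller cancel:", describe(true))
}
```

Callers can then branch with errors.Is on either ErrArchiveIncomplete or context.Canceled instead of inspecting the returned document.
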
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds OpenPageOptions.AllowNonOKStatus. When set, openPage no longer closes
the page on non-2xx (other than 404) and Open returns both a usable Document
and ErrInvalidStatusCode. archive.IsArchived and Archive opt in, so callers
can PromoteToInteractive the captcha page, hand it to a human solver, and
demote back to extract content from the same browser instance — avoiding
the cf_clearance fingerprint-binding issue that re-challenges any fresh
retry browser.
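
A minimal sketch of the "usable Document plus error" contract, assuming simplified stand-in types. The Document fields, the open signature, and passing the status in as a parameter are all hypothetical; only the names OpenPageOptions.AllowNonOKStatus and ErrInvalidStatusCode come from this message.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrInvalidStatusCode is named in the commit message; its text is assumed.
var ErrInvalidStatusCode = errors.New("invalid status code")

// Document is a simplified stand-in for the real document type.
type Document struct {
	URL    string
	Status int
}

// OpenPageOptions mirrors only the one field discussed here.
type OpenPageOptions struct {
	AllowNonOKStatus bool
}

// open is hypothetical: the status is passed in rather than fetched, so the
// decision logic can be shown without a browser.
func open(url string, status int, opts OpenPageOptions) (*Document, error) {
	doc := &Document{URL: url, Status: status}
	if status >= 200 && status < 300 {
		return doc, nil
	}
	err := fmt.Errorf("%w: %d", ErrInvalidStatusCode, status)
	if opts.AllowNonOKStatus && status != 404 {
		// Keep the page open: the caller gets the document AND the error.
		return doc, err
	}
	// Default behaviour: the page would be closed here, so no document.
	return nil, err
}

func main() {
	doc, err := open("https://example.com/", 403, OpenPageOptions{AllowNonOKStatus: true})
	if errors.Is(err, ErrInvalidStatusCode) && doc != nil {
		fmt.Println("challenge page kept open for interactive solving:", doc.URL)
	}
}
```

Opted-in callers have to check the document for nil as well as the error, because a non-nil error no longer implies that there is no document.
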
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- playwright.go: check error from page.Context().Cookies() before
iterating over results, preventing silent failures
- archive.go: replace time.Sleep(5s) with context-aware select using
time.After, allowing the operation to be cancelled promptly
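
The context-aware wait from the archive.go bullet can be sketched as follows. sleepCtx and cancelledEarly are hypothetical names, and the durations are chosen only for the demonstration.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// sleepCtx is a hypothetical helper: it waits for d, or returns early with
// ctx.Err() if the context is cancelled or its deadline passes first.
func sleepCtx(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// cancelledEarly asks for a 5s wait but cancels after 10ms, and reports
// whether sleepCtx gave up with an error well before the 5s elapsed.
func cancelledEarly() bool {
	ctx, cancel := context.WithCancel(context.Background())
	time.AfterFunc(10*time.Millisecond, cancel)
	start := time.Now()
	err := sleepCtx(ctx, 5*time.Second)
	return err != nil && time.Since(start) < time.Second
}

func main() {
	fmt.Println("returned early on cancel:", cancelledEarly())
}
```

On Go versions before 1.23 the timer behind time.After stays allocated until it fires, so a time.NewTimer with a deferred Stop may be preferable for long waits.
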
Closes #7, #18
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- document.go: check if resp is nil before calling resp.Status() in
Refresh(), since Playwright's Reload() can return a nil response
- archive.go: check SelectFirst() results for nil before calling
Type() and Click(), preventing panics when DOM elements are missing
Closes #10, #11
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix Nodes.First() panic on empty slice (return nil)
- Fix ticker leak in archive.go (create once, defer Stop)
- Fix cookie path matching for empty and root paths
- Fix lost query params in google.go (u.Query().Set was discarded)
- Fix type assertion panic in useragents.go
- Fix dropped date parse error in powerball.go
- Remove unreachable dead code in megamillions.go and powerball.go
- Simplify document.go WaitForNetworkIdle, remove unused root field
- Remove debug fmt.Println calls across codebase
- Replace panic(err) with stderr+exit in all cmd/ programs
- Fix duckduckgo cmd: remove useless defer, return error on bad safesearch
- Fix archive cmd: ToConfig returns error instead of panicking
- Add 39+ unit tests across 6 new test files
- Add Gitea Actions CI workflow (build, test, vet in parallel)
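
The lost-query-params bug is a common net/url pitfall: u.Query() parses RawQuery into a fresh url.Values on each call, so calling Set on the result changes a copy that is then thrown away. A minimal sketch of the corrected pattern, with a hypothetical helper withQuery and an example URL that is not taken from google.go:

```go
package main

import (
	"fmt"
	"net/url"
)

// withQuery returns rawURL with key set to value in its query string.
// The helper name and signature are illustrative only.
func withQuery(rawURL, key, value string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	// Buggy form: u.Query().Set(key, value) modifies a throwaway copy.
	// Fixed form: keep the copy, change it, and write it back.
	q := u.Query()
	q.Set(key, value)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	out, err := withQuery("https://example.com/search?q=golang", "hl", "en")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Encode sorts parameters by key, so the order in the resulting URL can differ from the order in the input.
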
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>