The archive.ph submission flow had several defects that caused Mort's
summary fallback to return placeholder "Working..." pages instead of
real archived content, or hang for the full timeout:
- Context cancellation in the poll loop fell through to a final
WaitForNetworkIdle and returned the doc as success. The function now
returns a typed error (ErrArchiveIncomplete on deadline, wrapped
ctx.Err() on caller cancel).
- The poll only checked doc.URL(), so if archive.ph's JS got wedged on
/wip/<id>, the loop spun until timeout. Completion now also requires
a DOM marker (#HEADER, [id^="SHARE"], .TEXT-BLOCK) so URL-only
transitions don't satisfy the check.
- The final URL is now validated against an alphanumeric ID pattern,
rejecting /wip/, /submit, /newest/ and the front page.
- The 5-second blind sleep before polling is replaced with a bounded
WaitForNetworkIdle that short-circuits when the page is already archived.
- Form selectors now use a cascade (input[name='url'] →
input[type='url'] → input.input-url → input[name='anyway'], and
similar for the submit button) so a single archive.ph markup change
doesn't kill the flow. Errors name which selectors were tried.
- Default timeout lowered from 1 hour to 5 minutes (still overridable
via context deadline). Exposed as DefaultTimeout.
- Poll progress is now logged at slog.Info every 30s so production logs
surface stuck flows.
- A front-page 5xx is now retried twice with 1s/4s backoff before failing.
- New exported sentinels: ErrArchiveIncomplete, ErrArchiveSelectorMissing.
- Tests cover the URL validator (incl. /wip/, /newest/, short IDs, o-prefix),
the selector cascade, the DOM completion detector, transient status
classification, and ctx cancellation paths via a thread-safe mutating
mock document. Full integration with a live browser remains hand-tested.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>