fix(archive): harden archive.ph submit/poll flow #87

Merged
steve merged 1 commit from fix/archive-ph-poll-hardening into main 2026-05-15 22:39:40 +00:00

Summary

The archive.Archive flow used by Mort's summary system was returning placeholder "Working…" pages instead of actual archived content, and hanging for up to 1 hour on stuck submissions. This PR fixes 8 defects.

Defects fixed

  1. Silent ctx.Done() exit returning placeholder doc — poll loop now classifies the ctx error (deadline → ErrArchiveIncomplete, caller-cancel → wrapped context.Canceled), closes the doc, and returns the error.
  2. URL-only finish detection — isArchiveComplete now requires both a snapshot-shaped URL AND a DOM completion marker (#HEADER, [id^='SHARE'], .TEXT-BLOCK).
  3. No final-URL validation — isFinalSnapshotURL rejects /wip/, /submit, /newest/, and the front page; path must match ^/(o/)?[A-Za-z0-9]{5,}(/|$).
  4. 5-second blind sleep — replaced with WaitForNetworkIdle(8s) then a 1s poll interval (was 5s blind + 5s poll).
  5. Brittle form selectors — cascade of 4 URL-input selectors and 3 submit selectors; error names which selectors were tried.
  6. 1-hour default timeout — DefaultTimeout = 5 * time.Minute (still ctx-overridable).
  7. Debug-only progress logs — slog.Info "still waiting for archive.ph" fires every 30s with current URL.
  8. No 5xx retry on archive.ph itself — front page Open retries up to 2× with 1s/4s backoff.
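
As an illustration of defect 3, here is a minimal sketch of a snapshot-URL validator built around the pattern quoted above. It is not the code from this PR: the `reservedPaths` list and the way it is checked are assumptions made for the example. It shows why the explicit rejections matter: `/submit` and `/newest/` are long enough to satisfy the ID pattern on their own, so the pattern alone would accept them.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// snapshotPath is the pattern quoted in defect 3.
var snapshotPath = regexp.MustCompile(`^/(o/)?[A-Za-z0-9]{5,}(/|$)`)

// reservedPaths lists the non-snapshot routes named in defect 3.
// Hypothetical name; the real implementation may check these differently.
var reservedPaths = []string{"/wip", "/submit", "/newest"}

// isFinalSnapshotURL reports whether raw looks like a finished snapshot URL.
func isFinalSnapshotURL(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil || u.Host == "" {
		return false
	}
	p := u.Path
	if p == "" || p == "/" {
		return false // front page
	}
	for _, r := range reservedPaths {
		// Match the route itself or anything below it, but not an ID
		// that merely starts with the same letters.
		if p == r || strings.HasPrefix(p, r+"/") {
			return false
		}
	}
	return snapshotPath.MatchString(p)
}

func main() {
	for _, raw := range []string{
		"https://archive.ph/",
		"https://archive.ph/wip/AbC12",
		"https://archive.ph/submit/",
		"https://archive.ph/newest/https://example.com",
		"https://archive.ph/Ab1",
		"https://archive.ph/AbC12",
		"https://archive.ph/o/AbC12/https://example.com",
	} {
		fmt.Printf("%-50s %v\n", raw, isFinalSnapshotURL(raw))
	}
}
```

Per defect 2, a URL check like this is only half of completion detection; the poll loop also requires a DOM completion marker before it reports success.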

New exports (additive, no breaking changes)

  • ErrArchiveIncomplete — wraps context.DeadlineExceeded so errors.Is works for either
  • ErrArchiveSelectorMissing — returned when no form selector matches; the error names which selectors were tried
  • DefaultTimeout — exposed constant (5 min)

archive.Archive, archive.IsArchived, and their Config methods keep their existing signatures.

Test coverage

  • TestIsFinalSnapshotURL (12 subcases) — front page, /wip/, /submit, /newest/, short IDs, o/ prefix
  • TestHasCompletionMarker — empty doc + each completionSelectors entry
  • TestFindURLInput_Cascade + TestFindSubmitButton_Cascade — first-wins, last-fallback-works, all-miss-returns-nil
  • TestIsTransientStatus (7 subcases) — 5xx → retry, 4xx → no retry
  • TestPollUntilArchived_ContextCancelled_NeverCompletes — deadline → ErrArchiveIncomplete
  • TestPollUntilArchived_CallerCancelled — cancel → wrapped context.Canceled, NOT ErrArchiveIncomplete
  • TestPollUntilArchived_SuccessRequiresBothURLAndMarker — regression contract for defect #2
  • TestPollUntilArchived_URLOnly_NotEnough — final-looking URL with no marker hits the deadline
  • TestArchive_SelectorMissing — full Archive call returns ErrArchiveSelectorMissing
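
To make the retry contract behind TestIsTransientStatus concrete, here is a rough sketch of 5xx classification plus a bounded retry. Only the name isTransientStatus and the 1s/4s schedule come from this PR; `openWithRetry`, `frontPageBackoff`, `noSleep`, and the injectable sleep function are hypothetical, and "up to 2×" is read here as one initial attempt followed by at most two retries.

```go
package main

import (
	"fmt"
	"time"
)

// isTransientStatus treats any 5xx as retryable and everything else,
// including 4xx, as final.
func isTransientStatus(code int) bool {
	return code >= 500 && code <= 599
}

// frontPageBackoff is the 1s/4s schedule from defect 8 (hypothetical name).
var frontPageBackoff = []time.Duration{1 * time.Second, 4 * time.Second}

// noSleep lets the demo and tests skip the real waits.
func noSleep(time.Duration) {}

// openWithRetry calls open once, then once more per backoff entry for as
// long as the returned status is transient. Transport errors are returned
// immediately without retrying.
func openWithRetry(open func() (int, error), backoff []time.Duration, sleep func(time.Duration)) (int, error) {
	for attempt := 0; ; attempt++ {
		status, err := open()
		if err != nil || !isTransientStatus(status) {
			return status, err
		}
		if attempt >= len(backoff) {
			return status, fmt.Errorf("archive: front page still returning %d after %d retries", status, len(backoff))
		}
		sleep(backoff[attempt])
	}
}

func main() {
	// Simulated front page: two 5xx responses, then success.
	statuses := []int{503, 502, 200}
	calls := 0
	open := func() (int, error) {
		s := statuses[calls]
		calls++
		return s, nil
	}
	status, err := openWithRetry(open, frontPageBackoff, noSleep)
	fmt.Println(status, err, calls)
}
```

Passing the sleep function in keeps a test of the retry path fast; production code would pass `time.Sleep` or a context-aware wait instead.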

go build ./..., go vet ./..., go test -race -count=1 ./sites/archive/... all clean.

LOC delta

+832 / -70 (most of the additions are the new test file and docstrings).

steve added 1 commit 2026-05-15 22:39:16 +00:00
fix(archive): harden archive.ph submit/poll flow
CI / build (pull_request) Successful in 1m5s
CI / vet (pull_request) Successful in 1m26s
CI / test (pull_request) Successful in 1m27s
45fa7c4e8f
The archive.ph submission flow had several defects that caused Mort's
summary fallback to return placeholder "Working..." pages instead of
real archived content, or hang for the full timeout:

- Context cancellation in the poll loop fell through to a final
  WaitForNetworkIdle and returned the doc as success. The function now
  returns a typed error (ErrArchiveIncomplete on deadline, wrapped
  ctx.Err() on caller cancel).
- The poll only checked doc.URL() — if archive.ph's JS got wedged on
  /wip/<id>, the loop spun until timeout. Completion now also requires
  a DOM marker (#HEADER, [id^="SHARE"], .TEXT-BLOCK) so URL-only
  transitions don't satisfy the check.
- The final URL is now validated against an alphanumeric ID pattern,
  rejecting /wip/, /submit, /newest/ and the front page.
- 5-second blind sleep before polling replaced with a bounded
  WaitForNetworkIdle that short-circuits when already archived.
- Form selectors now use a cascade (input[name='url'] →
  input[type='url'] → input.input-url → input[name='anyway'], and
  similar for the submit button) so a single archive.ph markup change
  doesn't kill the flow. Errors name which selectors were tried.
- Default timeout lowered from 1 hour to 5 minutes (still overridable
  via context deadline). Exposed as DefaultTimeout.
- Poll progress is now logged at slog.Info every 30s so production logs
  surface stuck flows.
- Front-page 5xx now retries twice with 1s/4s backoff before failing.
- New exported sentinels: ErrArchiveIncomplete, ErrArchiveSelectorMissing.
- Tests cover URL validator (incl. /wip/, /newest/, short IDs, o-prefix),
  selector cascade, DOM completion detector, transient status
  classification, and ctx cancellation paths via a thread-safe mutating
  mock document. Full integration with a live browser remains hand-tested.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
steve merged commit cccf3c4f83 into main 2026-05-15 22:39:40 +00:00
steve deleted branch fix/archive-ph-poll-hardening 2026-05-15 22:39:40 +00:00