Epic: Improve headless browser stealth against anti-bot detection #68
Summary
The current stealth system (added in #58 / #59) is insufficient against stricter anti-bot services like archive.ph. NewInteractiveBrowser (Chromium) loads archive.ph successfully, but NewBrowser (Firefox, used by mort's summary system) is blocked with HTTP 429. Investigation reveals several architectural gaps:
Comparison: NewInteractiveBrowser (WORKS) vs. NewBrowser (FAILS)
--disable-blink-features: applied
Channel: "chromium": applied

The 12 stealth init scripts were written exclusively for Chromium but are injected unconditionally into all browser engines, including Firefox. Firefox receives scripts that reference window.chrome, Chrome PDF plugins, HeadlessChrome UA stripping, etc.; on Firefox these are either no-ops or actively suspicious. Meanwhile, Firefox-specific headless detection vectors are completely unaddressed.

Additionally, every browser session has identical static fingerprint values (WebGL renderer, plugin list, connection stats), making fingerprint-based detection trivial across sessions.
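The two gaps described above (engine-blind script injection and static fingerprints) could be approached roughly as follows. This is only a sketch under assumptions: the script names, the `initScript` and `Fingerprint` types, the `scriptsFor` and `newFingerprint` helpers, and the renderer strings are all hypothetical and are not taken from the project's actual code.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Engine identifies the browser engine a session runs on.
type Engine string

const (
	Chromium Engine = "chromium"
	Firefox  Engine = "firefox"
)

// initScript is a hypothetical record: a script name plus the engines
// it is meant for.
type initScript struct {
	Name    string
	Engines []Engine
}

// catalogue is illustrative only; the issue mentions 12 real scripts
// but does not list their names.
var catalogue = []initScript{
	{"webdriver-flag", []Engine{Chromium, Firefox}},
	{"window-chrome", []Engine{Chromium}},
	{"chrome-pdf-plugins", []Engine{Chromium}},
	{"headlesschrome-ua-strip", []Engine{Chromium}},
	{"firefox-headless-vectors", []Engine{Firefox}},
}

// scriptsFor returns only the scripts tagged for the given engine,
// instead of injecting every script into every engine.
func scriptsFor(e Engine) []string {
	var out []string
	for _, s := range catalogue {
		for _, se := range s.Engines {
			if se == e {
				out = append(out, s.Name)
				break
			}
		}
	}
	return out
}

// Fingerprint holds values that are currently identical in every session.
type Fingerprint struct {
	WebGLRenderer string
	DownlinkMbps  float64
	RTTMillis     int
}

// newFingerprint derives per-session values from a seed so that two
// sessions with different seeds are unlikely to share a fingerprint.
// The renderer strings are made-up placeholders.
func newFingerprint(seed int64) Fingerprint {
	r := rand.New(rand.NewSource(seed))
	renderers := []string{
		"placeholder renderer A",
		"placeholder renderer B",
		"placeholder renderer C",
	}
	return Fingerprint{
		WebGLRenderer: renderers[r.Intn(len(renderers))],
		DownlinkMbps:  5 + r.Float64()*45, // range [5, 50)
		RTTMillis:     20 + r.Intn(181),   // range [20, 200]
	}
}

func main() {
	fmt.Println(scriptsFor(Firefox)) // [webdriver-flag firefox-headless-vectors]
	fmt.Printf("%+v\n", newFingerprint(1))
}
```

Tagging each script with the engines it targets keeps the selection logic in one place, so adding a Firefox-only script does not require touching the Chromium path.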
Sub-tasks
NewBrowser and align User-Agent with browser engine
Prior Art
steve/mort summary system blocked by archive.ph (HTTP 429)

Starting work on #69 (split browser-conditional init scripts and add Firefox-specific stealth). This is the first sub-task of this epic.
Starting work on #70 (default viewport and engine-aligned User-Agent). This is the second sub-task of this epic, building on #69 (completed in #72).
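For illustration, engine alignment of the User-Agent and a default viewport might look like the sketch below. Everything here is an assumption: the `Profile` type, the `profileFor` and `uaMatchesEngine` helpers, the version numbers in the User-Agent strings, and the 1280x720 viewport are placeholders, not values from the project.

```go
package main

import (
	"fmt"
	"strings"
)

// Profile is a hypothetical bundle of per-engine defaults.
type Profile struct {
	UserAgent string
	Width     int
	Height    int
}

// profiles maps an engine name to illustrative defaults. The version
// numbers are placeholders and would need to track the real browser build.
var profiles = map[string]Profile{
	"chromium": {
		UserAgent: "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
		Width:     1280,
		Height:    720,
	},
	"firefox": {
		UserAgent: "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
		Width:     1280,
		Height:    720,
	},
}

// profileFor returns the defaults for an engine, or an error if none exist.
func profileFor(engine string) (Profile, error) {
	p, ok := profiles[engine]
	if !ok {
		return Profile{}, fmt.Errorf("no profile for engine %q", engine)
	}
	return p, nil
}

// uaMatchesEngine reports whether a User-Agent string is consistent with
// the engine, based on the product tokens each engine normally sends.
func uaMatchesEngine(engine, ua string) bool {
	switch engine {
	case "firefox":
		return strings.Contains(ua, "Gecko/") &&
			strings.Contains(ua, "Firefox/") &&
			!strings.Contains(ua, "Chrome/")
	case "chromium":
		return strings.Contains(ua, "Chrome/") &&
			!strings.Contains(ua, "Firefox/")
	}
	return false
}

func main() {
	p, err := profileFor("firefox")
	if err != nil {
		panic(err)
	}
	fmt.Println(uaMatchesEngine("firefox", p.UserAgent))  // true
	fmt.Println(uaMatchesEngine("chromium", p.UserAgent)) // false
}
```

A consistency check like `uaMatchesEngine` could run before a session starts, so a Chrome User-Agent is never sent from a Firefox engine.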