Support anti-bot detection evasion (navigator.webdriver, launch args) #56

Closed
opened 2026-02-17 20:06:36 +00:00 by Claude · 2 comments
Collaborator

Problem

Sites like archive.ph detect Playwright-controlled browsers and return HTTP 429 or serve captcha pages instead of content. The primary detection vector is navigator.webdriver === true, which Playwright sets on all controlled browsers by default.

Currently BrowserOptions doesn't expose any way to mitigate this, so consumers have no recourse when a site blocks automated access.

Proposed API Changes

1. Expose browser launch arguments

Add a LaunchArgs []string field to BrowserOptions so consumers can pass flags like:

```go
extractor.BrowserOptions{
    Browser:    extractor.BrowserChromium,
    LaunchArgs: []string{"--disable-blink-features=AutomationControlled"},
}
```

This would be passed through to playwright.BrowserTypeLaunchOptions.Args.
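Here is a minimal sketch of how the LaunchArgs pass-through could combine user flags with the Chromium stealth flag before they reach playwright.BrowserTypeLaunchOptions.Args. The launchArgs helper, the browserChromium constant (a stand-in for extractor.BrowserChromium), and the de-duplication behavior are assumptions for illustration. The sketch stops before the actual Playwright call so it runs with the standard library only.

```go
package main

import "fmt"

// automationControlledFlag is the Chromium flag named in this issue.
const automationControlledFlag = "--disable-blink-features=AutomationControlled"

// browserChromium is a hypothetical stand-in for extractor.BrowserChromium.
const browserChromium = "chromium"

// launchArgs (hypothetical) builds the slice that would be assigned to
// playwright.BrowserTypeLaunchOptions.Args. It assumes the stealth flag
// applies only to Chromium and that duplicate flags should be dropped.
func launchArgs(browser string, stealth bool, userArgs []string) []string {
	args := make([]string, 0, len(userArgs)+1)
	seen := make(map[string]bool)
	add := func(a string) {
		if !seen[a] {
			seen[a] = true
			args = append(args, a)
		}
	}
	if stealth && browser == browserChromium {
		add(automationControlledFlag)
	}
	for _, a := range userArgs {
		add(a)
	}
	return args
}

func main() {
	// A user who also passes the stealth flag does not get it twice.
	fmt.Println(launchArgs(browserChromium, true, []string{automationControlledFlag, "--lang=en-US"}))
	// Firefox gets no Blink-specific flag.
	fmt.Println(launchArgs("firefox", true, nil))
}
```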

2. Expose context init scripts

Add an InitScripts []string field to BrowserOptions to allow injecting JavaScript that runs in every page before the page's own scripts. This enables overriding navigator.webdriver and other detectable properties:

```go
extractor.BrowserOptions{
    InitScripts: []string{
        `Object.defineProperty(navigator, 'webdriver', {get: () => undefined})`,
    },
}
```

This would be passed through to browserContext.AddInitScript().
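The pass-through to browserContext.AddInitScript() could look roughly like the sketch below. To keep it runnable without a browser, the Playwright context is replaced by a hypothetical one-method scriptInjector interface, which does not mirror playwright-go's real signature. injectInitScripts registers scripts in order and skips blank entries; it is an illustrative name, not part of either library.

```go
package main

import (
	"fmt"
	"strings"
)

// scriptInjector is a hypothetical stand-in for a Playwright browser context.
type scriptInjector interface {
	AddInitScript(content string) error
}

// recordingContext records scripts instead of injecting them into a browser.
type recordingContext struct{ scripts []string }

func (r *recordingContext) AddInitScript(content string) error {
	r.scripts = append(r.scripts, content)
	return nil
}

// injectInitScripts registers each non-blank script in order and stops at
// the first error, reporting which script failed.
func injectInitScripts(ctx scriptInjector, scripts []string) error {
	for i, s := range scripts {
		if strings.TrimSpace(s) == "" {
			continue
		}
		if err := ctx.AddInitScript(s); err != nil {
			return fmt.Errorf("init script %d: %w", i, err)
		}
	}
	return nil
}

func main() {
	ctx := &recordingContext{}
	err := injectInitScripts(ctx, []string{
		`Object.defineProperty(navigator, 'webdriver', {get: () => undefined})`,
		"   ", // blank entries are ignored
	})
	fmt.Println(err, len(ctx.scripts))
}
```

Registering in slice order keeps the behavior predictable if a later consumer script depends on an earlier stealth override.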

3. Optional: Stealth mode preset

A convenience Stealth bool field that applies common anti-detection measures automatically:

  • For Chromium: --disable-blink-features=AutomationControlled
  • Init script to override navigator.webdriver
  • Init script to populate navigator.plugins and navigator.mimeTypes
  • Stub window.chrome and its runtime properties where they are missing
  • Fix window.outerWidth/outerHeight to match viewport
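One way the stealth preset could expand into concrete launch args and init scripts is sketched below. stealthPreset is a hypothetical helper, and the JavaScript snippets are naive stubs. They address the simple checks among the detection vectors listed in this issue (plugin count, a missing window.chrome, a zero outerWidth) but would not fool a detector that inspects prototypes or object types. Limiting the Blink flag and the window.chrome stub to Chromium is an assumption based on the "Firefox N/A" note.

```go
package main

import "fmt"

// Naive JavaScript stubs mirroring the stealth checklist (illustrative only).
const (
	webdriverScript = `Object.defineProperty(navigator, 'webdriver', {get: () => undefined});`
	pluginsScript   = `Object.defineProperty(navigator, 'plugins', {get: () => [1, 2, 3]});
Object.defineProperty(navigator, 'mimeTypes', {get: () => [1, 2]});`
	chromeScript    = `if (!window.chrome) { window.chrome = { runtime: {} }; }`
	outerSizeScript = `Object.defineProperty(window, 'outerWidth', {get: () => window.innerWidth});
Object.defineProperty(window, 'outerHeight', {get: () => window.innerHeight});`
)

// stealthPreset (hypothetical) returns the launch args and init scripts that a
// Stealth: true option might add for the given browser engine.
func stealthPreset(chromium bool) (args, scripts []string) {
	scripts = []string{webdriverScript, pluginsScript, outerSizeScript}
	if chromium {
		args = append(args, "--disable-blink-features=AutomationControlled")
		scripts = append(scripts, chromeScript)
	}
	return args, scripts
}

func main() {
	args, scripts := stealthPreset(true)
	fmt.Println(len(args), len(scripts))
	args, scripts = stealthPreset(false)
	fmt.Println(len(args), len(scripts))
}
```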

Context

This came up in steve/mort#687 — the summary system's headless browser gets 429'd by archive.ph. The interactive browser (same Playwright, same user agent) works fine when a human is controlling it, but the webdriver property gives away the automation.

Detection Vectors (for reference)

| Signal | Headless | Headed+Playwright | Real Browser |
|--------|----------|-------------------|--------------|
| navigator.webdriver | true | true | undefined |
| window.outerWidth | 0 | normal | normal |
| navigator.plugins.length | 0 | 0 | 3+ |
| WebGL renderer | SwiftShader/Mesa | SwiftShader/Mesa | Real GPU |
| window.chrome | missing (Firefox N/A) | missing | present (Chromium) |

The most impactful single change would be change 1 (launch args), since --disable-blink-features=AutomationControlled alone stops Chromium from reporting navigator.webdriver as true, and that is the most commonly checked signal.

Author
Collaborator

Starting work on this issue. My plan:

  1. Add LaunchArgs []string, InitScripts []string, and Stealth *bool fields to BrowserOptions
  2. Update mergeOptions to handle the new fields
  3. Update initBrowser to pass launch args, apply stealth presets, and inject init scripts into the browser context
  4. Default Stealth to true in both NewBrowser and NewInteractiveBrowser
  5. Add a --no-stealth CLI flag to the browser flags package
  6. Ensure tests pass
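Choosing Stealth *bool rather than bool matters for merging: a nil pointer means "not set", so NewBrowser can default it to true while an explicit false still wins. The sketch below shows such a merge, assuming later options override earlier ones and slices are appended. browserOptions, mergeOptions, and boolPtr here are simplified stand-ins, not the package's actual code.

```go
package main

import "fmt"

// browserOptions is a hypothetical, trimmed-down stand-in for
// extractor.BrowserOptions with only the fields discussed in this issue.
type browserOptions struct {
	LaunchArgs  []string
	InitScripts []string
	Stealth     *bool // nil means "not set", so a default can still apply
}

// mergeOptions folds opts onto base from left to right. Assumed semantics:
// slices are appended, and a non-nil Stealth overrides earlier values.
func mergeOptions(base browserOptions, opts ...browserOptions) browserOptions {
	out := base
	// Copy the slices so appending never writes into the caller's backing arrays.
	out.LaunchArgs = append([]string(nil), base.LaunchArgs...)
	out.InitScripts = append([]string(nil), base.InitScripts...)
	for _, o := range opts {
		out.LaunchArgs = append(out.LaunchArgs, o.LaunchArgs...)
		out.InitScripts = append(out.InitScripts, o.InitScripts...)
		if o.Stealth != nil {
			out.Stealth = o.Stealth
		}
	}
	return out
}

func boolPtr(b bool) *bool { return &b }

func main() {
	defaults := browserOptions{Stealth: boolPtr(true)} // planned default for NewBrowser
	merged := mergeOptions(defaults, browserOptions{LaunchArgs: []string{"--lang=en-US"}})
	fmt.Println(*merged.Stealth, merged.LaunchArgs)
	optOut := mergeOptions(defaults, browserOptions{Stealth: boolPtr(false)})
	fmt.Println(*optOut.Stealth)
}
```

With a plain bool, an option struct that never mentions Stealth would be indistinguishable from one that explicitly disables it.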
Author
Collaborator

Work complete. Merged via PR #57.

What was implemented:

  • Stealth *bool field on BrowserOptions — defaults to true in NewBrowser and NewInteractiveBrowser
  • LaunchArgs []string — custom browser launch arguments passed to Playwright
  • InitScripts []string — JavaScript injected into every browser context before page scripts
  • Stealth presets: navigator.webdriver override, plugin/mimeType spoofing, window.chrome stub, outerWidth/outerHeight fix, Chromium --disable-blink-features=AutomationControlled
  • CLI --no-stealth flag for command-line tools
  • Unit tests for merge behavior of new fields
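A --no-stealth CLI flag could map onto the Stealth *bool field as sketched below, using Go's standard flag package (which accepts both -no-stealth and --no-stealth). parseStealth is a hypothetical helper, not the browser flags package's real API. It returns nil when the flag is absent, so the library default of true applies.

```go
package main

import (
	"flag"
	"fmt"
)

// parseStealth (hypothetical) turns command-line args into a Stealth value:
// nil if --no-stealth was not given, otherwise the negation of the flag.
func parseStealth(args []string) (*bool, error) {
	fs := flag.NewFlagSet("browser", flag.ContinueOnError)
	noStealth := fs.Bool("no-stealth", false, "disable anti-bot detection evasion")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	var stealth *bool
	// Visit only walks flags that were explicitly set on the command line.
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "no-stealth" {
			v := !*noStealth
			stealth = &v
		}
	})
	return stealth, nil
}

func main() {
	s, _ := parseStealth([]string{"--no-stealth"})
	fmt.Println(*s)
	s, _ = parseStealth(nil)
	fmt.Println(s == nil)
}
```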