bug: aislegopher extractor blocked by Cloudflare Turnstile bot protection #55
Summary
The aislegopher site extractor no longer works because aislegopher.com has added Cloudflare Turnstile bot protection. Every page (except sitemap.xml) returns HTTP 403 with an interactive "Verify you are human" challenge that cannot be solved by automated/headless browsers.
How it fails
1. b.Open() navigates to the product URL
2. openPage() (playwright.go:224) rejects the 403
3. The call fails with ErrInvalidStatusCode: 403, so the scraper never reaches DOM extraction
Even if the 403 check were bypassed, the page content is just the Cloudflare challenge HTML, not the actual product page. The .h4 and .h2 selectors would find nothing.
Reproduction
What was tested
- curl with a real browser User-Agent: 403
- sitemap.xml is accessible (likely whitelisted by Cloudflare for SEO)
Possible approaches
- Set ShowBrowser: true. Turnstile may auto-resolve for visible browsers, but this only works in desktop environments and requires manual verification
- Supplying a cf_clearance cookie via CookieJar might allow subsequent automated requests to pass
Additional concern
Because the actual page content is inaccessible, the DOM selectors (.h4 for product name, .h2 for price) cannot be verified. Even once Cloudflare access is resolved, the selectors may need updating if the site has been redesigned.