bug: Site extractors silently ignore parsing errors with _ = (20+ locations) #24
Parent: #1
Description
Across all site extractor modules, strconv.Atoi(), strconv.ParseFloat(), Text(), and other fallible operations have their errors discarded with _ =. This means malformed or unexpected page content silently produces zero-value results instead of surfacing errors.
Locations
sites/aislegopher/aislegopher.go:
- res.ID, _ = strconv.Atoi(a[3]) (if the ID isn't a number, ID=0)
- res.Name, _ = names[0].Text()
- priceStr, _ := prices[0].Text(), then res.Price, _ = strconv.ParseFloat(priceStr, 64)
sites/wegmans/wegmans.go:
- id, _ := strconv.Atoi(a[2])
- _ = doc.WaitForNetworkIdle(&timeout) (network idle failure ignored)
- res.Name, _ = titles[0].Text()
- priceStr, _ := prices[0].Text(), then price, _ := strconv.ParseFloat(priceStr, 64)
sites/duckduckgo/duckduckgo.go:
- r.Title, _ = titles[0].Text()
- r.Description, _ = descriptions[0].Text()
sites/google/google.go:
- title, _ = titles[0].Text()
- desc, _ = descs[0].Text()
sites/duckduckgo/page.go:
- _ = Text() pattern
Impact
When a site changes its HTML structure (which happens frequently), the extractor returns zero-value/empty fields instead of an error. Callers have no way to distinguish "the field genuinely has no value" from "parsing failed".
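To make the ambiguity concrete, here is a minimal standalone sketch using only the standard library. The helper name parseIDLossy is hypothetical and is not taken from the extractors. It only mirrors the id, _ := strconv.Atoi(...) pattern listed above.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseIDLossy is a hypothetical helper that mirrors the pattern from the
// issue: the error from strconv.Atoi is discarded, so input that is not a
// number collapses to the zero value.
func parseIDLossy(s string) int {
	id, _ := strconv.Atoi(s)
	return id
}

func main() {
	// A genuine zero and a parse failure look identical to the caller.
	fmt.Println(parseIDLossy("0"))            // 0
	fmt.Println(parseIDLossy("not-a-number")) // 0
	fmt.Println(parseIDLossy("1234"))         // 1234

	// Keeping the error is what makes the failure visible.
	_, err := strconv.Atoi("not-a-number")
	fmt.Println(err != nil) // true
}
```

The same reasoning applies to strconv.ParseFloat for prices: a discarded error on non-numeric text leaves the price at 0.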
Fix
At minimum, log warnings for ignored errors. Ideally, return partial results with an error (or collect errors and return them alongside results). For the Text() calls that populate optional fields (title, description), logging is sufficient. For critical fields like price and ID, the error should be returned.
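One possible shape for this split between required and optional fields is sketched below. It is not the code from the repository: textNode, stubNode, Result, errDetached, and extract are all hypothetical names introduced for illustration, and the only assumption carried over from the issue is that elements expose a Text() method returning a string and an error.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"strconv"
)

// textNode is a hypothetical stand-in for the element type used by the
// extractors; only the fallible Text() method matters here.
type textNode interface {
	Text() (string, error)
}

// stubNode is a test double that returns a fixed string or error.
type stubNode struct {
	text string
	err  error
}

func (s stubNode) Text() (string, error) { return s.text, s.err }

// errDetached is a hypothetical error used to simulate a failed Text() call.
var errDetached = errors.New("node detached")

// Result is a hypothetical, trimmed-down extractor result.
type Result struct {
	ID    int
	Name  string
	Price float64
}

// extract returns an error for the required fields (ID, price) and only logs
// a warning for the optional name. On error it returns whatever was parsed
// so far, so callers get a partial result alongside the error.
func extract(rawID string, name, price textNode) (Result, error) {
	var res Result
	var err error

	if res.ID, err = strconv.Atoi(rawID); err != nil {
		return res, fmt.Errorf("parse id %q: %w", rawID, err)
	}

	// Optional field: log and continue with an empty name.
	if res.Name, err = name.Text(); err != nil {
		log.Printf("warning: reading name: %v", err)
	}

	priceStr, err := price.Text()
	if err != nil {
		return res, fmt.Errorf("read price: %w", err)
	}
	if res.Price, err = strconv.ParseFloat(priceStr, 64); err != nil {
		return res, fmt.Errorf("parse price %q: %w", priceStr, err)
	}
	return res, nil
}

func main() {
	res, err := extract("42", stubNode{text: "Milk"}, stubNode{text: "3.49"})
	fmt.Println(res, err)

	// Required field fails: the caller sees an error instead of ID=0.
	_, err = extract("abc", stubNode{text: "Milk"}, stubNode{text: "3.49"})
	fmt.Println(err)

	// Optional field fails: a warning is logged and extraction still succeeds.
	res, err = extract("42", stubNode{err: errDetached}, stubNode{text: "3.49"})
	fmt.Println(res.Name == "", err)
}
```

Wrapping with %w keeps the underlying strconv error available to callers through errors.Is and errors.As, which lets them tell a parse failure apart from a field that is legitimately empty.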
Starting work on this. Plan: return errors for required fields (ID, price), log warnings for optional fields (title, description, unit price). Will fix all 20+ locations across aislegopher, wegmans, duckduckgo, and google extractors.
Work finished — fixed 20+ locations across aislegopher, wegmans, duckduckgo, and google extractors. Required fields (ID, price) now return errors; optional fields (title, description) log warnings. Merged in PR #41.