bug: Site extractors silently ignore parsing errors with _ = (20+ locations) #24

Closed
opened 2026-02-14 16:09:37 +00:00 by Claude · 2 comments
Collaborator

Parent: #1

Description

Across all site extractor modules, strconv.Atoi(), strconv.ParseFloat(), Text(), and other fallible operations have their errors discarded with _ =. This means malformed or unexpected page content produces zero-value results silently instead of surfacing errors.

Locations

sites/aislegopher/aislegopher.go:

  • Line 51: res.ID, _ = strconv.Atoi(a[3]) — if ID isn't a number, ID=0
  • Line 62: res.Name, _ = names[0].Text()
  • Lines 68-71: priceStr, _ := prices[0].Text() then res.Price, _ = strconv.ParseFloat(priceStr, 64)

sites/wegmans/wegmans.go:

  • Line 57: id, _ := strconv.Atoi(a[2])
  • Line 71: _ = doc.WaitForNetworkIdle(&timeout) — network idle failure ignored
  • Line 80: res.Name, _ = titles[0].Text()
  • Lines 87-93: priceStr, _ := prices[0].Text() then price, _ := strconv.ParseFloat(priceStr, 64)
  • Lines 101-112: Same pattern for unit prices

sites/duckduckgo/duckduckgo.go:

  • Lines 119-120: r.Title, _ = titles[0].Text()
  • Lines 125-126: r.Description, _ = descriptions[0].Text()

sites/google/google.go:

  • Line 120: title, _ = titles[0].Text()
  • Line 126: desc, _ = descs[0].Text()

sites/duckduckgo/page.go:

  • Lines 42, 48: Same _ = Text() pattern

Impact

When a site changes its HTML structure (which happens frequently), the extractor returns zero-value/empty fields instead of an error. Callers have no way to distinguish "the field genuinely has no value" from "parsing failed".
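The ambiguity can be reproduced in a few lines. This is a minimal sketch, not code from the repository: `parseIDIgnoringError` and `parseID` are hypothetical helper names used only to contrast the two styles.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseIDIgnoringError mirrors the pattern described in this issue:
// the error from strconv.Atoi is discarded, so bad input yields 0.
func parseIDIgnoringError(s string) int {
	id, _ := strconv.Atoi(s)
	return id
}

// parseID surfaces the failure so the caller can tell "ID is 0"
// apart from "ID could not be parsed".
func parseID(s string) (int, error) {
	id, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("parsing product ID %q: %w", s, err)
	}
	return id, nil
}

func main() {
	// Both calls return 0; the caller cannot see that the second one failed.
	fmt.Println(parseIDIgnoringError("0"), parseIDIgnoringError("not-a-number"))

	// With the error returned, the failure is visible.
	if _, err := parseID("not-a-number"); err != nil {
		fmt.Println("error:", err)
	}
}
```

With the discarded error, a URL segment that is not numeric and a genuine ID of 0 look identical to the caller.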

Fix

At minimum, log warnings for ignored errors. Ideally, return partial results with an error (or collect errors and return them alongside results). For the Text() calls that populate optional fields (title, description), logging is sufficient. For critical fields like price and ID, the error should be returned.

Claude added the bug, priority/high, type/task labels 2026-02-14 16:09:42 +00:00
Author
Collaborator

Starting work on this. Plan: return errors for required fields (ID, price), log warnings for optional fields (title, description, unit price). Will fix all 20+ locations across aislegopher, wegmans, duckduckgo, and google extractors.

Author
Collaborator

Work finished — fixed 20+ locations across aislegopher, wegmans, duckduckgo, and google extractors. Required fields (ID, price) now return errors; optional fields (title, description) log warnings. Merged in PR #41.
