Weather extractor CSS selectors don't match DuckDuckGo's actual DOM #53

Closed
opened 2026-02-15 22:48:53 +00:00 by Claude · 3 comments
Collaborator

The weather extractor in sites/duckduckgo/weather.go uses CSS selectors that don't match DuckDuckGo's actual weather widget DOM structure, resulting in empty data being returned.

Current selectors (not working):

  • div.module--weather span.module__title__link
  • div.module--weather .module__current-temp
  • div.module--weather .module__weather-summary
  • div.module--weather .module__high-temp / .module__low-temp
  • div.module--weather .module__forecast-day
  • .forecast-day__name, .forecast-day__high, .forecast-day__low
  • div.module--weather .module__hourly-item
  • .hourly-item__time, .hourly-item__temp

DuckDuckGo's actual DOM structure uses React components with class names like:

  • div.react-module article for the weather widget container
  • Various nested div and span elements without the BEM-style class names above

The data structures and conversion logic are correct — only the CSS selectors need to be updated to match the real DOM. The old mort implementation (which worked) used text extraction + regex parsing as a workaround for the unpredictable class names, along with aria-label/title/alt attribute reading for icon hints.

Tested in production: all fields come back empty/zero when using GetWeather() against the live DuckDuckGo site.
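
For reference, the text extraction + regex fallback mentioned above could look roughly like the sketch below. This is not the old mort code: the function name `parseTemps`, the pattern, and the assumed text format ("54°", "-3 °F") are all hypothetical and would need checking against the widget's real text content.

```go
package main

import (
	"regexp"
	"strconv"
)

// tempPattern matches an optionally negative integer followed by a degree
// sign, e.g. "72°" or "-3 °F". The text format is an assumption; a Unicode
// minus sign (U+2212) would not be matched by this pattern.
var tempPattern = regexp.MustCompile(`(-?\d+)\s*°`)

// parseTemps (hypothetical helper) returns every temperature found in the
// widget's text content, in order of appearance.
func parseTemps(text string) []int {
	var temps []int
	for _, m := range tempPattern.FindAllStringSubmatch(text, -1) {
		n, err := strconv.Atoi(m[1])
		if err != nil {
			continue // skip values that do not fit in an int
		}
		temps = append(temps, n)
	}
	return temps
}
```

Because it only looks at text, this kind of parsing does not depend on class names at all, at the cost of depending on the order in which values appear.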

Author
Collaborator

Starting work on this. Plan of approach:

  1. Use Playwright to inspect the live DuckDuckGo weather widget DOM and identify the actual CSS selectors/structure
  2. Update extractWeather() in sites/duckduckgo/weather.go with correct selectors
  3. Submit a PR with the fix

Will use the old mort approach (text extraction + regex, aria-label/title/alt for icons) as a fallback strategy if class names are unpredictable.

Author
Collaborator

Work finished. PR #54 replaces all class-based CSS selectors with structural/attribute-based selectors that are resilient to DDG's randomized CSS module class names.

Key changes:

  • Widget found via article:has(img[src*='weatherkit']) instead of div.module--weather
  • Positional selectors (div:first-child, p:first-of-type, etc.) instead of BEM class names
  • Icon hints from img[alt] attributes (e.g., "Cloudy", "Snow", "PartlyCloudy")
  • Precipitation from span > span structure
  • CurrentTemp derived from first hourly entry (DDG no longer shows standalone current temp)
  • HighTemp/LowTemp derived from first daily forecast entry
  • Humidity/Wind identified by text content matching ("Humidity:", "Wind:")

All unit tests updated and passing.
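
A minimal sketch of the derived fields and the label matching described in the list above. It is illustrative only and not the code from PR #54: the struct layouts, the `Humidity`/`Wind` string fields, and the helper names `fillDerived` and `assignLabeled` are assumptions; only the `CurrentTemp`, `HighTemp`, `LowTemp`, `DayForecast`, and `HourlyForecast` names come from this thread.

```go
package main

import "strings"

// Field layouts below are assumed for illustration.
type HourlyForecast struct {
	Time string
	Temp int
}

type DayForecast struct {
	Name      string
	High, Low int
}

type Weather struct {
	CurrentTemp, HighTemp, LowTemp int
	Humidity, Wind                 string
	Hourly                         []HourlyForecast
	Daily                          []DayForecast
}

// fillDerived copies values from the first hourly and first daily entries,
// since the widget no longer shows a standalone current temperature.
func fillDerived(w *Weather) {
	if len(w.Hourly) > 0 {
		w.CurrentTemp = w.Hourly[0].Temp
	}
	if len(w.Daily) > 0 {
		w.HighTemp = w.Daily[0].High
		w.LowTemp = w.Daily[0].Low
	}
}

// assignLabeled stores the value of a "Humidity:" or "Wind:" text node and
// ignores any other text.
func assignLabeled(w *Weather, text string) {
	text = strings.TrimSpace(text)
	switch {
	case strings.HasPrefix(text, "Humidity:"):
		w.Humidity = strings.TrimSpace(strings.TrimPrefix(text, "Humidity:"))
	case strings.HasPrefix(text, "Wind:"):
		w.Wind = strings.TrimSpace(strings.TrimPrefix(text, "Wind:"))
	}
}
```

With empty hourly or daily slices the derived fields stay at their zero values, so callers cannot distinguish "0 degrees" from "missing" in this sketch.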

Author
Collaborator

PR #54 looks good. The structural/attribute-based selectors should be resilient to DDG's CSS module class name randomization.

One thing to note for downstream consumers: Condition on DayForecast and HourlyForecast is now set to the raw IconHint value (e.g. "PartlyCloudy") rather than human-readable text. Mort will need to normalize these (split the camelCase value into title-case words, e.g. "PartlyCloudy" → "Partly Cloudy") when converting to its local types.
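
One possible shape for that normalization on the Mort side, as a sketch: `humanizeIconHint` is a hypothetical name, and it assumes the hints are ASCII-style camelCase words like the examples in this thread.

```go
package main

import (
	"strings"
	"unicode"
)

// humanizeIconHint (hypothetical helper) inserts a space before each
// uppercase letter that directly follows a lowercase letter. The hints
// already start every word with a capital, so the result reads as title case.
func humanizeIconHint(hint string) string {
	var b strings.Builder
	var prev rune
	for i, r := range hint {
		if i > 0 && unicode.IsUpper(r) && unicode.IsLower(prev) {
			b.WriteByte(' ')
		}
		b.WriteRune(r)
		prev = r
	}
	return b.String()
}
```

Single-word hints such as "Cloudy" and "Snow" pass through unchanged, and "PartlyCloudy" becomes "Partly Cloudy".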

steve closed this issue 2026-02-15 23:07:10 +00:00