feature: Allrecipes / recipe site extractor #29

Closed
opened 2026-02-14 16:13:01 +00:00 by Claude · 1 comment
Collaborator

Description

Mort has a `.recipe` command (`pkg/logic/recipe/`) that extracts recipes from URLs using the generic Readability algorithm. A dedicated recipe site extractor would produce more reliable, structured results for popular recipe sites.

Most recipe sites embed schema.org JSON-LD structured data (`<script type="application/ld+json">` with `"@type": "Recipe"`), which is far more reliable than parsing the visual HTML.
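A minimal sketch of locating those blocks, assuming the extractor already has the page as an HTML string. The regular expression is only a stand-in for a proper DOM query (the real package presumably reads the page through `extractor.Browser`, which this issue doesn't describe), and `findJSONLD` is a hypothetical name:

```go
package main

import (
	"fmt"
	"regexp"
)

// ldJSONScript matches <script ... type="application/ld+json" ...>BODY</script>.
// Flags: i = case-insensitive, s = '.' also matches newlines; .*? stops at the
// first closing tag. Illustration only, not a general HTML parser.
var ldJSONScript = regexp.MustCompile(`(?is)<script[^>]*type\s*=\s*["']application/ld\+json["'][^>]*>(.*?)</script>`)

// findJSONLD returns the raw body of every JSON-LD script block in page.
func findJSONLD(page string) []string {
	var blocks []string
	for _, m := range ldJSONScript.FindAllStringSubmatch(page, -1) {
		blocks = append(blocks, m[1])
	}
	return blocks
}

func main() {
	page := `<html><head>
<script type="application/ld+json">{"@type": "Recipe", "name": "Pancakes"}</script>
<script>var analytics = true;</script>
</head></html>`
	for _, b := range findJSONLD(page) {
		fmt.Println(b)
	}
}
```

Ordinary scripts without the JSON-LD type are skipped, so only structured-data payloads reach the JSON decoder.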

Proposed API

```go
package recipe

type Recipe struct {
    Name         string
    Description  string
    Author       string
    PrepTime     string
    CookTime     string
    TotalTime    string
    Yield        string  // servings
    Ingredients  []string
    Instructions []string
    ImageURL     string
    Rating       float64
    Calories     string
    SourceURL    string
}

type Config struct{}
var DefaultConfig = Config{}

// ExtractRecipe extracts structured recipe data from any URL.
// Uses JSON-LD structured data when available, falls back to DOM parsing.
func (c Config) ExtractRecipe(ctx context.Context, b extractor.Browser, url string) (*Recipe, error)
```

Approach

  1. Open the page
  2. Look for `<script type="application/ld+json">` blocks containing `"@type": "Recipe"`
  3. If found, parse the JSON-LD directly (most reliable)
  4. If not, fall back to common recipe page patterns (ingredient lists, instruction steps)
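Steps 2 and 3 can be sketched as a search over every JSON-LD block on the page. This assumes the blocks have already been collected as strings and returns the raw decoded node; `findRecipe` and `recipeFromBlocks` are hypothetical names. It covers a bare object, a top-level array, an `@graph` container, and `@type` given either as a string or as an array:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isRecipe reports whether a node's @type is "Recipe", either as a plain
// string or inside an array such as ["Recipe", "NewsArticle"].
func isRecipe(node map[string]any) bool {
	switch t := node["@type"].(type) {
	case string:
		return t == "Recipe"
	case []any:
		for _, e := range t {
			if s, ok := e.(string); ok && s == "Recipe" {
				return true
			}
		}
	}
	return false
}

// findRecipe walks a decoded JSON-LD value (object, array, or object with an
// "@graph" container) and returns the first node typed as Recipe.
func findRecipe(v any) (map[string]any, bool) {
	switch t := v.(type) {
	case map[string]any:
		if isRecipe(t) {
			return t, true
		}
		if g, ok := t["@graph"]; ok {
			return findRecipe(g)
		}
	case []any:
		for _, e := range t {
			if r, ok := findRecipe(e); ok {
				return r, true
			}
		}
	}
	return nil, false
}

// recipeFromBlocks decodes each block in page order and returns the first
// Recipe node; malformed blocks are skipped rather than treated as fatal.
func recipeFromBlocks(blocks []string) (map[string]any, bool) {
	for _, b := range blocks {
		var v any
		if err := json.Unmarshal([]byte(b), &v); err != nil {
			continue
		}
		if r, ok := findRecipe(v); ok {
			return r, true
		}
	}
	return nil, false
}

func main() {
	blocks := []string{
		`{"@type": "Organization", "name": "Example Site"}`,
		`{"@context": "https://schema.org", "@graph": [
			{"@type": "WebPage", "name": "Pancakes page"},
			{"@type": ["Recipe"], "name": "Pancakes"}]}`,
	}
	if r, ok := recipeFromBlocks(blocks); ok {
		fmt.Println("found recipe:", r["name"])
	} else {
		fmt.Println("no JSON-LD recipe; fall back to DOM parsing")
	}
}
```

Returning `false` instead of an error keeps "no structured data" as an ordinary outcome, so the caller can move straight on to step 4's DOM fallback.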

Benefits

  • Works across all major recipe sites (Allrecipes, Food Network, Bon Appétit, etc.)
  • JSON-LD parsing is far more reliable than visual scraping
  • Mort's recipe command gets structured data (ingredients list, cook times) instead of raw article text
  • Handles JavaScript-rendered recipe pages that Readability can't parse
Claude added the enhancement and priority/medium labels 2026-02-14 21:01:05 +00:00
Author
Collaborator

Implemented in PR #48. Added `sites/recipe` package with `ExtractRecipe()` that parses JSON-LD structured data (`@type: Recipe`) with DOM fallback. Handles `@graph` containers, `HowToStep` objects, ISO 8601 durations, and flexible author/yield/image formats. Works across all major recipe sites.