Only get the last loop's data in list

Time:04-04

I am trying to learn Python/BeautifulSoup and Django by making a small project. For this project I scrape a website for recipes and then present a page with a random pick. The code works perfectly when I only fetch the first page (35 recipes). However, I want to grab the recipes from the 2nd and 3rd pages as well. I figured I should write a loop for this, but I can't get it right: the loop scrapes every page, but the lists for the recipe items only hold the data from the last iteration. How do I get this code to add to the lists instead of overwriting them? Picking an item works for the first 35 entries in the list (there are 35 recipes on a page) but fails for any higher index.

from django.shortcuts import render
import requests
import re
from bs4 import BeautifulSoup
import random

# Create your views here.
def recipe(request):

#Create soup
    for page in range(0,2):
        webpage_response = requests.get(f"https://www.ah.nl/allerhande/recepten-zoeken?page={page}" )
        webpage = webpage_response.content
        soup = BeautifulSoup(webpage, "html.parser")  
        recipe_links = soup.find_all('a', attrs={'class' : re.compile('^display-card_root__.*')})
        recipe_pictures = soup.find_all('img', attrs={'class' : re.compile('^card-image-set_imageSet__.*')})
        recipe_prep_time = [ul.find('li').text 
                   for ul in soup.find_all('ul',
                        attrs={'class': re.compile('^recipe-card-properties_root')})]


#Set up lists
        links = []
        titles = []
        pictures = []

#create prefix for link
        prefix = "https://ah.nl"

#scrape page for recipe
        for link in recipe_links:
            links.append(prefix + link.get('href'))

        for title in recipe_links:
            titles.append(title.get('aria-label'))

        for img in recipe_pictures:
            pictures.append(img.get('data-srcset'))

        

#create random int to select a recipe
    nummer = random.randint(0,105)

#select correct link for image
    pic_url = pictures[nummer].split(' ')

#create context
    context = {
        "titles" : titles[nummer],
        "pictures" : pic_url[16],
        "preptime" : recipe_prep_time[nummer],
        "link" : links[nummer]
    }

#render page
    return render(request, "randomRecipe/recipe.html", context)

Answer:

Nice idea. I always have trouble deciding myself when the choice is so good and overwhelming.

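The lists are created inside the page loop, so they are reset on every iteration and only the last page's data survives; create them once, before the loop. As already mentioned by @Barmar, it would also be leaner to store the scraped information in a more structured way, e.g. a list data that holds one dict per recipe, with a structure similar to your context.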
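The difference between resetting and accumulating can be sketched without any network access. FAKE_PAGES and both function names below are hypothetical stand-ins for the scraped pages, not part of the original code; the sketch only assumes that each page yields a list of items.

```python
# Hypothetical stand-in for the scraped pages: page number -> recipe titles.
FAKE_PAGES = {
    0: ["recipe A", "recipe B"],
    1: ["recipe C", "recipe D"],
}

def collect_overwriting(pages):
    """Pattern from the question: the list is re-created on every iteration."""
    for page in pages:
        titles = []                  # reset here, so earlier pages are lost
        for title in pages[page]:
            titles.append(title)
    return titles

def collect_accumulating(pages):
    """Fixed pattern: the list is created once, before the loop."""
    titles = []                      # created once, keeps growing
    for page in pages:
        for title in pages[page]:
            titles.append(title)
    return titles

overwritten = collect_overwriting(FAKE_PAGES)
accumulated = collect_accumulating(FAKE_PAGES)
print(overwritten)   # items from the last page only
print(accumulated)   # items from every page
```

The only change between the two functions is where the empty list is created; the same applies to links, titles and pictures in the question.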

You could also select your elements more specifically:

    data = []

    for e in soup.select('a[data-testhook="recipe-card"]'):
        data.append({
            'title' : e.span.text,
            'picture' : e.img.get('data-srcset').split()[1],
            'preptime' : e.li.text,
            'link' : prefix + e['href']
        })
Example
from django.shortcuts import render
import requests
import re
from bs4 import BeautifulSoup
import random

# Create your views here.
def recipe(request):
    
#create prefix for link
    prefix = "https://ah.nl"
    
#Create soup
    data = []

    for page in range(0,2):
        webpage_response = requests.get(f"https://www.ah.nl/allerhande/recepten-zoeken?page={page}" )
        webpage = webpage_response.content
        soup = BeautifulSoup(webpage, "html.parser")  

        for e in soup.select('a[data-testhook="recipe-card"]'):
            data.append({
                'title' : e.span.text,
                'picture' : e.img.get('data-srcset').split()[1],
                'preptime' : e.li.text,
                'link' : prefix + e['href']
            })

#create random index to select a recipe (randrange excludes len(data), randint would include it)
    nummer = random.randrange(len(data))

    context = data[nummer]

#render page
    return render(request, "randomRecipe/recipe.html", context)
Context
{'title': 'Noedels met sticky sriracha-aubergine, cashewnoten en garnalen',
 'picture': 'https://static.ah.nl/static/recepten/img_RAM_PRD159203_220x162_JPG.jpg',
 'preptime': '45 min',
 'link': 'https://ah.nl/allerhande/recept/R-R1196327/noedels-met-sticky-sriracha-aubergine-cashewnoten-en-garnalen'}
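
Picking the random entry is safer when it depends on the list that was actually scraped rather than on a fixed number such as the 105 in the question, since a page may return fewer items than expected. A minimal sketch, assuming data is a list of dicts like the one above; pick_random and the sample entries are hypothetical and only there for illustration:

```python
import random

def pick_random(items):
    """Return one random element, or None if nothing was scraped."""
    if not items:
        return None
    # random.choice never indexes past the end of the list.
    return random.choice(items)

# Hypothetical sample in the same shape as the scraped data.
data = [
    {"title": "recipe A", "preptime": "20 min"},
    {"title": "recipe B", "preptime": "45 min"},
    {"title": "recipe C", "preptime": "30 min"},
]

context = pick_random(data)
print(context["title"])
```

In the view you would still need to decide what to render when pick_random returns None, for example an error page.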