I'm trying to figure out how to append several values to a list correctly. The page I'm scraping is a food blog. I want to retrieve the title of each recipe and all the recipe keys (gluten free, vegan, dairy free, vegetarian, etc.) associated with that recipe. I can retrieve the information from the page, but I'm having trouble appending several recipe keys to a single row of a list: if the first recipe on the page is both dairy free and gluten free, I can't append them so that they line up with the row of the corresponding recipe. Here is a piece of my code so you can see what I'm working with. Thanks in advance for the help.
recipe = []
key = []
for page in pages:
    page = requests.get('https://www.skinnytaste.com/page/' + str(page) + '/')
    soup = BeautifulSoup(page.text, 'html.parser')
    recipes = soup.find_all('article', class_='post teaser-post odd')
    recipes.extend(soup.find_all('article', class_='post teaser-post even'))
    sleep(randint(2, 8))
    for r in recipes:
        titles = r.h2.text
        recipe.append(titles)
        print(titles)
        post_meta = r.find('div', class_='post-meta')
        icons = post_meta.find('div', class_='icons')
        if not (post_meta.find('div', class_='icons') is None):
            keys = icons.find_all('span')
            for k in keys:
                recipe_key = k.find('a').find('img').get('alt')
                key.append(recipe_key)
                print(recipe_key)
Answer:
Initialize an empty list called rows. Then, for each recipe, create a dictionary called row and update it dynamically, since some recipes have more "keys" than others. Append each row dictionary to rows. pandas can then use that list to construct the table.
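The idea can be tried without touching the network. Below is a minimal sketch, assuming the keys for each recipe have already been pulled out into a plain list. The helper `build_row` and the sample data are hypothetical and only stand in for the scraping loop:

```python
def build_row(title, tags):
    """Return one table row: the title plus a numbered column per tag."""
    row = {'Title': title}
    for count, tag in enumerate(tags, start=1):
        # Zero-padded names (key_01, key_02, ...) keep the columns in order
        row['key_%.2d' % count] = tag
    return row


# Hypothetical sample data standing in for the scraped page
sample = [
    ('Recipe A', ['Dairy Free', 'Gluten Free']),
    ('Recipe B', []),
    ('Recipe C', ['Vegetarian Meals']),
]

rows = [build_row(title, tags) for title, tags in sample]
for row in rows:
    print(row)
```

Each dictionary only carries the keys its recipe actually has. When a list like this is passed to pd.DataFrame, pandas creates one column per distinct dictionary key and fills the missing cells with NaN.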
import requests
import pandas as pd
from bs4 import BeautifulSoup
from time import sleep
from random import randint

headers = {'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36'}

rows = []
pages = range(1, 5)
for page in pages:
    response = requests.get('https://www.skinnytaste.com/page/' + str(page) + '/', headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    recipes = soup.find_all('article', class_='post teaser-post odd')
    recipes.extend(soup.find_all('article', class_='post teaser-post even'))
    sleep(randint(2, 8))  # pause between page requests

    for r in recipes:
        titles = r.h2.text
        print(titles)
        # One dictionary per recipe; it always starts with the title
        row = {'Title': titles}

        post_meta = r.find('div', class_='post-meta')
        icons = post_meta.find('div', class_='icons')
        if icons is not None:
            keys = icons.find_all('span')
            for count, k in enumerate(keys, start=1):
                recipe_key = k.find('a').find('img').get('alt')
                # Adds key_01, key_02, ... for as many keys as the recipe has
                row.update({'key_%.2d' % count: recipe_key})
                print(recipe_key)

        # Append once per recipe, whether or not it had any keys
        rows.append(row)

results = pd.DataFrame(rows)
Output:
print(results.to_string())
   | Title | key_01 | key_02 | key_03 | key_04 | key_05 | key_06 | key_07
0  | Baked Pumpkin Pasta with Pancetta, Gruyere, Kale, and White Beans | Gluten Free | NaN | NaN | NaN | NaN | NaN | NaN
1  | Mom’s Stuffing, Lightened Up | NaN | NaN | NaN | NaN | NaN | NaN | NaN
2  | Roasted Green Beans with Caramelized Onions | Dairy Free | Gluten Free | Vegetarian Meals | Whole 30 Recipes | NaN | NaN | NaN
3  | 7 Day Healthy Meal Plan (November 22-28) | NaN | NaN | NaN | NaN | NaN | NaN | NaN
4  | Makeover Spinach Gratin | Gluten Free | Kid Friendly | Low Carb | Vegetarian Meals | NaN | NaN | NaN
5  | Turkey Pot Pie with Sweet Potato Topping | Gluten Free | Kid Friendly | NaN | NaN | NaN | NaN | NaN
6  | Sautéed Shredded Brussels Sprouts with Pancetta | Dairy Free | Gluten Free | Keto Recipes | Kid Friendly | Low Carb | Paleo | Under 30 Minutes
7  | Baked Brie Phyllo Cups with Craisins and Walnuts | Under 30 Minutes | Vegetarian Meals | NaN | NaN | NaN | NaN | NaN
8  | Chicken Cassoulet with Sausage and Swiss Chard | Dairy Free | Freezer Meals | Gluten Free | NaN | NaN | NaN | NaN
9  | Drunken Style Noodles with Shrimp | Dairy Free | Gluten Free | NaN | NaN | NaN | NaN | NaN
10 | Chicken and Broccoli Noodle Casserole | Kid Friendly | NaN | NaN | NaN | NaN | NaN | NaN
11 | Arugula Salmon Salad with Capers and Shaved Parmesan | Gluten Free | Keto Recipes | Low Carb | Under 30 Minutes | NaN | NaN | NaN
12 | Roasted Acorn Squash with Brown Sugar | Dairy Free | Gluten Free | Vegetarian Meals | NaN | NaN | NaN | NaN
13 | Turkey Cutlets with Parmesan Crust | Kid Friendly | Under 30 Minutes | NaN | NaN | NaN | NaN | NaN
14 | Butternut Squash Ravioli with Sage Butter | Vegetarian Meals | NaN | NaN | NaN | NaN | NaN | NaN
15 | Air Fryer Chicken Milanese with Mediterranean Salad | Air Fryer | Gluten Free | Under 30 Minutes | NaN | NaN | NaN | NaN
16 | Salisbury Steak with Mushroom Gravy | Dairy Free | Freezer Meals | Kid Friendly | Low Carb | Under 30 Minutes | NaN | NaN
17 | Huevos Rancheros | Gluten Free | Under 30 Minutes | Vegetarian Meals | NaN | NaN | NaN | NaN
18 | Easy Black Bean Vegetarian Chili with Spiced Yogurt | Gluten Free | Kid Friendly | Under 30 Minutes | Vegetarian Meals | NaN | NaN | NaN
19 | Apple Cobbler | Vegetarian Meals | NaN | NaN | NaN | NaN | NaN | NaN
20 | Tofu Stir Fry with Vegetables in a Soy Sesame Sauce | Dairy Free | Gluten Free | Under 30 Minutes | Vegetarian Meals | NaN | NaN | NaN
21 | Autumn Apple and Grape Medley (Fruit Salad) | Gluten Free | Kid Friendly | Under 30 Minutes | Vegetarian Meals | NaN | NaN | NaN
22 | Chicken Cutlet Caprese Salad | Gluten Free | Meal Prep Recipes | NaN | NaN | NaN | NaN | NaN
23 | Beef Stew with Pumpkin | Dairy Free | Freezer Meals | Kid Friendly | Pressure Cooker Recipes | Slow Cooker Recipes | NaN | NaN
24 | Pumpkin Cream Cheese Muffins | Freezer Meals | Kid Friendly | Vegetarian Meals | NaN | NaN | NaN | NaN
25 | Pumpkin Pie Overnight Oats | Dairy Free | Gluten Free | Kid Friendly | Vegetarian Meals | NaN | NaN | NaN
26 | Strawberry Cheesecake Dip | Gluten Free | Kid Friendly | Under 30 Minutes | NaN | NaN | NaN | NaN
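If a varying number of key_NN columns turns out to be awkward to work with, one possible variation (not part of the answer above, just a sketch) is to keep all of a recipe's keys in a single cell. The helper `build_compact_row`, the column name 'Keys', and the sample data are hypothetical:

```python
def build_compact_row(title, tags, sep=', '):
    """Return one row with every tag for a recipe joined into a single cell."""
    return {'Title': title, 'Keys': sep.join(tags)}


# Hypothetical sample data standing in for the scraped page
sample = [
    ('Recipe A', ['Dairy Free', 'Gluten Free']),
    ('Recipe B', []),
]

compact_rows = [build_compact_row(title, tags) for title, tags in sample]
for row in compact_rows:
    print(row)
```

In the scraper, the list of tags would be collected from the spans for each recipe before calling the helper. Every row then has the same two columns, and a recipe with no keys gets an empty string rather than NaN.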