I have two lists, SoapName and SoapPrices, each containing 16 elements:
SoapName
[{'title': 'Beer and Honey Shampoo bar'},
{'title': 'Cedarwood Shaving Soap'},
{'title': 'Chamomile and Lavender Shampoo and Body Bar'}...]
SoapPrices
[{'price': 6.0},
{'price': 5.0},
{'price': 5.0}...]
I would like to combine them into the following format so that I can load the result into a pandas DataFrame:
{'title': 'Beer and Honey Shampoo bar', 'price': 6.0}
{'title': 'Cedarwood Shaving Soap', 'price': 5.0}
{'title': 'Chamomile and Lavender Shampoo and Body Bar', 'price': 5.0}....
So far I have gone back to first principles, using BeautifulSoup to scrape the data, and tried to build the dicts in a loop:
all_soap = []
soapinformation = soup.find_all("h1", class_="product_title entry-title elementor-heading-title elementor-size-default")
for soap in soapinformation:
    soapTitle = soap.find("a").text
    soapInfo = {"title": soapTitle, "price": price3[0-17]}
    print(soapInfo)
The output is:
{'title': 'Beer and Honey Shampoo bar', 'price': '6.00'}
{'title': 'Cedarwood Shaving Soap', 'price': '6.00'}
{'title': 'Chamomile and Lavender Shampoo and Body Bar', 'price': '6.00'}....
I'm not sure how to iterate over the prices together with the titles.
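A likely reason every dict gets the same price: inside the subscript, 0-17 is ordinary integer subtraction, so price3[0-17] is always price3[-17], one single element. The sketch below uses hypothetical stand-in lists (titles, and a price3 assumed to hold 17 price strings, since index -17 would fail on a shorter list) to show that, then pairs each title with its own price using zip and fills the all_soap list that the loop above declares but never uses.

```python
# Hypothetical stand-ins for the scraped data (assumed, not taken from the site)
titles = ['Beer and Honey Shampoo bar',
          'Cedarwood Shaving Soap',
          'Chamomile and Lavender Shampoo and Body Bar']
price3 = ['6.00', '5.00', '5.00'] + ['5.00'] * 14  # assumed: 17 price strings

# 0-17 is plain integer subtraction, so the subscript is always -17
index_used = 0 - 17
# With exactly 17 items, index -17 is the first element, hence the repeated '6.00'
same_every_time = price3[index_used] == price3[0]

# Pair each title with its own price by walking both lists in step
all_soap = []
for title, price in zip(titles, price3):
    soap_info = {'title': title, 'price': price}
    all_soap.append(soap_info)
    print(soap_info)
```

zip stops at the shorter list, so the extra prices in this stand-in are simply ignored.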
Answer:
Use zip to pair up the names and prices, then create a new dict for each name/price pair:
[{**name, **price} for name, price in zip(names, prices)]
#[{'title': 'Beer and Honey Shampoo bar', 'price': 6.0},
# {'title': 'Cedarwood Shaving Soap', 'price': 5.0},
# {'title': 'Chamomile and Lavender Shampoo and Body Bar', 'price': 5.0}]
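Since the goal is a pandas DataFrame, the merged dicts can be passed straight to pd.DataFrame, which accepts a list of dicts and uses the keys as column names. A minimal sketch using the three sample items from the question:

```python
import pandas as pd

# The three sample items shown in the question
names = [{'title': 'Beer and Honey Shampoo bar'},
         {'title': 'Cedarwood Shaving Soap'},
         {'title': 'Chamomile and Lavender Shampoo and Body Bar'}]
prices = [{'price': 6.0}, {'price': 5.0}, {'price': 5.0}]

# Merge each name/price pair into one dict, then build the DataFrame from the rows
rows = [{**name, **price} for name, price in zip(names, prices)]
df = pd.DataFrame(rows)  # columns come from the dict keys: title, price
print(df)
```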
Answer:
In addition to @Psidom's answer: try to avoid collecting your data in several separate lists, because it is hard to keep them the same length and in the same order when some elements are missing on the page.
Instead, change how you collect the data so that you get all the information for each product in one go:
data = []
for e in soup.select('article'):
    data.append({
        'title': e.h1.text,
        'price': e.bdi.span.next_sibling,
        'url': e.a.get('href')
    })
Example
import pandas as pd
from bs4 import BeautifulSoup
import requests

url = 'https://ratherlather.co.uk/shop/'
req = requests.get(url)
soup = BeautifulSoup(req.content, 'html.parser')

# one dict per product <article>, so title, price and url always stay together
data = []
for e in soup.select('article'):
    data.append({
        'title': e.h1.text,
        'price': e.bdi.span.next_sibling,  # text that follows the first <span> inside <bdi>
        'url': e.a.get('href')
    })

pd.DataFrame(data)
Output
...
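If a product has no price element, e.bdi is None and e.bdi.span raises AttributeError. The sketch below is a hedged variant of the loop above, run against a small hypothetical HTML snippet that is only assumed to resemble the shop's product markup (the real page may differ). It keeps one row per product and records None for a missing price; converting the price to float and stripping the title are illustrative choices, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (assumed shape); the second product has no price element
html = """
<article>
  <h1><a href="https://example.com/beer">Beer and Honey Shampoo bar</a></h1>
  <span class="price"><bdi><span>£</span>6.00</bdi></span>
</article>
<article>
  <h1><a href="https://example.com/cedarwood">Cedarwood Shaving Soap</a></h1>
</article>
"""

soup = BeautifulSoup(html, 'html.parser')

data = []
for e in soup.select('article'):
    bdi = e.select_one('bdi')  # None when this product has no price
    data.append({
        'title': e.h1.get_text(strip=True),
        # the amount is the text node right after the currency-symbol <span>
        'price': float(bdi.span.next_sibling) if bdi is not None else None,
        'url': e.a.get('href'),
    })

print(data)
```

Because every field is read from the same article, a missing price shows up as None in that product's row instead of shifting the prices of the products that follow.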