I am trying to write the data scraped by this code to an Excel sheet, but only the latest record ends up in the file. As far as I understand, each pass through the loop overwrites the previous data.
How should I go about fixing this? Any suggestions?
from bs4 import BeautifulSoup
import requests
import pandas as pd

source = requests.get('url').text
soup = BeautifulSoup(source, 'lxml')
jobs = soup.find_all('div', class_='prd')
for job in jobs:
    product_name = job.find('a', class_='prd-link')['title']
    product_id = job.find('button', class_='prd-favorite btn-add-favorites')['data-product-id']
    product_url = job.find('a', class_='prd-link')['href']
    product_price = job.find('span', class_='prc prc-last').text
    df = pd.DataFrame({
        'Col A': [product_name],
        'Col B': [product_id],
        'Col C': [product_url],
        'Col D': [product_price],
    })
    df.to_excel('test.xlsx')
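Your loop builds a brand-new one-row DataFrame on every iteration, so each one replaces the previous one and only the last product survives. Instead, store your data in a list of dicts and create the DataFrame from it once, after the loop has finished: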
...
data = []
for job in jobs:
    # Collect one dict per product instead of rebuilding a DataFrame each time
    data.append({
        'product_name': job.find('a', class_='prd-link')['title'],
        'product_id': job.find('button', class_='prd-favorite btn-add-favorites')['data-product-id'],
        'product_url': job.find('a', class_='prd-link')['href'],
        'product_price': job.find('span', class_='prc prc-last').text,
    })

# Build the DataFrame and write the file once, after the loop
pd.DataFrame(data).to_excel('test.xlsx')
...
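If you would rather keep the one-row DataFrame from the question, the same idea works: append each small frame to a list and combine them once with pd.concat after the loop. (Calling df.append inside the loop is not an option on current pandas, since DataFrame.append was removed in pandas 2.0.) The sketch below is a minimal, network-free illustration under stated assumptions: sample_jobs is a hypothetical stand-in for the parsed jobs with made-up values, and it renders CSV text instead of writing test.xlsx so it runs without an Excel engine such as openpyxl installed.

```python
import pandas as pd

# Hypothetical stand-in for the scraped products (made-up values).
# In the real script these fields come from job.find(...) calls.
sample_jobs = [
    {"title": "Product A", "id": "101", "href": "/p/101", "price": "10.00"},
    {"title": "Product B", "id": "102", "href": "/p/102", "price": "12.50"},
    {"title": "Product C", "id": "103", "href": "/p/103", "price": "7.99"},
]

frames = []  # one single-row DataFrame per product
for job in sample_jobs:
    row = pd.DataFrame({
        'Col A': [job["title"]],
        'Col B': [job["id"]],
        'Col C': [job["href"]],
        'Col D': [job["price"]],
    })
    frames.append(row)  # keep every row instead of rebinding `df`

# Combine once, after the loop; ignore_index gives a clean 0..n-1 index
df = pd.concat(frames, ignore_index=True)

# Export once, after the loop. With no path, to_csv returns a string;
# in the real script use df.to_excel('test.xlsx', index=False) instead.
csv_text = df.to_csv(index=False)
print(csv_text)
```

The list-of-dicts version from the answer is usually simpler and faster, because pandas builds the frame in a single step rather than concatenating many tiny frames; the concat variant mainly helps if each iteration already produces a DataFrame.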