I'm a beginner learning BeautifulSoup and I want to get information from a website, but the output doesn't contain any information and I don't know where I'm going wrong. Please help me.
This is my code:
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
driver = webdriver.Chrome("C:\\chromedriver\\chromedriver.exe")
products=[] #store name of the product
prices=[] #store price of the product
ratings=[] #store rating of the product
driver.get("http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html")
content = driver.page_source
soup = BeautifulSoup(content,features="html.parser")
for a in soup.findAll('a',href=True, attrs={'container-fluid page'}):
    name=a.find('div', attrs={'class':'col-sm-6 product_main'})
    price=a.find('div', attrs={'class':'col-sm-6 product_main'})
    rating=a.find('div', attrs={'class':'star-rating Three'})
    products.append(name.text)
    prices.append(price.text)
    ratings.append(rating.text)
df = pd.DataFrame({'Product Name': products, 'Price': prices, 'Rating': ratings})
df.to_csv('D:\\products.csv', index=False, encoding='utf-8')
It doesn't report any errors; I just get a CSV file with no information:
Product Name,Price,Rating
CodePudding user response:
Note: There are a few things going on in your code, and I would recommend keeping it simple. Your strategy should be to select by id, tag, or class - this order goes from static to more dynamic information. In new code, use find_all() instead of the old syntax findAll().
The main issue is that your selection soup.findAll('a', href=True, attrs={'container-fluid page'}) won't find anything, so the result is empty. And since there is only one product on this page, you do not need all these lists at all.
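A quick way to see what is going on is to print what a selection returns before looping over it. The attrs set from your question matches nothing, while selecting by a class that actually exists on the page finds the single product block (a small check, assuming soup is built from the page source exactly as in your script):
# The selection from the question comes back empty, so the loop body never
# runs and nothing is ever appended to the three lists:
print(soup.findAll('a', href=True, attrs={'container-fluid page'}))   # []
# Selecting by a class that really exists on this page does match, and the
# detail page contains exactly one such product block:
print(len(soup.find_all('div', class_='product_main')))               # 1
Since there is only one product, the whole loop and the three lists can be replaced by a few direct selections: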
...
soup = BeautifulSoup(content,"html.parser")
df = pd.DataFrame([{
'Product Name': soup.h1.text,
'Price': soup.find('p',{"class": "price_color"}).text,
'Rating': soup.find('p',{"class": "star-rating"})['class'][-1].lower()}])
...
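The Rating line works because BeautifulSoup exposes the multi-valued class attribute as a list of individual class names, and on this page the rating is encoded as the second class of the <p> tag. A small illustration, assuming the markup is still <p class="star-rating Three">:
tag = soup.find('p', {"class": "star-rating"})
print(tag['class'])              # ['star-rating', 'Three'] - class comes back as a list
print(tag['class'][-1].lower())  # 'three'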
Example
There is no need to use selenium; also take a short look at requests - the process from there to cooking your soup is almost the same:
import requests
import pandas as pd
from bs4 import BeautifulSoup
URL = 'http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html'
content = requests.get(URL).content
soup = BeautifulSoup(content,"html.parser")
df = pd.DataFrame([{
'Product Name': soup.h1.text,
'Price': soup.find('p',{"class": "price_color"}).text,
'Rating': soup.find('p',{"class": "star-rating"})['class'][-1].lower()}])
df
#or to save as csv -> df.to_csv('D:\\products.csv', index=False, encoding='utf-8')
Output
| Product Name | Price | Rating |
| --- | --- | --- |
| A Light in the Attic | £51.77 | three |
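If you later want several products at once, the same pattern scales with find_all() over a listing page instead of lists filled inside a loop. The sketch below is only an outline and assumes the usual books.toscrape.com markup, where each book on a catalogue page sits in an <article class="product_pod"> with the title stored in the title attribute of the link inside <h3>:
import requests
import pandas as pd
from bs4 import BeautifulSoup

URL = 'http://books.toscrape.com/catalogue/page-1.html'   # first listing page (assumed URL pattern)
soup = BeautifulSoup(requests.get(URL).content, "html.parser")

rows = []
for book in soup.find_all('article', class_='product_pod'):
    rows.append({
        'Product Name': book.h3.a['title'],
        'Price': book.find('p', class_='price_color').text,
        'Rating': book.find('p', class_='star-rating')['class'][-1].lower(),
    })

df = pd.DataFrame(rows)
print(df.head())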