python BeautifulSoup web Scraping output no information written

Time:03-22

I'm a beginner learning BeautifulSoup and I want to get information from a website, but the output doesn't contain any information. I don't know where I'm going wrong, gosh, help me.

This is my code:

from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
driver = webdriver.Chrome("C:\\chromedriver\\chromedriver.exe")
products=[] #store name of the product
prices=[] #store price of the product
ratings=[] #store rating of the product
driver.get("http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html")
content = driver.page_source
soup = BeautifulSoup(content,features="html.parser")
for a in soup.findAll('a',href=True, attrs={'container-fluid page'}):
    name=a.find('div', attrs={'class':'col-sm-6 product_main'})
    price=a.find('div', attrs={'class':'col-sm-6 product_main'})
    rating=a.find('div', attrs={'class':'star-rating Three'})
    products.append(name.text)
    prices.append(price.text)
    ratings.append(rating.text)
df = pd.DataFrame({'Product Name': products, 'Price': prices, 'Rating': ratings})
df.to_csv('D:\\products.csv', index=False, encoding='utf-8')

It doesn't report any errors; I just get a CSV file with no information:

Product Name,Price,Rating

CodePudding user response:

Note: there are a few things to improve in your code, and I would recommend keeping it simple. Your strategy should be to select by id, tag, or class; this order goes from static to more dynamic information. In new code, use find_all() instead of the old syntax findAll().

The main issue is that your selection soup.findAll('a', href=True, attrs={'container-fluid page'}) won't find anything, so the result is empty. In fact, since there is only one product on this page, you do not need all these lists.
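To see why the loop body never runs, here is a minimal sketch; the HTML snippet is an illustrative stand-in modeled loosely on the product page markup, not copied from it:

```python
from bs4 import BeautifulSoup

# Illustrative stand-in for the product page markup (assumed structure).
html = """
<div class="container-fluid page">
  <div class="col-sm-6 product_main">
    <h1>A Light in the Attic</h1>
    <p class="price_color">£51.77</p>
    <p class="star-rating Three"></p>
  </div>
  <a href="/index.html">Home</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# The original call passes a set, which BeautifulSoup treats as a class
# filter -- no <a> tag carries these classes, so nothing matches and the
# loop over the (empty) result list never executes.
broken = soup.find_all('a', href=True, attrs={'container-fluid page'})
print(broken)  # []

# Selecting the elements that actually carry the data works:
main = soup.find('div', class_='product_main')
print(main.h1.text)                               # A Light in the Attic
print(main.find('p', class_='price_color').text)  # £51.77
```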

...
soup = BeautifulSoup(content,"html.parser")

df = pd.DataFrame([{
    'Product Name': soup.h1.text, 
    'Price': soup.find('p',{"class": "price_color"}).text, 
    'Rating': soup.find('p',{"class": "star-rating"})['class'][-1].lower()}])
...

Example

There is no need to use Selenium; also take a short look at requests. The process of cooking your soup is almost the same:

import pandas as pd
import requests
from bs4 import BeautifulSoup

URL = 'http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html'
content = requests.get(URL).content

soup = BeautifulSoup(content,"html.parser")

df = pd.DataFrame([{
    'Product Name': soup.h1.text, 
    'Price': soup.find('p',{"class": "price_color"}).text, 
    'Rating': soup.find('p',{"class": "star-rating"})['class'][-1].lower()}])
print(df)
# or to save as csv -> df.to_csv('D:\\products.csv', index=False, encoding='utf-8')

Output

Product Name Price Rating
A Light in the Attic £51.77 three
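If you later want all books from a listing page, the same approach scales with find_all() and your original lists become one list of dicts. A sketch under the assumption that each product sits in an article tag of class product_pod containing an h3 title link, a p.price_color, and a p.star-rating; the inline HTML below is a stand-in for that assumed catalogue markup:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Inline stand-in for a catalogue listing page; the real page's markup is
# assumed to follow this article.product_pod structure.
html = """
<article class="product_pod">
  <p class="star-rating Three"></p>
  <h3><a title="A Light in the Attic">A Light in ...</a></h3>
  <p class="price_color">£51.77</p>
</article>
<article class="product_pod">
  <p class="star-rating One"></p>
  <h3><a title="Tipping the Velvet">Tipping the ...</a></h3>
  <p class="price_color">£53.74</p>
</article>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for pod in soup.find_all('article', class_='product_pod'):
    rows.append({
        # the full title lives in the <a> tag's title attribute
        'Product Name': pod.h3.a['title'],
        'Price': pod.find('p', class_='price_color').text,
        # last CSS class on the rating tag encodes the value
        'Rating': pod.find('p', class_='star-rating')['class'][-1].lower(),
    })

df = pd.DataFrame(rows)
print(df)
```

Building one list of dicts and passing it to pd.DataFrame once keeps the three columns aligned, which the three separate lists in the original code could not guarantee.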