bs4: skipping AttributeError in for loop

Time:05-26

I'm web scraping for the first time and ran into a problem. I have to get the product price of certain products (the URL is in the code). However, when there is a discount on a product, it gives an error. This is the code that I have right now (I deleted a couple of lines that were in between, but it works like this):

import requests
from bs4 import BeautifulSoup
#import csv
#import pandas as pd

links = []


url='https://www.ah.nl/producten/pasta-rijst-en-wereldkeuken?page={page}'
for page in range(1,2):
    req = requests.get(url.format(page=page))
    soup = BeautifulSoup(req.content, 'html.parser')

    for link in soup.select('div[] a'):
        abs_url = 'https://www.ah.nl' + link.get('href')
        #print(abs_url)
   
        #GETTING THE PRICE
        p_price = []
        req4 = requests.get(abs_url)
        soup = BeautifulSoup(req4.content, 'html.parser')
        p_price = (soup.find(class_='price-amount_root__37xv2 product-card-hero-price_now__PlF9u')).text

And the output is this:

runcell(0, '/Users/eva/Desktop/MDDD/TestingProduct.py')
0.55
0.35
2.99
0.65
Traceback (most recent call last):

  File "/Applications/Spyder.app/Contents/Resources/lib/python3.9/spyder_kernels/py3compat.py", line 356, in compat_exec
    exec(code, globals, locals)

  File "/Users/eva/Desktop/MDDD/TestingProduct.py", line 22, in <module>
    p_price = (soup.find(class_='price-amount_root__37xv2 product-card-hero-price_now__PlF9u')).text

AttributeError: 'NoneType' object has no attribute 'text'

So it gives me the first four prices and then the error. I tried to work around it by adding this:

 if p_price != AttributeError:continue
  

But that didn't work. I don't mind if the products that have a discount aren't in the dataset. Any tips on how to keep the for loop going, i.e. skipping the prices that give the error?

Thank you!

CodePudding user response:

You can get the price from the results pages without visiting each individual link. Additionally, by judicious use of the CSS :not() pseudo-class selector, you can exclude the old prices that appear where there is a discount, which removes the error:

import requests
from bs4 import BeautifulSoup

url='https://www.ah.nl/producten/pasta-rijst-en-wereldkeuken?page={page}'

for page in range(1, 3):
    print(f"Page: {page}")
    print()
    req = requests.get(url.format(page=page))
    soup = BeautifulSoup(req.content, 'html.parser')

    for product in soup.select('[data-testhook="product-card"]'):
        print(product.select_one('[data-testhook="product-title"]').get_text(strip=True))
        print(product.select_one('[data-testhook="price-amount"]:not([class*=price_was])').text)
    print()
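
To see what the :not() selector does without hitting the live site, here is a minimal sketch on a small inline HTML sample. The markup, the product names and the class names price_now / price_was are made up for illustration; only the data-testhook attributes and the selector come from the answer above, and the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a made-up stand-in for a results page, not the real site HTML.
SAMPLE_HTML = """
<article data-testhook="product-card">
  <span data-testhook="product-title">Spaghetti</span>
  <div data-testhook="price-amount" class="price_now">0.55</div>
</article>
<article data-testhook="product-card">
  <span data-testhook="product-title">Penne</span>
  <div data-testhook="price-amount" class="price_was">1.29</div>
  <div data-testhook="price-amount" class="price_now">0.99</div>
</article>
"""


def current_prices(html):
    """Map each product title to its current price, ignoring 'was' prices."""
    soup = BeautifulSoup(html, "html.parser")
    prices = {}
    for card in soup.select('[data-testhook="product-card"]'):
        title = card.select_one('[data-testhook="product-title"]').get_text(strip=True)
        # :not([class*="price_was"]) skips any price element whose class
        # attribute contains the substring "price_was" (the old price).
        price = card.select_one('[data-testhook="price-amount"]:not([class*="price_was"])')
        # Guard against cards without any current price element.
        prices[title] = price.get_text(strip=True) if price else None
    return prices


prices = current_prices(SAMPLE_HTML)
print(prices)
```

For the discounted product, the first price-amount element is the old price, so a plain select_one('[data-testhook="price-amount"]') would return 1.29; with :not() it returns the current price instead.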

CodePudding user response:

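You are getting the AttributeError because not every product page contains an element with that price class. For those pages find() returns None, and None has no .text attribute. To get rid of the error, you can use a conditional expression (x.text if x else None):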

import requests
from bs4 import BeautifulSoup
#import csv
#import pandas as pd

links = []


url='https://www.ah.nl/producten/pasta-rijst-en-wereldkeuken?page={page}'
for page in range(1,2):
    req = requests.get(url.format(page=page))
    soup = BeautifulSoup(req.content, 'html.parser')

    for link in soup.select('div[] a'):
        abs_url = 'https://www.ah.nl' + link.get('href')
        #print(abs_url)
   
        #GETTING THE PRICE
        p_price = []
        req4 = requests.get(abs_url)
        soup = BeautifulSoup(req4.content, 'html.parser')
        p_price = (soup.find(class_='price-amount_root__37xv2 product-card-hero-price_now__PlF9u'))
        p_price = p_price.text if p_price else None
        print(p_price)

Output:

0.55
0.35
2.99
0.65
None
0.49
0.65
0.59
0.92
0.99
0.79
0.55
0.65
2.19
3.00
2.00
0.89
0.89
0.66
0.89
0.89
1.25
1.99
1.19
0.99
0.79
1.79
1.99
1.79
6.29
1.19
1.39
2.19
0.65
1.95
0.79
None
None
None
None
2.00
2.29
0.49
1.29
1.55
1.59
1.39
2.99
2.00
0.99
1.39
1.65
1.19
0.99
0.99
2.29
1.99
2.69
0.49
0.99
0.79
2.19
2.00
3.69
0.89
2.29
0.45
1.85
2.00
2.00
5.99
1.09
2.79
1.19
3.29
0.95
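
If the goal is to leave those products out of the dataset entirely rather than store None, one option is to catch the exception and continue. The attempt in the question, if p_price != AttributeError: continue, cannot work because the exception is raised on the .text line itself, before anything is assigned to p_price. Below is a minimal sketch of the try/except variant; the inline HTML snippets and the class names price_now / price_bonus are hypothetical stand-ins for the real product pages.

```python
from bs4 import BeautifulSoup

# Hypothetical product pages; the class names are made up for illustration.
PAGES = [
    '<div class="price_now">0.55</div>',
    '<div class="price_bonus">0.99</div>',  # no "price_now" element: find() returns None
    '<div class="price_now">2.99</div>',
]

prices = []
for html in PAGES:
    soup = BeautifulSoup(html, "html.parser")
    try:
        # find() returns None when nothing matches, so .text raises AttributeError.
        price = soup.find(class_="price_now").text
    except AttributeError:
        # Skip this product and move on to the next one.
        continue
    prices.append(price)

print(prices)
```

Keeping the try block limited to the single lookup line means an AttributeError from unrelated code is not silently swallowed.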