I'm web scraping for the first time and ran into a problem. I have to get the price of certain products (the URL is in the code); however, when there is a discount on a product, it gives an error. This is the code I have right now (I deleted a couple of lines that were in between, but it works like this):
import requests
from bs4 import BeautifulSoup
#import csv
#import pandas as pd
links = []
url='https://www.ah.nl/producten/pasta-rijst-en-wereldkeuken?page={page}'
for page in range(1,2):
    req = requests.get(url.format(page=page))
    soup = BeautifulSoup(req.content, 'html.parser')
    for link in soup.select('div[] a'):
        abs_url = 'https://www.ah.nl' + link.get('href')
        #print(abs_url)
        #GETTING THE PRICE
        p_price = []
        req4 = requests.get(abs_url)
        soup = BeautifulSoup(req4.content, 'html.parser')
        p_price = (soup.find(class_='price-amount_root__37xv2 product-card-hero-price_now__PlF9u')).text
And the output is this:
runcell(0, '/Users/eva/Desktop/MDDD/TestingProduct.py')
0.55
0.35
2.99
0.65
Traceback (most recent call last):
  File "/Applications/Spyder.app/Contents/Resources/lib/python3.9/spyder_kernels/py3compat.py", line 356, in compat_exec
    exec(code, globals, locals)
  File "/Users/eva/Desktop/MDDD/TestingProduct.py", line 22, in <module>
    p_price = (soup.find(class_='price-amount_root__37xv2 product-card-hero-price_now__PlF9u')).text
AttributeError: 'NoneType' object has no attribute 'text'
So it gives me the first four prices and then the error. I tried to work around it by adding this:
if p_price != AttributeError:continue
But that didn't work. I don't mind if the products that have a discount aren't in the dataset. Any tips on how to keep the for loop going, i.e. skipping the prices that give the error?
Thank you!
CodePudding user response:
You can get the prices from the results pages without visiting each individual link. Additionally, with judicious use of the CSS :not() pseudo-class selector you can exclude the old prices that appear where there is a discount, which removes the error:
import requests
from bs4 import BeautifulSoup
url='https://www.ah.nl/producten/pasta-rijst-en-wereldkeuken?page={page}'
for page in range(1, 3):
    print(f"Page: {page}")
    print()
    req = requests.get(url.format(page=page))
    soup = BeautifulSoup(req.content, 'html.parser')
    for product in soup.select('[data-testhook="product-card"]'):
        print(product.select_one('[data-testhook="product-title"]').get_text(strip=True))
        print(product.select_one('[data-testhook="price-amount"]:not([class*=price_was])').text)
        print()
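To see what the :not() filter does without hitting the live site, here is a minimal sketch that runs the same selectors against a small inline HTML snippet. The markup, the product names and the class names price_now / price_was are made up for illustration; the real page's structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on the selectors used above.
# The second product is discounted, so it carries an old and a new price.
SAMPLE_HTML = """
<article data-testhook="product-card">
  <span data-testhook="product-title">Spaghetti</span>
  <div data-testhook="price-amount" class="price_now">0.55</div>
</article>
<article data-testhook="product-card">
  <span data-testhook="product-title">Penne</span>
  <div data-testhook="price-amount" class="price_was">1.29</div>
  <div data-testhook="price-amount" class="price_now">0.99</div>
</article>
"""

def current_prices(html):
    """Return (title, price) pairs, ignoring any 'was' price elements."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for product in soup.select('[data-testhook="product-card"]'):
        title = product.select_one('[data-testhook="product-title"]')
        # :not([class*=price_was]) drops elements whose class contains "price_was"
        price = product.select_one(
            '[data-testhook="price-amount"]:not([class*=price_was])'
        )
        if title is None or price is None:
            continue  # skip cards that lack a title or a current price
        results.append((title.get_text(strip=True), price.get_text(strip=True)))
    return results

prices = current_prices(SAMPLE_HTML)
print(prices)  # [('Spaghetti', '0.55'), ('Penne', '0.99')]
```

For the discounted product only the current price is returned, because the element whose class contains price_was never matches the selector.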
CodePudding user response:
You are getting the NoneType error because not all items contain the price element you are looking for. To get rid of this error, you can use an if ... else None expression:
import requests
from bs4 import BeautifulSoup
#import csv
#import pandas as pd
links = []
url='https://www.ah.nl/producten/pasta-rijst-en-wereldkeuken?page={page}'
for page in range(1,2):
    req = requests.get(url.format(page=page))
    soup = BeautifulSoup(req.content, 'html.parser')
    for link in soup.select('div[] a'):
        abs_url = 'https://www.ah.nl' + link.get('href')
        #print(abs_url)
        #GETTING THE PRICE
        p_price = []
        req4 = requests.get(abs_url)
        soup = BeautifulSoup(req4.content, 'html.parser')
        p_price = (soup.find(class_='price-amount_root__37xv2 product-card-hero-price_now__PlF9u'))
        p_price = p_price.text if p_price else None
        print(p_price)
Output:
0.55
0.35
2.99
0.65
None
0.49
0.65
0.59
0.92
0.99
0.79
0.55
0.65
2.19
3.00
2.00
0.89
0.89
0.66
0.89
0.89
1.25
1.99
1.19
0.99
0.79
1.79
1.99
1.79
6.29
1.19
1.39
2.19
0.65
1.95
0.79
None
None
None
None
2.00
2.29
0.49
1.29
1.55
1.59
1.39
2.99
2.00
0.99
1.39
1.65
1.19
0.99
0.99
2.29
1.99
2.69
0.49
0.99
0.79
2.19
2.00
3.69
0.89
2.29
0.45
1.85
2.00
2.00
5.99
1.09
2.79
1.19
3.29
0.95
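If the products without a matching price should be left out of the dataset entirely, as the question asks, the same None check can be paired with continue instead of storing None. The sketch below is only an illustration: the PAGES dict stands in for the requests.get() responses, and the paths and the class name price_now are assumptions, not the site's real values.

```python
from bs4 import BeautifulSoup

# Hypothetical product pages standing in for the downloaded HTML.
PAGES = {
    "/p/1": '<div class="price_now">0.55</div>',
    "/p/2": '<div class="price_was">1.29</div>',  # no current-price element
    "/p/3": '<div class="price_now">2.99</div>',
}

def collect_prices(pages):
    """Collect price texts, skipping pages where the element is missing."""
    prices = []
    for path, html in pages.items():
        soup = BeautifulSoup(html, "html.parser")
        tag = soup.find(class_="price_now")
        if tag is None:
            continue  # find() found nothing; move on to the next product
        prices.append(tag.text)
    return prices

prices = collect_prices(PAGES)
print(prices)  # ['0.55', '2.99']
```

The check has to happen before .text is accessed. In the question, the AttributeError is raised on the .text line itself, so a comparison placed after it is never reached.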