I wrote a parser, shown here:
from bs4 import BeautifulSoup
import csv
import requests

source = requests.get('https://www.nike.com/fr/w/hommes-chaussures-nik1zy7ok').text
soup = BeautifulSoup(source, 'lxml')
csv_file = open('nikeshoes.csv', 'w', newline='')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['name', 'price', 'image_scr'])
for pair in soup.find_all('div', class_='product-card__body'):
    name = pair.a.text
    print(name)
    price = soup.find('div', class_='product-price css-11s12ax is--current-price').text
    print(price)
    try:
        image_scr = pair.select_one('img.css-1fxh5tw.product-card__hero-image')['src']
    except Exception as e:
        image_scr = None
    print(image_scr)
    print()
    csv_writer.writerow([name, price, image_scr])
csv_file.close()
The names and image links work fine and I get all of them, but the price
variable seems to be stuck on the first price on the web page.
This is odd, because when I inspect the website the price changes in the HTML path I use. Does anyone know why the price is stuck on the first item?
CodePudding user response:
You should try this:
for x in soup.find_all('div', class_='product-price css-11s12ax is--current-price'):
    print(x.text)
CodePudding user response:
What happens?
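"At this point and until the end of the list the price cannot be found, although it is the same path."
Not really. There is a sale price and a regular price, and the structure of the elements and classes is slightly different, so your selection won't work. In addition, soup.find(...) searches the whole document on every iteration, so it always returns the first price on the page; search inside pair instead.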
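To illustrate why the search scope matters, here is a minimal sketch on a hand-written HTML snippet. The markup, product names, and prices are invented for illustration and are not copied from the Nike page; it assumes only that bs4 is installed and uses the built-in html.parser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two product cards, each with its own price.
html = (
    '<div class="product-card__body">'
    '<a>Shoe A</a><div class="product-price">100 EUR</div>'
    '</div>'
    '<div class="product-card__body">'
    '<a>Shoe B</a><div class="product-price">120 EUR</div>'
    '</div>'
)
soup = BeautifulSoup(html, 'html.parser')
cards = soup.find_all('div', class_='product-card__body')

# Searching from soup restarts at the top of the document each time,
# so every iteration returns the first card's price.
from_soup = [soup.find('div', class_='product-price').text for _ in cards]

# Searching from the card only looks at that card's descendants.
from_card = [card.find('div', class_='product-price').text for card in cards]

print(from_soup)  # ['100 EUR', '100 EUR']
print(from_card)  # ['100 EUR', '120 EUR']
```

The only difference between the two lists is the object that find is called on.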
How to fix?
Select your elements more specifically, for example by attribute:
regular_price = pair.select_one('div[data-test="product-price"]').text.replace(u'\xa0', u' ') if pair.select('div[data-test="product-price"]') else None
You should also check whether the element is available.
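One possible way to combine the attribute selector with the availability check is a small helper. This is only a sketch: text_or_none is a hypothetical name, and the HTML snippet and price are invented test data, not taken from the Nike page.

```python
from bs4 import BeautifulSoup


def text_or_none(parent, selector):
    """Return the cleaned text of the first match under parent, or None."""
    element = parent.select_one(selector)
    if element is None:
        return None
    # Replace non-breaking spaces (\xa0) with regular spaces.
    return element.get_text(strip=True).replace('\xa0', ' ')


# Hypothetical card with a regular price but no reduced price.
html = (
    '<div class="product-card__body">'
    '<div data-test="product-price">119,99\xa0EUR</div>'
    '</div>'
)
card = BeautifulSoup(html, 'html.parser').select_one('div.product-card__body')

regular_price = text_or_none(card, 'div[data-test="product-price"]')
sale_price = text_or_none(card, 'div[data-test="product-price-reduced"]')
print(regular_price, sale_price)
```

Calling select_one once and testing the result for None avoids running the same selector twice per field.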
Example
from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.nike.com/fr/w/hommes-chaussures-nik1zy7ok').text
soup = BeautifulSoup(source, 'lxml')
data = []
for pair in soup.find_all('div', class_='product-card__body'):
    # Search inside pair, not soup, so each card yields its own values.
    name = pair.a.text.replace(u'\xa0', u' ')
    regular_price = pair.select_one('div[data-test="product-price"]').text.replace(u'\xa0', u' ') if pair.select('div[data-test="product-price"]') else None
    sale_price = pair.select_one('div[data-test="product-price-reduced"]').text.replace(u'\xa0', u' ') if pair.select('div[data-test="product-price-reduced"]') else None
    try:
        image_src = pair.select_one('img.css-1fxh5tw.product-card__hero-image')['src']
    except Exception as e:
        # No matching image in this card.
        image_src = None
    data.append({
        'name': name,
        'regular_price': regular_price,
        'sale_price': sale_price,
        'image_src': image_src
    })
print(data)
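Since the question writes its results to nikeshoes.csv, the collected list of dicts could be saved with csv.DictWriter. A minimal sketch, using invented rows in the same shape as the dicts built in the example (the names, prices, and URL are placeholders):

```python
import csv

# Hypothetical rows in the same shape as the dicts appended to data.
data = [
    {'name': 'Shoe A', 'regular_price': '100 EUR', 'sale_price': None,
     'image_src': 'https://example.com/a.png'},
    {'name': 'Shoe B', 'regular_price': '120 EUR', 'sale_price': '90 EUR',
     'image_src': None},
]

fieldnames = ['name', 'regular_price', 'sale_price', 'image_src']

# The with block closes the file even if writing fails part-way.
with open('nikeshoes.csv', 'w', newline='', encoding='utf-8') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(data)  # None values are written as empty cells
```

DictWriter maps each dict key to its column, so the column order is fixed by fieldnames rather than by the order in which the dicts were built.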