I am trying to scrape prices from a website, but there is a problem: some products have a discount and some do not, and the discounted and regular prices are located in different span elements. My code only picks up the first price on the page (the site is https://ua.iherb.com/new-products?p=1).
This is the relevant part of my code:
def get_page_data(html):
    soup = BeautifulSoup(html, 'lxml')
    lis = soup.find_all('div', class_="product-cell-container col-xs-12 col-sm-12 col-md-8 col-lg-6")
    for li in lis:
        try:
            name = li.find('div', class_='absolute-link-wrapper').find('a').get('title')
        except:
            name = ''
        try:
            url = li.find('div', class_='absolute-link-wrapper').find('a').get('href')
        except:
            url = ''
        prices = li.find_all('div', attrs={"class":"product-price-top"})
        main_list=[]
        for price in prices:
            try:
                discount_price = price.find("span", class_='price discount-green').text.strip()
                main_list.append(discount_price)
            except AttributeError:
                original_price = price.find("span", class_='price ').text.strip()
                main_list.append(original_price)
        print(main_list)
Answer:
Instead of parsing the HTML, you can query the AJAX endpoint that the page's JavaScript calls to load the product data. Here is one way to do it:
import requests
import pandas as pd
from tqdm import tqdm

# Show every column at full width when printing the DataFrame
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
# pd.set_option('display.max_rows', None)

s = requests.Session()
big_df = pd.DataFrame()

# Request the products in batches of 50 (index = 1, 51, 101, ...)
for x in tqdm(range(1, 400, 50)):
    df = pd.json_normalize(s.get(f'https://ua.iherb.com/catalog/iherblive?isAjax=true&index={x}&nop=50&_=1665000595915').json())
    big_df = pd.concat([big_df, df], axis=0, ignore_index=True)

# Drop a nested column that is not needed, then save and print the result
big_df.drop('Product.RatingStarsMap', axis=1, inplace=True)
big_df.to_csv('ukr_herbs.csv')
print(big_df)
Result in terminal:
100% 8/8 [00:04<00:00, 1.69it/s]
Country CountryCode Index Product.Id Product.Name Product.ProductUrl Product.ProductImage Product.ProductImageRetina Product.PartNumber Product.ListPrice Product.DiscountPrice Product.OutOfStock Product.StockStatus Product.HidePrice Product.Discontinued Product.NotAvailable Product.ShowDiscount Product.DiscountType Product.IsInCartDiscount Product.Rating Product.RatingCount Product.RatingText Product.RatingURL Product.ReviewURL Product.IsShippingSaver Product.IsFeaturedBrand Product.IsAutoship Product.ShowGroupMessage Product.IsProductCompared Product.IsSeasonallyUnavailable Product.DiscountPercentage Product.PrimaryImageIndex Product.CartProductsInfo.lineItems Product.HasRating Product.HasTag Product.SalesDiscountPercentage Product.IsAutoApplyPromo Product.IsNew Product.IsTrial Product.SpecialDealInfo.PercentageClaimed Product.SpecialDealInfo.IsCompletelyClaimed Product.SpecialDealInfo.CountPerCustomer
0 Mexico mx 1 18492 Thorne Research, Bio-Gest, 180 Capsules https://www.iherb.com/pr/thorne-research-bio-gest-180-capsules/18492 https://cloudinary.images-iherb.com/image/upload/f_auto,q_auto:eco/images/thr/thr40502/k/13.jpg https://cloudinary.images-iherb.com/image/upload/f_auto,q_auto:eco/images/thr/thr40502/r/13.jpg THR-40502 $44.00 $44.00 False 0 False False False False 7 False 4.8 1284 4.8 of 5 based on 1284 https://www.iherb.com/pr/thorne-research-bio-gest-180-capsules/18492#product-detail-reviews https://www.iherb.com/r/thorne-research-bio-gest-180-capsules/18492 False False False False False False 0 13 [{'productId': 18492, 'productName': 'Thorne Research, Bio-Gest, 180 Capsules', 'iURLSmall': 'https://cloudinary.images-iherb.com/image/upload/f_auto,q_auto:eco/images/thr/thr40502/s/13.jpg', 'iURLMedium': 'https://cloudinary.images-iherb.com/image/upload/f_auto,q_auto:eco/images/thr/thr40502/m/13.jpg', 'listPrice': '$44.00', ...
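To get back to the original question (discounted versus regular price), both price fields can be read directly from the JSON records, so no span matching is needed. Below is a minimal sketch, not a definitive implementation. It assumes that each record nests its fields under a "Product" key (inferred from the Product.* column names above), that prices arrive as strings such as '$44.00' with '.' as the decimal separator, and that a product counts as discounted when its discount price is lower than its list price. The helper names parse_price and extract_prices are hypothetical, and the sample record only copies values from the row shown above.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Turn a price string such as '$44.00' into a Decimal, or None if no number is found."""
    # Assumption: '.' is the decimal separator and ',' is a thousands separator.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))


def extract_prices(records):
    """Keep only name, URL and both prices from a list of decoded JSON records."""
    rows = []
    for record in records:
        # Assumption: fields are nested under "Product", as the column names suggest.
        product = record.get("Product", {})
        list_price = parse_price(product.get("ListPrice"))
        discount_price = parse_price(product.get("DiscountPrice"))
        rows.append({
            "name": product.get("Name", ""),
            "url": product.get("ProductUrl", ""),
            "list_price": list_price,
            "discount_price": discount_price,
            # Treated as discounted only when both prices exist and differ.
            "discounted": (
                list_price is not None
                and discount_price is not None
                and discount_price < list_price
            ),
        })
    return rows


# Sample record built by hand from the first row of the output above.
sample = [{
    "Product": {
        "Name": "Thorne Research, Bio-Gest, 180 Capsules",
        "ProductUrl": "https://www.iherb.com/pr/thorne-research-bio-gest-180-capsules/18492",
        "ListPrice": "$44.00",
        "DiscountPrice": "$44.00",
    }
}]
rows = extract_prices(sample)
print(rows)
```

With the code from the answer, the same function could be fed the decoded response of each request (the value returned by .json()) instead of the hand-built sample.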