I need to scrape different types of prices from a website using BeautifulSoup

Time:10-11

I am trying to scrape prices from a website, but there is a problem: some products are discounted and some are not, and the discounted and regular prices are located in different span elements. My code only gets the first price on the page (the website is https://ua.iherb.com/new-products?p=1).

This is part of the code:

from bs4 import BeautifulSoup


def get_page_data(html):
    soup = BeautifulSoup(html, 'lxml')

    lis = soup.find_all('div', class_="product-cell-container col-xs-12 col-sm-12 col-md-8 col-lg-6")

    for li in lis:
        # find() returns None when nothing matches, so a missing link raises AttributeError
        try:
            name = li.find('div', class_='absolute-link-wrapper').find('a').get('title')
        except AttributeError:
            name = ''

        try:
            url = li.find('div', class_='absolute-link-wrapper').find('a').get('href')
        except AttributeError:
            url = ''

        prices = li.find_all('div', attrs={"class": "product-price-top"})
        main_list = []
        for price in prices:
            try:
                # discounted products
                discount_price = price.find("span", class_='price discount-green').text.strip()
                main_list.append(discount_price)
            except AttributeError:
                # products without a discount
                original_price = price.find("span", class_='price ').text.strip()
                main_list.append(original_price)

        print(main_list)
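
The fallback that the loop above is aiming for (take the discounted span if it exists, otherwise the regular one) can be sketched with CSS selectors. This is only a minimal illustration: `SAMPLE_HTML` and `extract_prices` are hypothetical names, the markup is a simplified stand-in built from the class names used in the code above, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the two cases described above (assumption:
# the real page uses these class names, but its structure may differ).
SAMPLE_HTML = """
<div class="product-cell-container">
  <div class="product-price-top">
    <span class="price discount-green">$8.50</span>
  </div>
</div>
<div class="product-cell-container">
  <div class="product-price-top">
    <span class="price">$12.00</span>
  </div>
</div>
"""


def extract_prices(html):
    """Return one price string per product, preferring the discounted price."""
    soup = BeautifulSoup(html, "html.parser")
    prices = []
    for cell in soup.select("div.product-cell-container"):
        # select_one returns None when nothing matches, so `or` falls through
        # from the discounted span to any span carrying the `price` class.
        span = cell.select_one("span.price.discount-green") or cell.select_one("span.price")
        prices.append(span.get_text(strip=True) if span else "")
    return prices


print(extract_prices(SAMPLE_HTML))
```

Matching on individual classes with `span.price` avoids depending on the exact text of the class attribute, which is what a string such as `'price '` (with a trailing space) relies on.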

CodePudding user response:

Instead of parsing the HTML, you can query the AJAX API that the page's JavaScript calls to load the product data. Here is one way to do it:

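import requests
import pandas as pd
from tqdm import tqdm

# show every column, untruncated, when printing the DataFrame
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
# pd.set_option('display.max_rows', None)

s = requests.Session()
frames = []
# `index` looks like the starting offset and `nop` the number of products per request
for x in tqdm(range(1, 400, 50)):
    r = s.get(f'https://ua.iherb.com/catalog/iherblive?isAjax=true&index={x}&nop=50&_=1665000595915')
    # flatten the nested JSON into columns such as Product.ListPrice
    frames.append(pd.json_normalize(r.json()))

# concatenate once at the end instead of growing the DataFrame inside the loop
big_df = pd.concat(frames, axis=0, ignore_index=True)
big_df = big_df.drop(columns='Product.RatingStarsMap')
big_df.to_csv('ukr_herbs.csv')
print(big_df)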

Result in terminal:

100% 8/8 [00:04<00:00, 1.69it/s]
Country CountryCode Index   Product.Id  Product.Name    Product.ProductUrl  Product.ProductImage    Product.ProductImageRetina  Product.PartNumber  Product.ListPrice   Product.DiscountPrice   Product.OutOfStock  Product.StockStatus Product.HidePrice   Product.Discontinued    Product.NotAvailable    Product.ShowDiscount    Product.DiscountType    Product.IsInCartDiscount    Product.Rating  Product.RatingCount Product.RatingText  Product.RatingURL   Product.ReviewURL   Product.IsShippingSaver Product.IsFeaturedBrand Product.IsAutoship  Product.ShowGroupMessage    Product.IsProductCompared   Product.IsSeasonallyUnavailable Product.DiscountPercentage  Product.PrimaryImageIndex   Product.CartProductsInfo.lineItems  Product.HasRating   Product.HasTag  Product.SalesDiscountPercentage Product.IsAutoApplyPromo    Product.IsNew   Product.IsTrial Product.SpecialDealInfo.PercentageClaimed   Product.SpecialDealInfo.IsCompletelyClaimed Product.SpecialDealInfo.CountPerCustomer
0   Mexico  mx  1   18492   Thorne Research, Bio-Gest, 180 Capsules https://www.iherb.com/pr/thorne-research-bio-gest-180-capsules/18492    https://cloudinary.images-iherb.com/image/upload/f_auto,q_auto:eco/images/thr/thr40502/k/13.jpg https://cloudinary.images-iherb.com/image/upload/f_auto,q_auto:eco/images/thr/thr40502/r/13.jpg THR-40502   $44.00  $44.00  False   0   False   False   False   False   7   False   4.8 1284    4.8 of 5 based on 1284  https://www.iherb.com/pr/thorne-research-bio-gest-180-capsules/18492#product-detail-reviews https://www.iherb.com/r/thorne-research-bio-gest-180-capsules/18492 False   False   False   False   False   False   0   13  [{'productId': 18492, 'productName': 'Thorne Research, Bio-Gest, 180 Capsules', 'iURLSmall': 'https://cloudinary.images-iherb.com/image/upload/f_auto,q_auto:eco/images/thr/thr40502/s/13.jpg', 'iURLMedium': 'https://cloudinary.images-iherb.com/image/upload/f_auto,q_auto:eco/images/thr/thr40502/m/13.jpg', 'listPrice': '$44.00', ...
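
Since the question is about regular versus discounted prices, the two relevant columns in this output are `Product.ListPrice` and `Product.DiscountPrice`. Below is a minimal sketch of pulling them out and comparing them. The `records` list is made-up sample data shaped like the response above (only the fields used here), and `to_number` is a hypothetical helper that assumes prices use a dot as the decimal separator.

```python
import pandas as pd

# Hypothetical records shaped like the API response shown above;
# the names and values are invented for illustration.
records = [
    {"Index": 1, "Product": {"Name": "Item A", "ListPrice": "$44.00", "DiscountPrice": "$44.00"}},
    {"Index": 2, "Product": {"Name": "Item B", "ListPrice": "$20.00", "DiscountPrice": "$15.00"}},
]

# Nested keys become dotted column names, e.g. Product.ListPrice
df = pd.json_normalize(records)


def to_number(series):
    """Drop everything except digits and dots, then convert to float.

    Assumes a dot decimal separator; unparseable values become NaN.
    """
    cleaned = series.str.replace(r"[^\d.]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")


prices = pd.DataFrame({
    "name": df["Product.Name"],
    "list_price": to_number(df["Product.ListPrice"]),
    "discount_price": to_number(df["Product.DiscountPrice"]),
})
# A product counts as discounted when its discount price is below the list price
prices["is_discounted"] = prices["discount_price"] < prices["list_price"]
print(prices)
```

The same selection should work on `big_df` from the code above, provided the live response still contains those two columns.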