Extract Information from BeautifulSoup bs4.element.Tag using Python

Time:08-22

I'm practicing web scraping by extracting some information from the website https://www.kerastase.com.au/. As an example, I'm focusing on the Best Seller items (7 items). I have been able to extract the name, description, and price using the following code.

import requests
from bs4 import BeautifulSoup

url='https://www.kerastase.com.au/'
response = requests.get(url)
soup = BeautifulSoup(response.content, "lxml")

prod_names = soup.find_all("h3", class_="c-product-tile__name")
prod_names = [prod.get_text() for prod in prod_names]
prices = soup.find_all("span", class_="c-product-price__value")
prices = [float(price.get_text()[2:]) for price in prices if (len(price) > 0)]
prod_descs = soup.find_all("p", class_="c-product-tile__description")
prod_descs = [desc.get_text() for desc in prod_descs]
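
The `[2:]` slice in the price line assumes that every price string starts with exactly two prefix characters, as in `A$65.00`. A slightly more forgiving alternative is sketched below. It assumes prices contain a single decimal number, optionally with thousands separators; `parse_price` is a hypothetical helper name, not part of BeautifulSoup.

```python
import re

def parse_price(text):
    """Return the first decimal number found in a price string, or None."""
    # Matches e.g. "65.00" in "A$65.00" or "1,250.50" in "A$1,250.50".
    match = re.search(r"\d+(?:,\d{3})*(?:\.\d+)?", text)
    if match is None:
        return None
    # Drop thousands separators before converting to float.
    return float(match.group().replace(",", ""))

print(parse_price("A$65.00"))  # 65.0
print(parse_price(""))         # None
```

With this helper, empty price spans would yield `None` rather than raising a `ValueError`, so they could be filtered out after parsing.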

However, extracting the rating and number of reviews seems to be more complicated, because they sit inside nested divs. I have been able to extract the caption of the first item using the following command; however, the result is a mess, and I don't know what to do after this step:

soup.find_all("figcaption", class_="c-product-tile__caption")[0]

Here is an example of the full caption I get for one item:

<figcaption > <div > <div > <button aria-label="Add to Wishlist Elixir Ultime Pride Edition Hair Oil" aria-pressed=""  data-analytics='{"products":[{"pid":"3474637116088","title":"Elixir Ultime Pride Edition Hair Oil","description":"","url":"https://www.kerastase.com.au/collections/elixir-ultime/elixir-ultime-pride-edition-hair-oil/3474637116088.html","imgUrl":"https://www.kerastase.com.au/on/demandware.static/-/Sites-kerastase-master-catalog/default/dw377882d1/2022/Elixir Ultime/Pride/1. Product.jpg","currency":"AUD","price":65,"name":"Elixir Ultime Pride Edition Hair Oil","subname":"Iconic nourishing hair oil for all hair types. Kérastase will be donating to Minus18, subsidising LGBTQIA  Inclusion Workshops for schools across Australia.","id":"elixir-pride","salePrice":65,"brand":"Kérastase","category":"others/collections/elixir ultime","productTopCategory":"products","variant":"100 ml","size":"100 ml","color":"","fragrance":"","stock":"in stock","autoReplenishmentInterval":"not present","upc":"3474637116088","regularPrice":null,"isProductSet":false,"isProductGroup":false,"isBundle":false,"bundleID":"","rating":5,"numberReviews":2,"vtoState":"not present","collection":["Elixir Ultime"],"customizations":{"engraving":"not present"},"badges":"none","remainingStock":null}],"label":"elixir ultime pride edition hair oil::3474637116088","category":"{{dataLayer.page.category}}"}' data-component="product/AddToWishlist" data-component-options='{"pid":"3474637116088","url":{"add":"https://www.kerastase.com.au/on/demandware.store/Sites-kerastase-au-ng-Site/en_AU/Wishlist-AddToWishList","remove":"https://www.kerastase.com.au/on/demandware.store/Sites-kerastase-au-ng-Site/en_AU/Wishlist-RemoveFromWishList"},"text":{"title":{"add":"Add to Wishlist","remove":"Remove from Wishlist"},"accessibility":{"addAriaLabel":"Add to Wishlist Elixir Ultime Pride Edition Hair Oil","removeAriaLabel":"Remove from Wishlist Elixir Ultime Pride Edition Hair Oil"}},"isLabel":false}' title="Add to 
Wishlist"> <span  data-js-wishlist-text="">Wishlist</span> </button> </div> <h3 ><a data-js-product-name="" data-lora-datalayer='{"products":{"3474637116088":{"name":"Elixir Ultime Pride Edition Hair Oil"}}}' href="/collections/elixir-ultime/elixir-ultime-pride-edition-hair-oil/elixir-pride.html"> Elixir Ultime Pride Edition Hair Oil </a></h3><p > Iconic nourishing hair oil for all hair types. Kérastase will be donating to Minus18, subsidising LGBTQIA  Inclusion Workshops for schools across Australia. </p> <div > <div > <div data-bv-productid="elixir-pride" data-bv-redirect-url="/collections/elixir-ultime/elixir-ultime-pride-edition-hair-oil/elixir-pride.html" data-bv-seo="false" data-bv-show="inline_rating" data-component="product/BazaarvoiceService"> </div> </div> <div > <div  data-component="product/ProductPrice" data-component-options='{"pid":"3474637116088","reloadData":{"configid":null},"dataModelId":"productprice"}'> <span  data-js-pricelabel="">Old price</span> <span  data-js-standardprice=""></span> <span  data-js-pricelabel="">New price</span> <span  data-js-saleprice="">A$65.00</span> </div> </div> </div> <div > <div > </div> <div > <div >One size available</div> <div > <span data-js-pid="">100 ml</span> </div> </div> </div> </div> <div  data-js-producttile-actions=""> <div data-component="global/ComponentPlaceholder" data-component-options='{"_lazyload":true,"reloadData":{"id":"productmainaction","section":"product","configid":"producttile","reloadUrl":"https://www.kerastase.com.au/on/demandware.store/Sites-kerastase-au-ng-Site/en_AU/CDSLazyload-product_productmainaction?configid=producttile&amp;data=3474637116088&amp;id=productmainaction&amp;pageId=homepage&amp;section=product"}}'> <button > <span>Loading ...</span> </button> </div> </div> </figcaption>

How can I get each product's rating and number of reviews from this? Example: "rating":5,"numberReviews":2

(It is probably possible to get all product info from the above, but I don't know what the best method is.)

Answer:

The product details are stored in the data-analytics attribute of the button tag inside each product tile. That attribute holds JSON-formatted data, so we can parse it and pick out the relevant information:

import json

# Each product tile contains a <button> whose data-analytics attribute is a JSON string.
main_tag = soup.find_all("div", class_="c-product-tile__figure")
dict1 = {}
for i, tile in enumerate(main_tag):
    json_data = tile.find("button")["data-analytics"]
    details = json.loads(json_data)
    product = details["products"][0]  # the product described by this tile
    dict1[i] = {
        "name": product["title"],
        "price": product["price"],
        "rating": product["rating"],
        "reviews": product["numberReviews"],
    }

Output:

{0: {'name': 'Elixir Ultime Pride Edition Hair Oil',
  'price': 65,
  'rating': 5,
  'reviews': 2},
 1: {'name': 'Nutritive 8HR Magic Night Hair Serum',
  'price': 67,
  'rating': 4.5701,
  'reviews': 749},
  ....
  }
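
If some tiles lack a rating or review count, direct indexing raises a `KeyError`. A minimal sketch of a more defensive variant follows; it works on the raw attribute string (what `button["data-analytics"]` would return) and uses only the key names visible in the sample caption. `extract_product_info` is a hypothetical helper name, and the sample string is a trimmed version of the attribute shown in the question.

```python
import json

def extract_product_info(analytics_json):
    """Pull a few fields out of one data-analytics attribute value."""
    details = json.loads(analytics_json)
    products = details.get("products") or []
    if not products:
        return None  # no product entry in this attribute
    product = products[0]
    # .get() returns None for missing keys instead of raising KeyError.
    return {
        "name": product.get("title"),
        "price": product.get("price"),
        "rating": product.get("rating"),
        "reviews": product.get("numberReviews"),
    }

# Trimmed sample based on the attribute shown in the question.
sample = '{"products":[{"title":"Elixir Ultime Pride Edition Hair Oil","price":65,"rating":5,"numberReviews":2}]}'
print(extract_product_info(sample))
# {'name': 'Elixir Ultime Pride Edition Hair Oil', 'price': 65, 'rating': 5, 'reviews': 2}
```

Keeping the JSON handling separate from the HTML lookup makes it possible to check the parsing logic without fetching the page.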