Web scraping issue for item availability

Time:08-09

I'm new to web scraping and am trying to learn by starting a small project: tracking the availability of an item in an online store (the product page URL is in the code below). So far I have only managed to get the element, by its class, that contains what I need, but I'm not sure how to get the text out of it for the item's name, availability and price. What is the least-effort way to get this information? Or should I just treat it as a string and slice it until I get the three values I want?

My code:

import requests
from bs4 import BeautifulSoup



productURL = "https://allbest.kz/p98634475-oneplus-9rt-12256gb.html"
result = requests.get(productURL)
doc = BeautifulSoup(result.text, "html.parser")
classname = doc.select("[class~=b-sticky-panel__holder]")
print(classname)

Output:

[<div ><span  data-qaid="sticky_product_name">OnePlus 9RT 12/256Gb 5G Black</span><span  data-qaid="product_status_sticky_panel" title="Нет в наличии">Нет в наличии</span><div  data-qaid="product_price_sticky_panel"><span >213 400 <span >Тг.</span></span></div></div>]

FYI: Here in the output "Нет в наличии" basically means "Out of stock".

Desired result:

Name = "OnePlus 9RT 12/256Gb 5G Black"
Availability = "Out of stock"
Price = 213 400

Answer:

select returns a list of the elements that match your selector. You got a list with a single element, so you can index it with [0] and then select again within it to find the child tags that hold the information you need. Simpler still, you can skip the panel holder altogether and go directly for the tags that contain the information, something like this:

import requests
from bs4 import BeautifulSoup

productURL = "https://allbest.kz/p98634475-oneplus-9rt-12256gb.html"
result = requests.get(productURL)
doc = BeautifulSoup(result.text, "html.parser")

name = doc.select("[class~=b-sticky-panel__product-name]")[0].text
availability = doc.select("[class~=b-sticky-panel__product-status]")[0].text
price = doc.select("[class~=b-sticky-panel__price]")[0].text

print(f"Name = {name}")
print(f"Availability = {availability}")
print(f"Price = {price}")

Output:

Name = OnePlus 9RT 12/256Gb 5G Black
Availability = Нет в наличии
Price = 213 400 Тг.
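The output above still carries the currency suffix "Тг.", and indexing with [0] raises an IndexError whenever a selector matches nothing. The sketch below runs offline against the snippet from the question's own printout instead of fetching the page, so that markup is an assumption and the live site may differ. It contrasts select(), which returns a list, with select_one(), which returns the first match or None, and it targets the data-qaid attributes visible in that printout. Taking only the first stripped string of the price element drops the nested currency span.

```python
from bs4 import BeautifulSoup

# Assumption: markup copied from the question's printout, not fetched live.
# The class attributes were lost in that printout, so we select on data-qaid.
SNIPPET = (
    '<div><span data-qaid="sticky_product_name">OnePlus 9RT 12/256Gb 5G Black</span>'
    '<span data-qaid="product_status_sticky_panel" title="Нет в наличии">Нет в наличии</span>'
    '<div data-qaid="product_price_sticky_panel">'
    '<span>213 400 <span>Тг.</span></span></div></div>'
)

doc = BeautifulSoup(SNIPPET, "html.parser")

# select() always returns a list-like result, even for a single match,
# which is why the answer above indexes it with [0].
matches = doc.select('[data-qaid="sticky_product_name"]')
print(len(matches))  # 1

# select_one() returns the first match directly, or None when nothing matches,
# so a missing element can be checked instead of raising IndexError.
missing = doc.select_one('[data-qaid="no_such_element"]')
print(missing)  # None

name = doc.select_one('[data-qaid="sticky_product_name"]').get_text(strip=True)
availability = doc.select_one('[data-qaid="product_status_sticky_panel"]').get_text(strip=True)

price_tag = doc.select_one('[data-qaid="product_price_sticky_panel"]')
# The first stripped string is "213 400"; the nested <span> holds the currency "Тг.".
price_text = next(price_tag.stripped_strings)
# Drop ordinary and no-break spaces (the live page may use either) before converting.
price = int(price_text.replace(" ", "").replace("\xa0", ""))

print(name)   # OnePlus 9RT 12/256Gb 5G Black
print(price)  # 213400
```

On the live page you would build doc from result.text as in the answer, and check each select_one() result for None before calling get_text().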

Answer:

The following code snippet will get you the info you're looking for:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'}

url = 'https://allbest.kz/p98634475-oneplus-9rt-12256gb.html'
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')

availability = soup.select_one('li[data-qaid="presence_data"]').get_text(strip=True)
name = soup.select_one('span[data-qaid="product_name"]').get_text(strip=True)
price = soup.select_one('span[data-qaid="product_price"]').get_text(strip=True)
print('Name =', name)
print('Availability =', availability)
print('Price =', price)

This prints the following in the terminal:

Name = OnePlus 9RT 12/256Gb 5G Black
Availability = Нет в наличии
Price = 213 400
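This output is close to the desired result, but the availability is still in Russian and the price is a string rather than a number. A minimal standard-library post-processing sketch follows. parse_price, translate_status and the STATUS_EN table are hypothetical helpers; only the "Нет в наличии" → "Out of stock" entry comes from the question, and the "В наличии" entry is an assumed second status.

```python
import re

# Hypothetical lookup table: only "Нет в наличии" is confirmed by the question;
# "В наличии" (literally "in stock") is an assumed second status.
STATUS_EN = {
    "Нет в наличии": "Out of stock",
    "В наличии": "In stock",
}


def parse_price(text: str) -> int:
    """Keep only the digits and return them as an int."""
    # Assumes whole-number prices: spaces, no-break spaces and the "Тг." suffix
    # are all non-digits and get removed.
    digits = re.sub(r"\D", "", text)
    if not digits:
        raise ValueError(f"no digits in price text: {text!r}")
    return int(digits)


def translate_status(text: str) -> str:
    """Map the site's Russian status to English, falling back to the original text."""
    status = text.strip()
    return STATUS_EN.get(status, status)


availability = translate_status("Нет в наличии")
price = parse_price("213 400")
print("Availability =", availability)             # Availability = Out of stock
print("Price =", f"{price:,}".replace(",", " "))  # Price = 213 400
```

Falling back to the original text means an unexpected status shows up untranslated instead of crashing the tracker, and keeping the price as an int makes it easy to compare against a target price later.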

BeautifulSoup docs: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#

Requests docs: https://requests.readthedocs.io/en/latest/
