I'm trying to get the prices, serves, pieces and weight of the products from the following site by applying regex to specific tags and classes in the HTML. I successfully got the prices with regex, but couldn't get the weight, serves and pieces of the product using the same kind of statement. Can I use a CSS selector to get the remaining values?
https://www.tendercuts.in/chicken/whole-chicken-skin-off
Code:
import re

import requests
from bs4 import BeautifulSoup

URL = 'https://www.tendercuts.in/chicken/whole-chicken-skin-off'
r = requests.get(URL)
soup = BeautifulSoup(r.text)
data = []
for item in soup.select('app-child-product-display'):
    data.append({
        'price': re.search(r'₹\d*', item.find('p', class_='current-price').text).group(),
        'weight': re.search(r'\d*', item.find('span', class_='callout').text).group()
    })
print(data)
Output:
[{'price': '₹99', 'weight': ''}, {'price': '₹125', 'weight': ''}, {'price': '₹499', 'weight': ''}, {'price': '₹749', 'weight': ''}]
Answer:
You do not need regex to extract this information. Just select your targets more specifically with CSS selectors by id:
'weight':item.select_one('#product-weight span:last-child').text,
'pieces':item.select_one('#product-pieces span:last-child').text
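To see how these selectors behave without hitting the site, here is a minimal offline sketch. The HTML string is made up for illustration (it only mimics the structure the selectors imply, not the real page), and `text_or_none` is a hypothetical helper that returns `None` instead of raising when a product has no matching element.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on the structure the selectors imply.
html = """
<app-child-product-display>
  <p class="current-price">₹99</p>
  <div id="product-weight"><span>Weight</span><span>480 - 500 Gms</span></div>
  <div id="product-pieces"><span>Pieces</span><span>18 to 20</span></div>
</app-child-product-display>
<app-child-product-display>
  <p class="current-price">₹125</p>
  <div id="product-weight"><span>Weight</span><span>980 - 1000 Gms</span></div>
</app-child-product-display>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


soup = BeautifulSoup(html, 'html.parser')
data = [
    {
        'price': text_or_none(item, 'p.current-price'),
        # span:last-child picks the value span, skipping the label span before it
        'weight': text_or_none(item, '#product-weight span:last-child'),
        'pieces': text_or_none(item, '#product-pieces span:last-child'),
    }
    for item in soup.select('app-child-product-display')
]
print(data)
```

Calling `select_one` on `item` rather than on `soup` keeps each lookup scoped to one product, which matters here because the same ids repeat in every product block.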
Example
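import re

import requests
from bs4 import BeautifulSoup

URL = 'https://www.tendercuts.in/chicken/whole-chicken-skin-off'
r = requests.get(URL)
# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(r.text, 'html.parser')

data = []
# Each product entry sits inside its own <app-child-product-display> element
for item in soup.select('app-child-product-display'):
    data.append({
        'price': re.search(r'₹\d*', item.find('p', class_='current-price').text).group(),
        # The value is the last <span> inside the element with the given id
        'weight': item.select_one('#product-weight span:last-child').text,
        'pieces': item.select_one('#product-pieces span:last-child').text
    })
print(data)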
Output
[{'price': '₹99', 'weight': '480 - 500 Gms', 'pieces': '18 to 20'}, {'price': '₹125', 'weight': '480 - 500 Gms', 'pieces': '18 to 20'}, {'price': '₹499', 'weight': '1980 - 2000 Gms', 'pieces': '50 to 60'}, {'price': '₹749', 'weight': '2980 - 3000 Gms', 'pieces': '80 to 90'}]
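As for why the original attempt returned an empty string for the weight: `\d*` is allowed to match zero digits, so `re.search` succeeds immediately at the first position whenever the text does not start with a digit. The sketch below illustrates this. The sample string is an assumption, since the actual text of the `callout` span is not shown in the question.

```python
import re

# Hypothetical text; the real content of the 'callout' span is not shown in the question.
text = 'Weight: 480 - 500 Gms'

# \d* may match zero digits, so the search succeeds at index 0 with an empty match.
empty = re.search(r'\d*', text).group()

# \d+ requires at least one digit, so the search moves on to the first number.
first = re.search(r'\d+', text).group()

# findall collects every run of digits, e.g. both ends of a range.
bounds = [int(n) for n in re.findall(r'\d+', text)]

print(repr(empty), first, bounds)
```

Note that with `\d+`, `re.search` returns `None` when the text contains no digits at all, so calling `.group()` directly would raise an `AttributeError` in that case.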