I have the following HTML for one item in a list of products:
<div >
<div >
<h2>
<a href="/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html">
<span style="color:red">Stoc limitat!</span>
Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 12 MP, Wi-Fi, 5G, iOS (Negru)
</a>
</h2>
</div>
<div >
<span > 999,00 Lei </span>
<span >-12%</span>
<span >mai ieftin cu 120,00 lei</span>
<span >879,00 Lei</span>
<span >evoCREDIT</span></div>
</div>
</div>
Some products have the price_discount span, while others don't:
<span >-12%</span>
I use the following code to scrape the product names:
texts = []
for a in soup.select("div.npi_name a[href]"):
    if a.span:
        text = a.span.next_sibling
    else:
        text = a.string
    texts.append(text.strip())
I don't know what condition I need so that I only get the names of the products that have a discount.
Note: it has to work for a whole list of products.
Answer:
One way to handle this is to select only the product items that contain a discount:
soup.select('div.nice_product_item:has(.price_discount):has(a[href])')
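To check that the :has() filter also works on a whole list, here is a minimal sketch with two products, only one of which has a discount. The class names nice_product_item, npi_name and price_discount are assumptions inferred from the selectors used in this post (the class attributes are missing from the HTML quoted above), and both products are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: class names are inferred from the selectors in the post,
# and both products are invented for illustration.
html = '''
<div class="nice_product_item">
  <div class="npi_name"><h2><a href="/p1.html">
    <span style="color:red">Stoc limitat!</span>
    Discounted phone
  </a></h2></div>
  <div><span class="price_discount">-12%</span></div>
</div>
<div class="nice_product_item">
  <div class="npi_name"><h2><a href="/p2.html">Regular phone</a></h2></div>
  <div><span>999,00 Lei</span></div>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

# Keep only product containers that contain a discount span and a link.
items = soup.select('div.nice_product_item:has(.price_discount):has(a[href])')

# The last stripped string inside the link is the product name,
# whether or not a "Stoc limitat!" span precedes it.
names = [list(item.a.stripped_strings)[-1] for item in items]
print(names)  # ['Discounted phone']
```

The product without a price_discount span is never returned by select(), so no extra if-check is needed inside the loop.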
Iterate over the ResultSet, pick the information you need and store it in a structured way, for example as a list of dicts, so you can process it later, e.g. turn it into a DataFrame and save it to CSV, JSON, ...
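If you would rather keep the loop from the question, the condition you need is whether the product container has a span.price_discount. A minimal sketch, again assuming the class names nice_product_item, npi_name and price_discount inferred from the selectors in this post and using invented sample products. It loops over the product containers instead of the links, so the discount check and the name come from the same item:

```python
from bs4 import BeautifulSoup

# Hypothetical two-item markup; class names are assumptions based on the post's selectors.
html = '''
<div class="nice_product_item">
  <div class="npi_name"><h2><a href="/p1.html">
    <span style="color:red">Stoc limitat!</span>
    Discounted phone
  </a></h2></div>
  <div><span class="price_discount">-12%</span></div>
</div>
<div class="nice_product_item">
  <div class="npi_name"><h2><a href="/p2.html">Regular phone</a></h2></div>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

texts = []
for item in soup.select('div.nice_product_item'):
    # The condition: skip products that have no discount span.
    if item.select_one('span.price_discount') is None:
        continue
    a = item.select_one('div.npi_name a[href]')
    if a is None:
        continue
    # Same name logic as in the question: the text after the optional <span>.
    if a.span:
        text = a.span.next_sibling
    else:
        text = a.string
    texts.append(text.strip())

print(texts)  # ['Discounted phone']
```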
Example
from bs4 import BeautifulSoup
import pandas as pd
html = '''
<div >
<div >
<h2>
<a href="/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html">
<span style="color:red">Stoc limitat!</span>
Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 12 MP, Wi-Fi, 5G, iOS (Negru)
</a>
</h2>
</div>
<div >
<span > 999,00 Lei </span>
<span >-12%</span>
<span >mai ieftin cu 120,00 lei</span>
<span >879,00 Lei</span>
<span >evoCREDIT</span></div>
</div>
</div>
'''
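soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid a warning
data = []
# Only product items that contain a discount span and a link (the := syntax needs Python 3.8+)
for e in soup.select('div.nice_product_item:has(.price_discount):has(a[href])'):
    data.append({
        'url': e.a['href'],
        # the last string inside the link is the name, even if "Stoc limitat!" precedes it
        'label': s[-1] if (s := list(e.a.stripped_strings)) else None,
        'price': s.text if (s := e.select_one('span.real_price')) else None,
        'discount': s.text if (s := e.select_one('span.price_discount')) else None,
        'other': 'edit for elements you need'
    })
pd.DataFrame(data)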
Output
url | label | price | discount | other |
---|---|---|---|---|
/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html | Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 12 MP, Wi-Fi, 5G, iOS (Negru) | 879,00 Lei | -12% | edit for elements you need |
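For the last step mentioned above (saving to CSV or JSON), pandas can write the DataFrame directly. A minimal sketch using the row from the output table; instead of writing files it returns the results as strings, and you would pass a path such as 'products.csv' (a hypothetical file name) to write to disk:

```python
import pandas as pd

# One record with the values from the output table above.
data = [{
    'url': '/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html',
    'label': 'Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 12 MP, Wi-Fi, 5G, iOS (Negru)',
    'price': '879,00 Lei',
    'discount': '-12%',
}]
df = pd.DataFrame(data)

# Without a path argument both methods return a string.
csv_text = df.to_csv(index=False)  # fields containing commas or quotes get quoted
json_text = df.to_json(orient='records', force_ascii=False)  # a list of objects
print(csv_text)
print(json_text)
```

orient='records' gives one JSON object per product, which matches the list-of-dicts structure built while scraping.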