Conditioning the soup selection on a web scrape (Python/BeautifulSoup)

Time:03-22

I have the following HTML for one item in a list of products:

    <div class="nice_product_item">
        <div class="npi_name">
            <h2>
                <a href="/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html">
                <span style="color:red">Stoc limitat!</span>
                Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12   12 MP, Wi-Fi, 5G, iOS (Negru)
                </a>
            </h2>
        </div>

        <div>
            <span>&nbsp;999,00 Lei&nbsp;</span>
            <span class="price_discount">-12%</span>
            <span>mai ieftin cu 120,00 lei</span>
            <span class="real_price">879,00 Lei</span>
            <span>evoCREDIT</span>
        </div>
    </div>

Some products have the price_discount span, while others don't:

<span class="price_discount">-12%</span>

I use the following code to scrape the names of products:

texts = []

for a in soup.select("div.npi_name a[href]"):
    if a.span:
        text = a.span.next_sibling
    else:
        text = a.string
    texts.append(text.strip())

I don't know what condition I need in order to get only the names of the products that have a discount.

Note: it has to work for a whole list of products.

CodePudding user response:

A way to process the data could be to select all items with discounts:

soup.select('div.nice_product_item:has(.price_discount):has(a[href])')
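As a minimal sketch of what that selector does, the snippet below runs it against two made-up products, one with a discount span and one without. The product names, URLs and prices are hypothetical; the class names are the ones used in the selectors above. The `product_name` helper is also hypothetical and simply takes the last text fragment inside the link, which skips the "Stoc limitat!" span.

```python
from bs4 import BeautifulSoup

# Two hypothetical products: only the first one has a price_discount span.
html = '''
<div class="nice_product_item">
    <div class="npi_name">
        <h2><a href="/phone-a.html"><span style="color:red">Stoc limitat!</span> Phone A</a></h2>
    </div>
    <div>
        <span class="price_discount">-12%</span>
        <span class="real_price">879,00 Lei</span>
    </div>
</div>
<div class="nice_product_item">
    <div class="npi_name">
        <h2><a href="/phone-b.html">Phone B</a></h2>
    </div>
    <div>
        <span class="real_price">500,00 Lei</span>
    </div>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')


def product_name(a):
    """Return the last non-empty text fragment inside the link (hypothetical helper)."""
    strings = list(a.stripped_strings)
    return strings[-1] if strings else None


# :has() keeps only the product containers that hold a discount span and a link.
discounted = [
    product_name(item.select_one('a[href]'))
    for item in soup.select('div.nice_product_item:has(.price_discount):has(a[href])')
]

# For comparison: every product name, discounted or not.
all_names = [product_name(a) for a in soup.select('div.npi_name a[href]')]

print(discounted)
print(all_names)
```

The condition lives in the selector rather than in the loop body, so the same code works for a page with any number of products.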

Iterate over the ResultSet, pick the information you need, and store it in a structured form such as a list of dicts. You can then process it later, e.g. load it into a DataFrame and save it to CSV, JSON, ...
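A rough sketch of that last step, assuming the scraped records are already collected in a list of dicts. The two records here are hypothetical placeholders, not real scraped data.

```python
import pandas as pd

# Hypothetical records in the list-of-dicts shape described above.
data = [
    {'url': '/phone-a.html', 'label': 'Phone A', 'price': '879,00 Lei', 'discount': '-12%'},
    {'url': '/phone-c.html', 'label': 'Phone C', 'price': '450,00 Lei', 'discount': '-5%'},
]

df = pd.DataFrame(data)

# With no path given, to_csv and to_json return the serialized text as a string.
csv_text = df.to_csv(index=False)
json_text = df.to_json(orient='records', force_ascii=False)

# To write files instead, pass a path, e.g. df.to_csv('discounted.csv', index=False)
print(csv_text)
```

Prices such as "879,00 Lei" contain a comma, so the CSV writer quotes those fields automatically.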

Example

from bs4 import BeautifulSoup
import pandas as pd

html = '''
<div class="nice_product_item">
    <div class="npi_name">
        <h2>
            <a href="/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html">
            <span style="color:red">Stoc limitat!</span>
            Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12   12 MP, Wi-Fi, 5G, iOS (Negru)
            </a>
        </h2>
    </div>

    <div>
        <span>&nbsp;999,00 Lei&nbsp;</span>
        <span class="price_discount">-12%</span>
        <span>mai ieftin cu 120,00 lei</span>
        <span class="real_price">879,00 Lei</span>
        <span>evoCREDIT</span>
    </div>
</div>
'''

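# Name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = BeautifulSoup(html, 'html.parser')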

data = []

# Keep only product items that contain both a discount span and a link
for e in soup.select('div.nice_product_item:has(.price_discount):has(a[href])'):
    data.append({
        'url': e.a['href'],
        # Last text fragment inside the link, which skips the "Stoc limitat!" span
        'label': s[-1] if (s := list(e.a.stripped_strings)) else None,
        'price': s.text if (s := e.select_one('span.real_price')) else None,
        'discount': s.text if (s := e.select_one('span.price_discount')) else None,
        'other': 'edit for elements you need'
    })

pd.DataFrame(data)

Output

url: /solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html
label: Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 12 MP, Wi-Fi, 5G, iOS (Negru)
price: 879,00 Lei
discount: -12%
other: edit for elements you need