Searching nested tags with BeautifulSoup

Time:10-27

I'm trying to parse HTML from this e-commerce website with Selenium and BeautifulSoup in Python to get the following text from the product cards:

"Electric outlet 220V 32A IP44" and "34.60"

Problem: I get empty values for my variables productCardTitle and productPriceValue.

This is my HTML:

<ul class="products-cards products-cards--view-grid grid">
    <li data-uid="210a10d0-4976-4c94-9ab4-0e7d38459c02" class="products-cards__item product-card">
        <div class="product-card__top">...</div> 
        <div class="product-card__content">...</div> 
        <div class="product-card__title-wrapper">
            <h3 class="product-card__title">
                <a href="/catalog/Electric-outlet-220V-32A-IP44">
                    Electric outlet 220V 32A IP44
                </a>
            </h3>
        </div> 
        ...                             
        <div class="product-card__bottom">
            <p class="product-card__price product-price product-price--small">...</p>
            <p class="product-card__price product-price">
                <span class="visually-hidden">Price:</span> 
                <span class="product-price__value">34.60</span> 
                <span class="product-price__currency">...</span> 
        ...
    </li>
</ul>

This is my Python code:

productCards = soup.find_all('li', class_="products-cards__item product-card")
for productCard in productCards:
    productCardTitle = productCard.find_all('h3', class_="product-card__title")
    for product in productCardTitle:
        title = product.findChildren('a')[0].string.strip()
        print(title) 

    productPriceValue = productCard.find_all('span', class_="product-price__value")
    for product in productPriceValue:
        price = product.string.strip()
        print(price)

I would appreciate any help on how to solve this problem.

Answer:

Maybe you want to collect the parsed data in a list:

data = []
productCards = soup.find_all('li', class_="products-cards__item product-card")
for productCard in productCards:
    # reset for every card so a missing element can't reuse the previous card's value
    title, price = None, None

    productCardTitle = productCard.find_all('h3', class_="product-card__title")
    for product in productCardTitle:
        title = product.find('a').get_text(strip=True)

    productPriceValue = productCard.find_all('span', class_="product-price__value")
    for product in productPriceValue:
        price = product.get_text(strip=True)

    # only keep cards where both the title and the price were found
    if title is not None and price is not None:
        data.append({'title': title, 'price': float(price)})

Output:

>>> data
[{'title': 'Electric outlet 220V 32A IP44', 'price': 34.6}]

Answer:

I'm not sure why your results are empty. I tried it with Selenium as well as with requests, and both return valid information.
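One way to narrow this down, assuming the snippet posted in the question is representative of the live page, is to run the same lookups against that HTML as a plain string, with no browser or network involved. If this prints the expected values, the selectors are fine and the empty results most likely come from the page source that was actually fetched. `SAMPLE_HTML` below is a trimmed copy of the question's markup with the elided `...` parts left out.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup posted in the question (the "..." parts are omitted)
SAMPLE_HTML = """
<ul class="products-cards products-cards--view-grid grid">
  <li data-uid="210a10d0-4976-4c94-9ab4-0e7d38459c02" class="products-cards__item product-card">
    <div class="product-card__title-wrapper">
      <h3 class="product-card__title">
        <a href="/catalog/Electric-outlet-220V-32A-IP44">
          Electric outlet 220V 32A IP44
        </a>
      </h3>
    </div>
    <div class="product-card__bottom">
      <p class="product-card__price product-price">
        <span class="visually-hidden">Price:</span>
        <span class="product-price__value">34.60</span>
      </p>
    </div>
  </li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Same lookups as in the question, without the nested loops
titles = [h3.a.get_text(strip=True) for h3 in soup.find_all('h3', class_="product-card__title")]
prices = [span.get_text(strip=True) for span in soup.find_all('span', class_="product-price__value")]

print(titles)  # ['Electric outlet 220V 32A IP44']
print(prices)  # ['34.60']
```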

Anyway, you don't need that many loops to grab the information. Let's look at how to make it leaner.

Select all cards like this:

soup.select('li.product-card')
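This shorter selector is also a little more forgiving than the question's `class_="products-cards__item product-card"`. When `class_` is given a string containing a space, BeautifulSoup compares it against the whole `class` attribute value, so the class order has to match exactly; the CSS selector `li.product-card` matches any `<li>` that has `product-card` among its classes. The two `<li>` elements below are made up purely to show the difference.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same two classes, written in a different order
html = """
<ul>
  <li class="products-cards__item product-card">A</li>
  <li class="product-card products-cards__item">B</li>
</ul>
"""
soup = BeautifulSoup(html, 'html.parser')

# Exact match on the full attribute string: class order matters
exact = soup.find_all('li', class_="products-cards__item product-card")

# CSS class selector: matches whenever 'product-card' is one of the classes
css = soup.select('li.product-card')

print([li.get_text() for li in exact])  # ['A']
print([li.get_text() for li in css])    # ['A', 'B']
```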

Iterate once over the result set and create a list of dicts:

[
    {'title': card.h3.get_text(strip=True),
     'price': card.select_one('span.product-price__value').get_text()}
    for card in soup.select('li.product-card')
]
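One caveat with this comprehension: `card.h3` and `select_one()` return `None` when a card has no matching element, and calling `.get_text()` on `None` raises `AttributeError`. If some cards on the real page might lack a title or price (an assumption, not something the question shows), a small helper can skip them. `parse_card` and `parse_cards` are hypothetical names used only for this sketch.

```python
from bs4 import BeautifulSoup


def parse_card(card):
    """Return {'title', 'price'} for one product card, or None if either part is missing."""
    title_tag = card.select_one('h3.product-card__title')
    price_tag = card.select_one('span.product-price__value')
    if title_tag is None or price_tag is None:
        return None
    return {
        'title': title_tag.get_text(strip=True),
        'price': price_tag.get_text(strip=True),
    }


def parse_cards(soup):
    """Parse every li.product-card, skipping incomplete cards."""
    results = []
    for card in soup.select('li.product-card'):
        parsed = parse_card(card)
        if parsed is not None:
            results.append(parsed)
    return results


# Usage with made-up markup: the second card has no price element
html = """
<ul>
  <li class="product-card"><h3 class="product-card__title"><a>Outlet</a></h3>
      <span class="product-price__value">34.60</span></li>
  <li class="product-card"><h3 class="product-card__title"><a>No price</a></h3></li>
</ul>
"""
print(parse_cards(BeautifulSoup(html, 'html.parser')))
# [{'title': 'Outlet', 'price': '34.60'}]
```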

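Example (Selenium)

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the chromedriver path via a Service object
# (recent versions can also locate the driver themselves: webdriver.Chrome())
driver = webdriver.Chrome(service=Service('YOUR PATH TO CHROMEDRIVER'))
driver.get('https://shop-aventa.ru/search?q= Разъем 220 ')

soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

data = [
    {'title': card.h3.get_text(strip=True),
     'price': card.select_one('span.product-price__value').get_text()}
    for card in soup.select('li.product-card')
]

print(data)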

Example (requests)

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}

# pass the headers, otherwise the User-Agent defined above is never sent
r = requests.get('https://shop-aventa.ru/search?q= Разъем 220 ', headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')

data = [
    {'title': card.h3.get_text(strip=True),
     'price': card.select_one('span.product-price__value').get_text()}
    for card in soup.select('li.product-card')
]

print(data)
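The search URL above has the query (`Разъем 220`, roughly "220 V connector") typed straight into it, spaces and Cyrillic included. `requests` can build the query string itself through its `params` argument, e.g. `requests.get('https://shop-aventa.ru/search', params={'q': 'Разъем 220'}, headers=headers)`. The standard-library call below shows, without any network access, the percent-encoded form such a query takes. Stripping the leading and trailing spaces from the search term is an assumption on my part.

```python
from urllib.parse import urlencode

# Search term from the answer's URL, with the surrounding spaces stripped (an assumption)
query = {'q': 'Разъем 220'}

# urlencode percent-encodes the UTF-8 bytes and turns the space into '+'
encoded = urlencode(query)
print(encoded)  # q=%D0%A0%D0%B0%D0%B7%D1%8A%D0%B5%D0%BC+220

url = 'https://shop-aventa.ru/search?' + encoded
print(url)
```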

Output

[{'title': 'Разъем 220 ЭКФ (Розетка 233), 63А, IP67', 'price': '934.60'}, {'title': 'Разъем 220 ЭКФ (Розетка 223), 32А, IP44', 'price': '286.00'}, {'title': 'Разъем 220 ЭКФ (Вилка 023), 32А, IP44', 'price': '239.40'}, {'title': 'Разъем 220 ЭКФ (Розетка 123), 32А, IP44', 'price': '342.80'}, {'title': 'Разъем 220 ЭКФ (Розетка 213), 16А, IP44', 'price': '200.70'}, {'title': 'Разъем 220 ЭКФ (Розетка 113), 16А, IP44', 'price': '269.70'}, {'title': 'Разъем 220 ЭКФ (Вилка 013), 16А, IP44', 'price': '161.50'}, {'title': 'Разъем 220 ЭКФ (Розетка 413), 16А, IP44', 'price': '269.20'}, {'title': 'Разъем 220В DKC 32А 220В, 2P E, наст. IP44', 'price': '595.50'}, {'title': 'Разъем 220 ИЭК (Вилка 513), 16А, IP44, MAGNUM', 'price': '533.58'}, {'title': 'Разъем 220 ИЭК (Вилка 033), 63А, IP67, MAGNUM', 'price': '1 727.41'}, {'title': 'Разъем 220 ИЭК (Розетка 223), 32А, IP 44', 'price': '348.21'}, {'title': 'Разъем 220 ИЭК (Розетка 113), 16А, IP44, MAGNUM', 'price': '427.15'}, {'title': 'Разъем 220 ИЭК (Розетка 133), 63А, IP67, MAGNUM', 'price': '2 607.28'}, {'title': 'Разъем 220 ИЭК (Розетка 233), 63А, IP44', 'price': '2 051.63'}, {'title': 'Разъем 220 ИЭК (Вилка 023), 32А, IP44, MAGNUM', 'price': '362.05'}, {'title': 'Разъем 220 ИЭК (Розетка 113), 16А, IP 44', 'price': '330.47'}, {'title': 'Разъем 220 ИЭК (Вилка 023), 32А, IP 44', 'price': '291.02'}, {'title': 'Разъем 220 ИЭК (Розетка 213), 16А, IP44, MAGNUM', 'price': '341.92'}, {'title': 'Разъем 220 ИЭК (Вилка 013), 16А, IP 44', 'price': '200.54'}, {'title': 'Разъем 220 ИЭК (Вилка 033), 63А, IP54', 'price': '1 460.27'}, {'title': 'Разъем 220 ИЭК (Розетка скрытая 413), 16А, IP44', 'price': '350.79'}, {'title': 'Разъем 220 ИЭК (Розетка 133), 63А, IP54', 'price': '2 069.92'}, {'title': 'Разъем 220 ИЭК (Розетка 423), 32А, IP44, MAGNUM', 'price': '567.41'}]
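This output also shows that larger prices come back with a space as the thousands separator (`'1 727.41'`, `'2 607.28'`), so calling `float(price)` as in the other answer would raise `ValueError` on those. Below is a minimal converter, assuming prices always use `.` as the decimal point and some kind of space between groups of thousands (the output cannot tell whether it is a regular, non-breaking or thin space). `parse_price` is a hypothetical helper; `Decimal` is used to keep currency values exact, though `float()` on the cleaned string would work too.

```python
from decimal import Decimal


def parse_price(text):
    """Convert a price string such as '1 727.41' to a Decimal.

    Assumes '.' is the decimal point and that any whitespace only
    separates groups of thousands.
    """
    # str.split() with no argument splits on any Unicode whitespace,
    # including '\xa0' (no-break space) and '\u2009' (thin space)
    return Decimal(''.join(text.split()))


print(parse_price('34.60'))        # 34.60
print(parse_price('1 727.41'))     # 1727.41
print(parse_price('2\xa0607.28'))  # 2607.28
```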