Can't scrape all elements with beautifulsoup

Time:04-11

I'm trying to scrape all the articles on this web page, but I only managed to scrape the first one. Can anyone tell me how to solve this? My code is below:

from bs4 import BeautifulSoup
import requests


sauce = requests.get('https://www.automobile.tn/fr/neuf/alfa-romeo').text
soup = BeautifulSoup(sauce, 'lxml')

def find_prices(item):
    price = item.find('div', class_='price').span.text
    return price
    
def find_names(item):
      name = item.find('div', class_='versions-item').h2.text
      return name

articles = soup.findAll('div', class_='articles')
Articlelist= list()
for article in articles:
  
    Articledict= dict()
    Articledict['name'] = find_names(article)
    Articledict['price'] = find_prices(article)
    
    Articlelist.append(Articledict)

  
print(Articlelist)

This is the output of my code:

[{'name': 'Alfa Romeo Giulia', 'price': '198 000 DT'}]

Answer:

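The main issue is that soup.findAll('div', class_='articles') matches only one element, so your loop runs only once. Inside that single element, find() returns just the first 'versions-item' it encounters, which is why you only get the first car.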
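To make the cause visible without hitting the live site, here is a small sketch on simplified markup. The HTML is an assumption modeled on the class names used in this thread, not the real page; it only shows how find() and find_all() behave on a single wrapper.

```python
from bs4 import BeautifulSoup

# Assumed, simplified markup: one 'articles' wrapper around several 'versions-item' cards
html = """
<div class="articles">
  <div class="versions-item"><h2>Alfa Romeo Giulia</h2><div class="price"><span>198 000 DT</span></div></div>
  <div class="versions-item"><h2>Alfa Romeo Stelvio</h2><div class="price"><span>265 000 DT</span></div></div>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

# Only one 'articles' div exists, so a loop over this list runs once
wrappers = soup.find_all('div', class_='articles')
print(len(wrappers))  # 1

# find() stops at the first matching descendant
first = wrappers[0].find('div', class_='versions-item')
print(first.h2.text)  # Alfa Romeo Giulia

# find_all() on the card class returns every card
cards = soup.find_all('div', class_='versions-item')
print([card.h2.text for card in cards])  # ['Alfa Romeo Giulia', 'Alfa Romeo Stelvio']
```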

Note: in new code, avoid the old findAll() syntax and use find_all() instead. Take a minute to check the docs for more.
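A quick sketch showing that the legacy camelCase name and the snake_case name return the same matches. The tiny HTML string is made up for illustration; the warning filter is there because recent bs4 releases may flag findAll() as deprecated.

```python
import warnings
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="x">a</p><p class="x">b</p>', 'html.parser')

# Preferred snake_case method
new_style = soup.find_all('p', class_='x')

# Legacy camelCase alias; silence a possible DeprecationWarning for this demo only
with warnings.catch_warnings():
    warnings.simplefilter('ignore', DeprecationWarning)
    old_style = soup.findAll('p', class_='x')

print([p.text for p in new_style])  # ['a', 'b']
print([p.text for p in old_style])  # ['a', 'b']
```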

To fix this, select something more specific, e.g. the containers with class "versions-item":

soup.find_all('div', class_='versions-item')    
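If you prefer CSS selectors, Beautiful Soup's select() and select_one() can express the same selection. This is an alternative to find_all(), not what the answer above uses, and the markup is again an assumed simplification of the real page.

```python
from bs4 import BeautifulSoup

# Assumed, simplified markup modeled on the class names used in this thread
html = """
<div class="articles">
  <div class="versions-item"><h2>Alfa Romeo Giulia</h2><div class="price"><span>198 000 DT</span></div></div>
  <div class="versions-item"><h2>Alfa Romeo Stelvio</h2><div class="price"><span>265 000 DT</span></div></div>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

# 'div.versions-item' matches every div carrying that class
cards = soup.select('div.versions-item')

# select_one() returns the first match (or None); 'div.price span' is a descendant selector
prices = [card.select_one('div.price span').text for card in cards]
print(len(cards), prices)  # 2 ['198 000 DT', '265 000 DT']
```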
Example
from bs4 import BeautifulSoup
import requests

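res = requests.get('https://www.automobile.tn/fr/neuf/alfa-romeo').text
soup = BeautifulSoup(res, 'lxml')  # name the parser explicitly, as in the question

data = []

# each 'versions-item' div is one car version, so this loop visits all of them
for item in soup.find_all('div', class_='versions-item'):
    data.append({
        'name': item.h2.text,
        'price': item.find('div', class_='price').span.text
    })

print(data)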
Output
[{'name': 'Alfa Romeo Giulia', 'price': '198 000 DT'},
 {'name': 'Alfa Romeo Stelvio', 'price': '265 000 DT'}]
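If some cards lack a title or a price block, item.find(...).span raises AttributeError on None. A more defensive sketch, assuming the same class names, returns None for missing fields and trims whitespace. The function name parse_versions and the sample HTML are hypothetical, chosen for illustration.

```python
from bs4 import BeautifulSoup


def parse_versions(html):
    """Hypothetical helper: one dict per 'versions-item' card, None for missing fields."""
    soup = BeautifulSoup(html, 'html.parser')
    data = []
    for item in soup.find_all('div', class_='versions-item'):
        name_tag = item.h2                            # first <h2> inside the card, or None
        price_tag = item.select_one('div.price span')  # None if there is no price block
        data.append({
            # get_text(strip=True) removes surrounding whitespace and newlines
            'name': name_tag.get_text(strip=True) if name_tag else None,
            'price': price_tag.get_text(strip=True) if price_tag else None,
        })
    return data


# Hypothetical sample: the second card has no price block
sample = """
<div class="versions-item"><h2> Alfa Romeo Giulia </h2><div class="price"><span>198 000 DT</span></div></div>
<div class="versions-item"><h2>Alfa Romeo Stelvio</h2></div>
"""
result = parse_versions(sample)
print(result)
```

To use it on the live page, pass the downloaded HTML, e.g. parse_versions(requests.get(url, timeout=10).text).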