I'm trying to retrieve information from a site by web scraping. The information I need is inside nested tags, but I'm not able to extract it:
<div >
<div ><span>
House
3
pièces,
74 m²
</span>
<cite>
New York (11111)
</cite>
</div>
</div>,
<div >
<div ><span>
Appartement
3
pièces,
64 m²
</span>
<cite>
Los Angeles (22222)
</cite>
</div>
<div >
<div ><span>
House
4
pièces,
81 m²
</span>
<cite>
Chicago (33333)
</cite>
</div>
I'm trying to get the ad and the city. I tried:
#BeautifulSoup
from bs4 import BeautifulSoup
import requests
#to get: House 3 pièces, 74 m²
ad = [ad.get_text() for ad in soup.find_all("span", class_='ergov3-txtannonce')]
#to get cities
cities = [city.get_text() for city in soup.find_all("cite", class_='ergov3-txtannonce')]
My output:
[]
[]
Expected output:
["House 3 pièces, 74 m²", "Appartement 3 pièces, 64 m²", "House 4 pièces, 81 m²"]
["New York (11111)", "Los Angeles (22222)", "Chicago (33333)"]
Answer:
Assuming your soup contains the provided HTML, select the elements that hold your information and iterate over the ResultSet to scrape it. Avoid multiple lists: try to scrape all the information in one go and save it in a more structured way:
...
data = []
for e in soup.select('.ergov3-txtannonce'):
    data.append({
        'title': e.span.get_text(strip=True),
        'city': e.cite.get_text(strip=True)
    })
...
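One caveat if the real page looks like the HTML in the question, where the span text is spread over several lines: get_text(strip=True) only trims the ends of each text node, so the inner newlines would survive. Below is a minimal sketch of one way to collapse them. The helper normalize_whitespace is hypothetical (it is not part of BeautifulSoup), and the hard-coded string stands in for what e.span.get_text() would return:

```python
def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace (spaces, newlines, tabs) into one space."""
    # str.split() with no arguments splits on any whitespace and drops empty pieces
    return " ".join(text.split())


# Stand-in for e.span.get_text() on the first <span> of the question's HTML
raw_title = "\nHouse\n3\npièces,\n74 m²\n"

title = normalize_whitespace(raw_title)
print(title)  # House 3 pièces, 74 m²
```

Inside the loop this would be used as normalize_whitespace(e.span.get_text()) in place of e.span.get_text(strip=True).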
Note: If the elements are not present in your soup, the website's content may be provided dynamically by JavaScript. That would be worth asking as a new question with exactly this focus.
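A quick, rough way to check for that case is to look for the class name in the raw response text before parsing it. The sketch below is only an illustration: class_in_raw_html is a hypothetical helper, and the two strings are made-up stand-ins for what response.text might contain for a static page and for a JavaScript-rendered one:

```python
def class_in_raw_html(raw_html: str, class_name: str) -> bool:
    """Return True if the class name occurs anywhere in the raw HTML text."""
    return class_name in raw_html


# Made-up stand-ins for response.text (assumptions, not real site output)
static_page = '<div class="ergov3-txtannonce"><span>House</span></div>'
js_page = '<div id="app"></div><script src="bundle.js"></script>'

print(class_in_raw_html(static_page, "ergov3-txtannonce"))  # True
print(class_in_raw_html(js_page, "ergov3-txtannonce"))      # False
```

If the check fails on the real response, the markup is probably injected after page load (or the request was answered with a different page), and BeautifulSoup alone will not see it.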
Example
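from bs4 import BeautifulSoup

# The class attribute is assumed to sit on the inner <div>, as the selector below implies
html = '''
<div>
<div class="ergov3-txtannonce"><span>
House 3 pièces, 74 m²
</span>
<cite>
New York (11111)
</cite>
</div>
</div>,
<div>
<div class="ergov3-txtannonce"><span>
Appartement 3 pièces, 64 m²
</span>
<cite>
Los Angeles (22222)
</cite>
</div>
<div>
<div class="ergov3-txtannonce"><span>
House 4 pièces, 81 m²
</span>
<cite>
Chicago (33333)
</cite>
</div>
'''
# Name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = BeautifulSoup(html, 'html.parser')
data = []
# Each matched element holds one ad: the <span> is the title, the <cite> is the city
for e in soup.select('.ergov3-txtannonce'):
    data.append({
        'title': e.span.get_text(strip=True),
        'city': e.cite.get_text(strip=True)
    })
print(data)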
Output
[{'title': 'House 3 pièces, 74 m²', 'city': 'New York (11111)'},
{'title': 'Appartement 3 pièces, 64 m²', 'city': 'Los Angeles (22222)'},
{'title': 'House 4 pièces, 81 m²', 'city': 'Chicago (33333)'}]
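If you still need the two separate lists shown in the question, they can be derived from the structured result afterwards. In this small sketch, data is hard-coded to the output above so the block runs on its own:

```python
# Hard-coded copy of the scraped result, so this block is self-contained
data = [
    {'title': 'House 3 pièces, 74 m²', 'city': 'New York (11111)'},
    {'title': 'Appartement 3 pièces, 64 m²', 'city': 'Los Angeles (22222)'},
    {'title': 'House 4 pièces, 81 m²', 'city': 'Chicago (33333)'},
]

# Pull one field out of every record; list order follows the order of the ads
ads = [item['title'] for item in data]
cities = [item['city'] for item in data]

print(ads)
print(cities)
```

Keeping the dictionaries as the primary result means each title stays paired with its city, and the flat lists are produced only where they are needed.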