How to find specific HTML element after CSS selector with BeautifulSoup?

Time:01-05

I am trying to retrieve the last img src of a web page by web scraping with BeautifulSoup. So far I have tried using a selector, but it is impossible for me to find anything after the ::before selector.

The basic code is:

import requests
from bs4 import BeautifulSoup

s = requests.session()
r = s.get("https://www.immobiliare.it/vendita-case/milano/forlanini/?criterio=dataModifica&ordine=desc")

soup = BeautifulSoup(r.content, "lxml")

for property in soup.find_all("li", {"class": "nd-list__item in-realEstateResults__item"}):
    ...  # extract the img src of each property here

The html code of the page has the following structure:

Each li class="nd-list__item in-realEstateResults__item" is a property I want to extract the img src from

Bear in mind that the first image has simpler HTML; I cannot get the src of the remaining images.

CodePudding user response:

from bs4 import BeautifulSoup

html = """
<div >
   <div >
       <div >
           ::before
          <div >
              <div nd-slideshow__item
                  </div>
                  <div nd-slideshow__desired_item
                      <img src =”desired link”>
                 </div>
               </div>
            </div>
       </div>
   </div>"""

soup = BeautifulSoup(html, 'html.parser')

r = soup.select('div[class*="nd-slideshow"]')
print(r)

The result is the HTML that follows ::before:

[<div >
<div <=""  div="" nd-slideshow__item="">
<div <img=""  link”="" nd-slideshow__desired_item="" src="”desired">
</div>
</div>
</div>, <div <=""  div="" nd-slideshow__item="">
<div <img=""  link”="" nd-slideshow__desired_item="" src="”desired">
</div>
</div>, <div <img=""  link”="" nd-slideshow__desired_item="" src="”desired">
</div>]
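
For reference, `::before` is a CSS pseudo-element that browser developer tools display; it is not a node in the HTML source, so there is nothing to select "after" it. Below is a minimal sketch of the same attribute-selector idea on well-formed markup. The class names `nd-slideshow`, `nd-slideshow__item` and `nd-slideshow__desired_item` and the `src` value are placeholders assumed for illustration, not the site's confirmed markup.

```python
from bs4 import BeautifulSoup

# Hypothetical, well-formed stand-in for the slideshow markup.
html = """
<div class="nd-slideshow">
  <div class="nd-slideshow__item"></div>
  <div class="nd-slideshow__desired_item">
    <img src="desired link">
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect the src of every img nested inside a div whose class contains "nd-slideshow".
srcs = [img["src"] for img in soup.select('div[class*="nd-slideshow"] img[src]')]

# The last entry is the last image of the slideshow, if any image was found.
last_src = srcs[-1] if srcs else None
print(last_src)
```

Selecting the `img` elements directly, rather than the wrapping `div` elements, avoids getting the same nested markup back several times.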

CodePudding user response:

As mentioned in the comments, the content is rendered dynamically, so the combination of requests and BeautifulSoup will not give you the expected result: requests does not execute JavaScript the way a browser does, and BeautifulSoup cannot find the expected elements because they are not in the returned HTML.
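
One way to confirm this is to count selector matches in the HTML you actually received. The sketch below does that on two hypothetical strings standing in for the server response and the browser-rendered page; the markup, the `count_matches` helper and the file names are assumptions for illustration only.

```python
from bs4 import BeautifulSoup


def count_matches(html: str, selector: str) -> int:
    """Return how many elements in the given HTML match the CSS selector."""
    return len(BeautifulSoup(html, "html.parser").select(selector))


# Hypothetical stand-ins: what a server might send vs. what a browser renders.
raw_html = '<ul><li class="in-realEstateResults__item"><div id="slideshow"></div></li></ul>'
rendered_html = (
    '<ul><li class="in-realEstateResults__item">'
    '<div id="slideshow"><img src="a.jpg"><img src="b.jpg"></div>'
    "</li></ul>"
)

selector = "li.in-realEstateResults__item img"
print(count_matches(raw_html, selector))       # images present before JavaScript runs
print(count_matches(rendered_html, selector))  # images present after rendering
```

If the count for `r.content` is zero while the browser shows the images, the elements are being added by JavaScript.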

You could stay with requests if you use an API; some of the information comes from:

s.get('https://www.immobiliare.it/api-next/agencies/local-expert/?city-id=8042&province-id=MI&macrozone-id[0]=10294&limit=25&output=json').json()

->  {'agencies': [{'id': 9681, 'displayName': 'Fonte Immobiliare Città Studi 2', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/934856533.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/934856531.jpg'}, 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/9681/fonte-citta-studi--milano/', 'address': 'Via Giovanni Briosi 10 20133 - Milano', 'bannerImage': 'https://pic.im-cdn.it/image/934857363/xs-c.jpg', 'externalId': None, 'timeContract': 11, 'paid': True}, {'id': 83565, 'displayName': 'Cfc Immobiliare ', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/244109821.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/244109817.jpg'}, 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/83565/cfc-milano/', 'address': 'Via Carnia 7 20132 - Milano', 'bannerImage': 'https://maps.im-cdn.it/static?center=45.492900,9.236340&zoom=15&size=400x230&markers=45.492900,9.236340', 'externalId': None, 'timeContract': 10, 'paid': True}, {'id': 208668, 'displayName': 'YOUR HOME - Real Estate', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/1127494478.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/1127494476.jpg'}, 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/208668/your-home-milano/', 'address': 'Bastioni Porta Nuova 21 20121 - Milano', 'bannerImage': 'https://maps.im-cdn.it/static?center=45.480100,9.188150&zoom=15&size=400x230&markers=45.480100,9.188150', 'externalId': None, 'timeContract': 7, 'paid': True}, {'id': 231505, 'displayName': 'Homepanda', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/693409659.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/693409657.jpg'}, 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/231505/homepanda/', 'address': 'Via Gian Giacomo Mora 20 20123 - Milano', 'bannerImage': 'https://maps.im-cdn.it/static?center=45.458900,9.179330&zoom=15&size=400x230&markers=45.458900,9.179330', 'externalId': None, 'timeContract': 4, 'paid': True}, {'id': 118081, 
'displayName': 'CONSULOVEST  CORBETTA Via Meroni 2 - MILANO V.le San Gimignano 8', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/1162882814.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/1162882812.jpg'}, 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/118081/consulovest-corbetta/', 'address': 'Via Meroni 2 20011 - Corbetta', 'bannerImage': 'https://pic.im-cdn.it/image/1162882818/xs-c.jpg', 'externalId': None, 'timeContract': None, 'paid': False}, {'id': 5272, 'displayName': 'Arena Immobiliare S.R.L.', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/936162495.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/936162493.jpg'}, 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/5272/arena-milano/', 'address': 'Via Marco Bruto 9 20138 - Milano', 'bannerImage': 'https://maps.im-cdn.it/static?center=45.459800,9.238870&zoom=15&size=400x230&markers=45.459800,9.238870', 'externalId': None, 'timeContract': 21, 'paid': False}, {'id': 32741, 'displayName': 'Studio emme3 ', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/196647202.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/196647201.jpg'}, 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/32741/studio-emme-milano/', 'address': 'Via Pompeo Neri 2 20146 - Milano', 'bannerImage': 'https://maps.im-cdn.it/static?center=45.456800,9.143770&zoom=15&size=400x230&markers=45.456800,9.143770', 'externalId': None, 'timeContract': 4, 'paid': False}, {'id': 242120, 'displayName': 'Levia SRL', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/843934046.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/843934044.jpg'}, 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/242120/levia-milano/', 'address': 'Viale Ungheria 20 20138 - Milano', 'bannerImage': 'https://maps.im-cdn.it/static?center=45.445700,9.246040&zoom=15&size=400x230&markers=45.445700,9.246040', 'externalId': None, 'timeContract': 3, 'paid': False}, {'id': 396994, 
'displayName': 'Affiliato Tecnorete: STUDIO IMMOBILIARE CORSICA SRL', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/1247888668.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/1247888664.jpg'}, 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/396994/tecnorete-milano-viale-ungheria/', 'address': 'Viale Ungheria 24 20135 - Milano', 'bannerImage': 'https://maps.im-cdn.it/static?center=45.445500,9.246760&zoom=15&size=400x230&markers=45.445500,9.246760', 'externalId': None, 'timeContract': 0, 'paid': False}, {'id': 140950, 'displayName': 'Abitare Agency Srl', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/1135165888.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/1135165886.jpg'}, 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/140950/abitare-agency/', 'address': 'Via Voghera 7 20144 - Milano', 'bannerImage': 'https://pic.im-cdn.it/image/1135165932/xs-c.jpg', 'externalId': None, 'timeContract': 10, 'paid': False}, {'id': 94305, 'displayName': 'Affiliato Tecnocasa: IMMOBILIARE MARGOT SRLU', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/1135591154.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/1135591152.jpg'}, 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/94305/tecnocasa-milano-via-mecenate/', 'address': 'Via Mecenate 4 20138 - Milano', 'bannerImage': 'https://maps.im-cdn.it/static?center=45.457400,9.242440&zoom=15&size=400x230&markers=45.457400,9.242440', 'externalId': None, 'timeContract': 8, 'paid': False}, {'id': 241224, 'displayName': 'INVIMIT SGR SpA', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/829818360.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/829818358.jpg'}, 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/241224/invimit-roma/', 'address': 'Via di Santa Maria in Via 12 00187 - Roma', 'bannerImage': 'https://pic.im-cdn.it/image/825106468/xs-c.jpg', 'externalId': None, 'timeContract': 3, 'paid': False}, {'id': 209778, 'displayName': 
'STUDIO6ERRE - Sede Milano', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/937464013.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/937464011.jpg'}, 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/209778/studioerre-milano/', 'address': 'Viale Abruzzi 80 20131 - Milano', 'bannerImage': 'https://maps.im-cdn.it/static?center=45.483700,9.217150&zoom=15&size=400x230&markers=45.483700,9.217150', 'externalId': None, 'timeContract': 7, 'paid': False}, {'id': 166328, 'displayName': 'HB ADVISORY', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/311765482.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/311765478.jpg'}, 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/166328/hb-advisory/', 'address': 'Corso Buenos Aires 60 20124 - Milano', 'bannerImage': 'https://maps.im-cdn.it/static?center=45.482200,9.212750&zoom=15&size=400x230&markers=45.482200,9.212750', 'externalId': None, 'timeContract': 9, 'paid': False}, {'id': 41477, 'displayName': 'StudioZimer', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/1143272706.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/1143272704.jpg'}, 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/41477/studiozimer/', 'address': 'CORSO LODI 111 20135 - Milano', 'bannerImage': 'https://maps.im-cdn.it/static?center=45.441500,9.221210&zoom=15&size=400x230&markers=45.441500,9.221210', 'externalId': None, 'timeContract': 4, 'paid': False}, {'id': 386016, 'displayName': 'STUDIO ASTE MC', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/1250133574.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/1250133572.jpg'}, 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/386016/studio-aste-mc-sesto-san-giovanni/', 'address': 'Via Carlo Cattaneo 49 20099 - Sesto San Giovanni', 'bannerImage': 'https://pic.im-cdn.it/image/1147591370/xs-c.jpg', 'externalId': None, 'timeContract': None, 'paid': False}, {'id': 42941, 'displayName': 'OBIETTIVOCASA', 
'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/155958486.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/155958482.jpg'}, 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/42941/obiettivocasa-milano-via-pordenone/', 'address': 'via pordenone 13 20132 - Milano', 'bannerImage': 'https://maps.im-cdn.it/static?center=45.490300,9.234840&zoom=15&size=400x230&markers=45.490300,9.234840', 'externalId': None, 'timeContract': 10, 'paid': False}, {'id': 392582, 'displayName': 'AsteGlobal', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/1227120896.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/1227120894.jpg'}, 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/392582/asteblobal/', 'address': 'via Reali 13 20037 - Paderno Dugnano', 'bannerImage': 'https://pic.im-cdn.it/image/1227121058/xs-c.jpg', 'externalId': None, 'timeContract': None, 'paid': False}, {'id': 203747, 'displayName': 'Le case di Patty', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/811914140.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/811914138.jpg'}, 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/203747/le-case-di-patty-milano/', 'address': 'Via Montebello 14 20121 - Milano', 'bannerImage': 'https://maps.im-cdn.it/static?center=45.475200,9.189070&zoom=15&size=400x230&markers=45.475200,9.189070', 'externalId': None, 'timeContract': 7, 'paid': False}, {'id': 35498, 'displayName': "Expo'  Servizi  Immobiliari", 'imageUrls': [], 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/35498/expo/', 'address': 'Viale Premuda 21 20129 - Milano', 'bannerImage': 'https://maps.im-cdn.it/static?center=45.466000,9.207020&zoom=15&size=400x230&markers=45.466000,9.207020', 'externalId': None, 'timeContract': 5, 'paid': False}, {'id': 228450, 'displayName': 'Aste Milano Immobiliare', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/1061930169.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/1061930167.jpg'}, 
'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/228450/aste-rozzano/', 'address': 'Via Innocenzo Isimbardi 29 20141 - Milano', 'bannerImage': 'https://maps.im-cdn.it/static?center=45.435200,9.180890&zoom=15&size=400x230&markers=45.435200,9.180890', 'externalId': None, 'timeContract': 5, 'paid': False}, {'id': 5350, 'displayName': 'IMI immobiliare Milano - Partner Navigli', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/424092179.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/424092177.jpg'}, 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/5350/imi-milano-navigli/', 'address': 'Via Conchetta 2 20136 - Milano', 'bannerImage': 'https://maps.im-cdn.it/static?center=45.446800,9.179270&zoom=15&size=400x230&markers=45.446800,9.179270', 'externalId': None, 'timeContract': 12, 'paid': False}, {'id': 237934, 'displayName': 'ASTA4YOU', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/1112140922.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/1112140920.jpg'}, 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/237934/astayou/', 'address': 'Via Domenico Cimarosa 26 20144 - Milano', 'bannerImage': 'https://maps.im-cdn.it/static?center=45.464200,9.157880&zoom=15&size=400x230&markers=45.464200,9.157880', 'externalId': None, 'timeContract': 3, 'paid': False}, {'id': 28201, 'displayName': 'Meta Immobiliare - Massimo Valore Certificato', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/962490064.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/962490062.jpg'}, 'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/28201/meta-san-donato/', 'address': 'Via Alfonsine 34 20097 - San Donato Milanese', 'bannerImage': 'https://pic.im-cdn.it/image/854995656/xs-c.jpg', 'externalId': None, 'timeContract': 9, 'paid': False}, {'id': 3986, 'displayName': 'TREC s.a.s', 'imageUrls': {'large': 'https://pic.im-cdn.it/imagenoresize/1040856624.jpg', 'small': 'https://pic.im-cdn.it/imagenoresize/1040856622.jpg'}, 
'agencyUrl': 'https://www.immobiliare.it/agenzie-immobiliari/3986/tre-c/', 'address': 'Via Negroli 49 20133 - Milano', 'bannerImage': 'https://maps.im-cdn.it/static?center=45.467200,9.232320&zoom=15&size=400x230&markers=45.467200,9.232320', 'externalId': None, 'timeContract': 1, 'paid': False}], 'searchAgencyUrl': 'http://www.immobiliare.it/agenzie-immobiliari/milano/?idMZona[]=10294'}
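
Note that this endpoint returns agencies rather than property listings, and that `imageUrls` is a dict for most entries but an empty list for one of them. A minimal sketch for reading the image URLs while handling both shapes follows; the `agency_images` helper is a hypothetical name, and the sample holds two entries copied from the response above with the other fields omitted.

```python
def agency_images(payload: dict, size: str = "large") -> dict:
    """Map each agency's displayName to its image URL of the given size, or None."""
    result = {}
    for agency in payload.get("agencies", []):
        image_urls = agency.get("imageUrls")
        # imageUrls is a dict, or an empty list when the agency has no image.
        url = image_urls.get(size) if isinstance(image_urls, dict) else None
        result[agency.get("displayName")] = url
    return result


# Two entries copied from the response shown above (other fields omitted).
sample = {
    "agencies": [
        {
            "id": 231505,
            "displayName": "Homepanda",
            "imageUrls": {
                "large": "https://pic.im-cdn.it/imagenoresize/693409659.jpg",
                "small": "https://pic.im-cdn.it/imagenoresize/693409657.jpg",
            },
        },
        {"id": 35498, "displayName": "Expo'  Servizi  Immobiliari", "imageUrls": []},
    ]
}

print(agency_images(sample))
```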

Alternatively, use selenium to mimic a browser and work on the rendered driver.page_source.

Example

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

url = 'https://www.immobiliare.it/vendita-case/milano/forlanini/?criterio=dataModifica&ordine=desc'
driver.get(url)

soup = BeautifulSoup(driver.page_source, "lxml")

data = []
for e in soup.select('li.in-realEstateResults__item'):
    data.append({
        'title':e.a.get('title'),
        'imgUrls':[i.get('src') for i in e.select('.nd-list__item img')]
    })
data

Output

[{'title': 'Bilocale buono stato, primo piano, Viale Ungheria - Mecenate, Milano', 'imgUrls': ['https://pwm.im-cdn.it/image/1261450576/xxs-c.jpg', 'https://pwm.im-cdn.it/image/1261450580/xxs-c.jpg', 'https://pwm.im-cdn.it/image/1261702222/xxs-c.jpg', 'https://maps.im-cdn.it/static?zoom=15&size=360x270&language=it&style=feature:road|element:labels|visibility:off&sensor=false&markers=icon:https://s1.immobiliare.it/_next/static/media/map-marker.27fc2b6f.png|45.4565,9.2427&center=45.4565,9.2427', 'https://pic.im-cdn.it/imagenoresize/875151762.jpg']}, {'title': 'Bilocale via Romualdo Bonfadini 82, Viale Ungheria - Mecenate, Milano', 'imgUrls': ['https://pwm.im-cdn.it/image/1261689706/xxs-c.jpg', 'https://pwm.im-cdn.it/image/1261689762/xxs-c.jpg', 'https://pwm.im-cdn.it/image/1261689770/xxs-c.jpg', 'https://pwm.im-cdn.it/image/1261689736/xxs-c.jpg', 'https://pwm.im-cdn.it/image/1261689806/xxs-c.jpg', 'https://pwm.im-cdn.it/image/1261689780/xxs-c.jpg', 'https://pwm.im-cdn.it/image/1261689794/xxs-c.jpg', 'https://pwm.im-cdn.it/image/1261689744/xxs-c.jpg', 'https://pwm.im-cdn.it/image/1261689718/xxs-c.jpg', 'https://pwm.im-cdn.it/image/1261689728/xxs-c.jpg', 'https://pwm.im-cdn.it/image/1261689628/xxs-c.jpg', 'https://pwm.im-cdn.it/image/1261689636/xxs-c.jpg', 'https://pwm.im-cdn.it/image/1261689752/xxs-c.jpg', 'https://pwm.im-cdn.it/image/1261689674/xxs-c.jpg', 'https://pwm.im-cdn.it/image/1261689694/xxs-c.jpg', 'https://pwm.im-cdn.it/image/1261689680/xxs-c.jpg', 'https://pwm.im-cdn.it/image/1261689690/xxs-c.jpg', 'https://pwm.im-cdn.it/image/1261689670/xxs-c.jpg', 'https://pwm.im-cdn.it/image/1261689652/xxs-c.jpg', 'https://pwm.im-cdn.it/image/1261689816/xxs-c.jpg', 'https://maps.im-cdn.it/static?zoom=15&size=360x270&language=it&style=feature:road|element:labels|visibility:off&sensor=false&markers=icon:https://s1.immobiliare.it/_next/static/media/map-marker.27fc2b6f.png|45.4442,9.2417&center=45.4442,9.2417', 'https://pic.im-cdn.it/imagenoresize/949757836.jpg']}, {'title': 
'Appartamento via Oreste Salomone, Viale Ungheria - Mecenate, Milano', 'imgUrls': ['https://pwm.im-cdn.it/image/1256189648/xxs-c.jpg', 'https://pic.im-cdn.it/imagenoresize/994952108.jpg']},...]
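
Since the question asks for the last image of each property, here is a rough sketch of picking it out of each `imgUrls` list. In the output above the lists also end with a static map and what looks like an agency logo, so the sketch keeps only URLs starting with `https://pwm.im-cdn.it/image/`. That prefix is an assumption inferred from this sample output, and `last_photo` is a hypothetical helper name.

```python
PHOTO_PREFIX = "https://pwm.im-cdn.it/image/"  # assumption inferred from the sample output


def last_photo(entry: dict, prefix: str = PHOTO_PREFIX):
    """Return the last URL in entry['imgUrls'] that starts with prefix, or None."""
    # i.get('src') can yield None for an img without src, so skip empty values.
    photos = [u for u in entry.get("imgUrls", []) if u and u.startswith(prefix)]
    return photos[-1] if photos else None


# Entry copied from the output above.
sample = {
    "title": "Appartamento via Oreste Salomone, Viale Ungheria - Mecenate, Milano",
    "imgUrls": [
        "https://pwm.im-cdn.it/image/1256189648/xxs-c.jpg",
        "https://pic.im-cdn.it/imagenoresize/994952108.jpg",
    ],
}

print(last_photo(sample))
```

Applied to the scraped list, this would be `[last_photo(d) for d in data]`.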

CodePudding user response:

Based on your screenshot, I searched for the div element that has the "nd-slideshow__item in-realEstateListCard__mapInfo" class, and then I could get the image inside that "div" element.

With this idea, I've modified your code as follows:

import requests
from bs4 import BeautifulSoup
  
url = "https://www.immobiliare.it/vendita-case/milano/forlanini/?criterio=dataModifica&ordine=desc"
page = requests.get(url)
soup = BeautifulSoup(page.content, "lxml")

# The image you want is inside an img HTML element, which is contained in a "div" element:
div_element = soup.find_all("div", class_="nd-slideshow__item in-realEstateListCard__mapInfo")

# Print the "src" value of the img HTML element found on the div
print(div_element[0].find("img")["src"])

And this is the result I got:

https://maps.im-cdn.it/static?zoom=15&size=360x270&language=it&style=feature:road|element:labels|visibility:off&sensor=false&markers=icon:https://s1.immobiliare.it/_next/static/media/map-marker.27fc2b6f.png|45.4565,9.2427&center=45.4565,9.2427
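
Note that this URL points to a static map rather than a property photo, and that `div_element[0]` raises an IndexError when no matching div is found. A slightly more defensive sketch of the same lookup is shown below. The `first_img_src` helper and the sample markup are hypothetical, and the CSS selector matches both classes regardless of their order in the attribute.

```python
from bs4 import BeautifulSoup


def first_img_src(html: str, css: str):
    """Return the src of the first img under the first element matching css, or None."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.select_one(css)
    if container is None:
        return None  # no matching element in this HTML
    img = container.find("img")
    return img.get("src") if img is not None else None


# Hypothetical markup standing in for the rendered page.
sample = (
    '<div class="nd-slideshow__item in-realEstateListCard__mapInfo">'
    '<img src="https://example.com/map.png"></div>'
)
css = "div.nd-slideshow__item.in-realEstateListCard__mapInfo"

print(first_img_src(sample, css))
print(first_img_src("<div></div>", css))
```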