Extract a span value stored deep inside a class using BeautifulSoup

Hello everybody,

I am trying to extract a value stored in a "span" that has no class of its own. In the HTML below, there are two classes of interest: "bill_of_sale" and "mortgage". Their spans contain "Kupça var" and "İpoteka var" respectively. I need to extract these values for each item. I have already done the per-item part; I just need to extract the values nested inside these classes.


         <div >
                <div  data-swiper-wrap="" style="touch-action: pan-y; user-select: none; -webkit-user-drag: none; -webkit-tap-highlight-color: rgba(0, 0, 0, 0);">
                   
                <div class="bill_of_sale"><a target="_blank" href="/items/2810476"></a><span>Kupça var</span></div>
                <div class="mortgage"><a target="_blank" href="/items/2810476"></a><span>İpoteka var</span></div>
                <div ><span ></span><span ></span></div>
                <div >Agentlik</div>

The code below allows me to extract values if it comes from a class with the following structure:

<div class="location">Həzi Aslanov m.</div>


import requests
from bs4 import BeautifulSoup

page = 1

locations = []  # List to store the location of each listing

while page != 1200:
    
    url = f"https://bina.az/baki/alqi-satqi/menziller?page={page}"
    page_main = requests.get(url)
    soup = BeautifulSoup(page_main.content, "html.parser")

    results = soup.find(id="js-items-search")
    job_elements = results.find_all("div", class_="card_params")

    for job_element in job_elements:
        
        location = job_element.find(class_="location")
        locations.append(location.text)
    
    page = page + 1

However, the code above does not work if I want to extract a span value which is deep inside a class (the problem I described in the beginning).

Thank you in advance

CodePudding user response：

You can reach an element nested inside a class by chaining find() calls:

bill = item.find('div',class_='bill_of_sale').find('span').text.strip()
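Chained find() calls raise AttributeError when the outer div is missing, so it can help to wrap the lookup in a small helper. The following is only a sketch: span_text is a hypothetical helper name, the "card" class and the second listing are invented for illustration, and the CSS selector is an alternative to the chained call above rather than a replacement for it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: "bill_of_sale", "mortgage" and "location" come from the
# question; the "card" wrapper and the second listing are assumptions.
html = """
<div class="card">
  <div class="bill_of_sale"><a href="/items/1"></a><span>Kupça var</span></div>
  <div class="mortgage"><a href="/items/1"></a><span>İpoteka var</span></div>
</div>
<div class="card">
  <div class="location">Həzi Aslanov m.</div>
</div>
"""


def span_text(item, class_name):
    """Return the text of the first <span> inside div.<class_name>, or '' if absent."""
    span = item.select_one(f"div.{class_name} span")
    return span.get_text(strip=True) if span else ""


soup = BeautifulSoup(html, "html.parser")

# One dict per listing; listings without a badge get an empty string.
rows = [
    {"bill": span_text(card, "bill_of_sale"), "mortgage": span_text(card, "mortgage")}
    for card in soup.select("div.card")
]
print(rows)
```

The second listing has neither badge, so both of its fields come back as empty strings instead of raising an exception.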

Here is a working example that collects the details of the listings on pages 1 to 19 and writes the results to a CSV file:

import requests
from bs4 import BeautifulSoup
import pandas as pd

page = 1

locations = []  # List to store one dict per listing

while page != 20:
    print(f'Scraping page {page}')
    
    url = f"https://bina.az/baki/alqi-satqi/menziller?page={page}"
    page_main = requests.get(url)
    soup = BeautifulSoup(page_main.content, "html.parser")

    for results in soup.find_all('div',class_='items_list'):        #there are multiple listing containers
        for item in results.find_all('div',class_='vipped'):

            location = item.find(class_="location").text.strip()
            
            try:
                bill = item.find('div',class_='bill_of_sale').find('span').text.strip()
            except AttributeError:
                bill = ''

            try:
                mort = item.find('div',class_='mortgage').find('span').text.strip()
            except AttributeError:
                mort = ''

            price = item.find('div',class_='price').text.strip()

            rooms,size,floor = ('','','')
            for detail in item.find('ul',class_='name').find_all('li'):
                if 'otaqlı' in detail.text:
                    rooms = detail.text.strip()
                elif 'm²' in detail.text:
                    size = detail.text.strip()
                elif 'mərtəbə' in detail.text:
                    floor = detail.text.strip()
                
    
            item = {
                'location':location,
                'bill':bill,
                'mortgage':mort,
                'price': price,
                'rooms':rooms,
                'size':size,
                'floor':floor
                }

            locations.append(item)

    page += 1
    
df = pd.DataFrame(locations)
df.to_csv('locations.csv',index=False)

CodePudding user response：

Once you get that node by the specified <div> and class, you can use .find_next() to get that <span>:

from bs4 import BeautifulSoup

html = '''<div >
                <div  data-swiper-wrap="" style="touch-action: pan-y; user-select: none; -webkit-user-drag: none; -webkit-tap-highlight-color: rgba(0, 0, 0, 0);">
                   
                <div class="bill_of_sale"><a target="_blank" href="/items/2810476"></a><span>Kupça var</span></div>
                <div class="mortgage"><a target="_blank" href="/items/2810476"></a><span>İpoteka var</span></div>
                <div ><span ></span><span ></span></div>
                <div >Agentlik</div>'''

soup = BeautifulSoup(html, 'html.parser')

div_bos = soup.find('div', {'class':'bill_of_sale'}).find_next('span').text
div_mortgage = soup.find('div', {'class':'mortgage'}).find_next('span').text



print(div_bos)
print(div_mortgage)

Output:

Kupça var
İpoteka var
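One caveat with .find_next(): it searches everything that follows the tag in the document, not only the tag's own children. If a listing has no span inside its bill_of_sale div, .find_next('span') can return a span that belongs to a later element. The markup below is a made-up case to illustrate this; .find('span') is limited to descendants and returns None instead.

```python
from bs4 import BeautifulSoup

# Hypothetical listing in which the bill_of_sale div contains no <span>.
html = """
<div class="bill_of_sale"><a href="/items/1"></a></div>
<div class="mortgage"><a href="/items/1"></a><span>İpoteka var</span></div>
"""

soup = BeautifulSoup(html, "html.parser")
bill_div = soup.find("div", class_="bill_of_sale")

inside = bill_div.find("span")      # descendants of bill_div only
after = bill_div.find_next("span")  # anything later in the document

print(inside)      # no span inside the div
print(after.text)  # text of the span that belongs to the mortgage div
```

If the value must come from inside the matched div, .find() is probably the safer choice.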