Getting text from HTML using BeautifulSoup and Requests

Time:10-14

I'm trying to build a real estate scraper to search for apartments. The desired output is:

Condomínio R$ 980

and I'm getting this: [<div class="info-right text-xs-right"><p><span class="h-money">Condomínio R$ 980</span></p></div>]

How can I extract this text from the span tag?

The webscraper code is this one:

from bs4 import BeautifulSoup
import requests
import pandas as pd
import mysql.connector
from mysql.connector import MySQLConnection, Error
import itertools
import time

def main():
    
    list_price = []
    list_info_extra = []
    list_descrip = []
    list_url = []

    #Connection and cursor creation
    mydb = mysql.connector.connect(host="localhost", user="guilherme", passwd="fadel_gui", database="dawn18")
    cursor = mydb.cursor()
    if mydb.cursor:
        print("Connected to database")

    headers = ({'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228 Safari/537.36'})

    URL = ["https://www.imobiliariapadreanchieta.com.br/imoveis/a-venda/apartamento/curitiba/bigorrilho", 
    "https://www.imobiliariapadreanchieta.com.br/imoveis/a-venda/apartamento/curitiba/bigorrilho?pagina=2", 
    "https://www.imobiliariapadreanchieta.com.br/imoveis/a-venda/apartamento/curitiba/bigorrilho?pagina=3"]

    for url in range (0, 3):
        
        response = requests.get(URL[url], headers=headers)

        soup = BeautifulSoup(response.text, 'html.parser')

        text = soup.find_all(text = True)

        house_containers = soup.find_all('div', class_= "col-sm-12 col-lg-6 box-align")


        if house_containers != []:

            for container in house_containers:

                #Price information
                price = container.find_all('div', class_="info-left")[0].text


                IPTU = soup.select_one('div.info-right.text-xs-right p span.h-money').text

                info_containers = soup.find_all('div', class_="values")

                for info in info_containers:
                    get_info = info.select_one('span', class_="h-money")

                    if get_info:
                        info_apart = get_info
                    else:
                        info_apart = 'No info'

                if IPTU:
                    info_right = IPTU
                else:
                    info_right = 'No info'
            
                url_imovel = 'www.imobiliariapadreanchieta.com.br' + container.find_all('a')[0].get('href')
                
                print(price)
                print(info_apart)
                print(info_right)
                print(url_imovel)
                print('\n')

if __name__ == "__main__":
    main()

I'm trying to get prices, URLs and information such as rooms, bathrooms, master bedrooms, etc. The URLs I'm scraping can be seen in the "URL" list.

CodePudding user response:

Here is a working example that avoids the NoneType error:

from bs4 import BeautifulSoup
import requests
import pandas as pd

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228 Safari/537.36'}
URL = ["https://www.imobiliariapadreanchieta.com.br/imoveis/a-venda/apartamento/curitiba/bigorrilho?pagina=" + str(x) for x in range(1, 4)]

for url in URL:

    response = requests.get(url, headers=headers)

    soup = BeautifulSoup(response.text, 'html.parser')

    for container in soup.find_all('div', class_="card card-listing"):
        price = container.find('span', class_="h-money location").text
        
        r= container.select_one('div.info-right.text-xs-right p:nth-child(1) span')
        p = r.text if r else None
        q= container.select_one('div.info-right.text-xs-right p:nth-child(2) span')
        t= q.text if q else None
        
            
        print('price:' + str(price) + ',' + 'Condomínio:' + str(p) + ',' + 'IPTU:' + str(t))
    print("-" * 85)  

Output:

price:R$ 135.000,Condomínio:Condomínio R$ 350,IPTU:IPTU R$ 40/mês
price:R$ 160.000,Condomínio:Condomínio R$ 300,IPTU:IPTU R$ 36/mês
price:R$ 165.000,Condomínio:Condomínio R$ 350,IPTU:IPTU R$ 40/mês
price:R$ 180.000,Condomínio:Condomínio R$ 350,IPTU:IPTU R$ 40/mês
price:R$ 180.000,Condomínio:Condomínio R$ 400,IPTU:IPTU R$ 39/mês
price:R$ 265.000,Condomínio:Condomínio R$ 625,IPTU:IPTU R$ 40/mês
price:R$ 290.000,Condomínio:Condomínio R$ 350,IPTU:None
price:R$ 295.000,Condomínio:Condomínio R$ 566,IPTU:IPTU R$ 536/ano
price:R$ 299.000,Condomínio:Condomínio R$ 600,IPTU:IPTU R$ 32/mês
price:R$ 329.000,Condomínio:Condomínio R$ 443,IPTU:None
price:R$ 390.000,Condomínio:Condomínio R$ 565,IPTU:IPTU R$ 101/mês
price:R$ 400.000,Condomínio:Condomínio R$ 450,IPTU:None
-------------------------------------------------------------------------------------
price:R$ 450.000,Condomínio:Condomínio R$ 700,IPTU:IPTU R$ 90/mês
price:R$ 450.000,Condomínio:Condomínio R$ 800,IPTU:IPTU R$ 100/mês  
price:R$ 465.000,Condomínio:Condomínio R$ 480,IPTU:None
price:R$ 480.000,Condomínio:Condomínio R$ 745,IPTU:IPTU R$ 98/mês   
price:R$ 510.000,Condomínio:Condomínio R$ 600,IPTU:None
price:R$ 515.000,Condomínio:Condomínio R$ 700,IPTU:IPTU R$ 100/mês  
price:R$ 575.000,Condomínio:Condomínio R$ 850,IPTU:IPTU R$ 1.000/mês
price:R$ 590.000,Condomínio:Condomínio R$ 600,IPTU:IPTU R$ 140/mês  
price:R$ 590.000,Condomínio:Condomínio R$ 550,IPTU:IPTU R$ 78/mês   
price:R$ 690.000,Condomínio:Condomínio R$ 1.080,IPTU:IPTU R$ 210/mês
price:R$ 720.000,Condomínio:Condomínio R$ 1.000,IPTU:IPTU R$ 152/mês
price:R$ 795.000,Condomínio:Condomínio R$ 950,IPTU:IPTU R$ 143/mês
-------------------------------------------------------------------------------------        
price:R$ 830.000,Condomínio:Condomínio R$ 1.020,IPTU:IPTU R$ 156/mês
price:R$ 850.000,Condomínio:Condomínio R$ 780,IPTU:None
price:R$ 889.000,Condomínio:None,IPTU:None
price:R$ 890.000,Condomínio:Condomínio R$ 905,IPTU:None
price:R$ 923.000,Condomínio:Condomínio R$ 600,IPTU:None
price:R$ 990.000,Condomínio:Condomínio R$ 980,IPTU:None
price:R$ 1.000.000,Condomínio:None,IPTU:None
price:R$ 1.171.460,Condomínio:None,IPTU:None
price:R$ 1.221.000,Condomínio:None,IPTU:None
price:R$ 1.418.000,Condomínio:None,IPTU:None
-------------

CodePudding user response:

What happens?

You select the first match from the whole soup in each iteration, which is why the result is always the same:

IPTU = soup.select_one('div.info-right.text-xs-right p span.h-money').text

How to fix?

Change soup into container:

IPTU = container.select_one('div.info-right.text-xs-right p span.h-money').text
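
To see the difference in isolation, here is a minimal sketch that runs on a small inline HTML string instead of the live site. The markup, the "card" class and the values are made up for illustration; only the selector is taken from the question.

```python
from bs4 import BeautifulSoup

# Made-up HTML that mimics the listing markup; class "card" and the values are illustrative only.
html = """
<div class="card"><div class="info-right text-xs-right"><p><span class="h-money">Condomínio R$ 350</span></p></div></div>
<div class="card"><div class="info-right text-xs-right"><p><span class="h-money">Condomínio R$ 980</span></p></div></div>
"""
soup = BeautifulSoup(html, "html.parser")
selector = "div.info-right.text-xs-right p span.h-money"

from_soup = []
from_container = []
for container in soup.find_all("div", class_="card"):
    # Searching the whole document always returns the first match on the page.
    from_soup.append(soup.select_one(selector).text)
    # Searching inside the current container returns that listing's own value.
    from_container.append(container.select_one(selector).text)

print(from_soup)       # ['Condomínio R$ 350', 'Condomínio R$ 350']
print(from_container)  # ['Condomínio R$ 350', 'Condomínio R$ 980']
```

Using .text on the matched span is also what turns the tag into the plain string asked for in the question.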

EDIT

As Guilherme mentioned, you can also check that the element exists before reading its text, which avoids an AttributeError when it is missing:

IPTU = container.select_one('div.info-right.text-xs-right p span.h-money').text if container.select_one('div.info-right.text-xs-right p span.h-money') else 'No info here'
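
The one-liner above runs the same selector twice. A possible alternative is a small helper that looks the element up once; the name text_or_default and the sample HTML below are hypothetical and not part of the original code.

```python
from bs4 import BeautifulSoup

def text_or_default(parent, selector, default="No info here"):
    """Return the stripped text of the first match under parent, or default if nothing matches."""
    match = parent.select_one(selector)
    return match.get_text(strip=True) if match else default

# Made-up HTML: the second listing has no info-right block at all.
html = """
<div class="card"><div class="info-right text-xs-right"><p><span class="h-money">IPTU R$ 40/mês</span></p></div></div>
<div class="card"><div class="info-left"><span>R$ 889.000</span></div></div>
"""
soup = BeautifulSoup(html, "html.parser")
selector = "div.info-right.text-xs-right p span.h-money"

results = [text_or_default(c, selector) for c in soup.find_all("div", class_="card")]
print(results)  # ['IPTU R$ 40/mês', 'No info here']
```

The same helper can be reused for the price and any other optional field, so every lookup gets the same fallback.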