How can I get the data from a span in BeautifulSoup?

Time:10-16

This is my code. I want to get each location's name and link. The variable "lugares" finds multiple "items-container" divs, but I only want the first one ([0]). I then loop over it, but the loop can't find the spans with class "item".

from bs4 import BeautifulSoup
import requests

b=[]
i="https://www.vivanuncios.com.mx"
url = "https://www.vivanuncios.com.mx/s-renta-inmuebles/estado-de-mexico/v1c1098l1014p1"

encabezado = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36",'Accept-Language': 'en-US, en;q=0.5'}

page =requests.get(url,headers=encabezado)

soup = BeautifulSoup(page.content,"html.parser")

lugares = soup.find_all("div",{"class":"items-container"})

lugares=lugares[0]
print(len(lugares))

for lugar in lugares:
    
    locationlink = i + str(lugar.find("span",{"class":"item"}).find("a")["href"])

    location= lugar.find("span",{"class":"item"}).text
    a=[location,locationlink]
    
    b.append(a)

CodePudding user response:

First, get all the spans with class "item" inside the first container, lugares[0].

Then iterate over those spans to get the link and text of each location.

Working code:

from bs4 import BeautifulSoup
import requests

b=[]
i="https://www.vivanuncios.com.mx"
url = "https://www.vivanuncios.com.mx/s-renta-inmuebles/estado-de-mexico/v1c1098l1014p1"

encabezado = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36",'Accept-Language': 'en-US, en;q=0.5'}

page =requests.get(url,headers=encabezado)

soup = BeautifulSoup(page.content,"html.parser")

lugares = soup.find_all("div",{"class":"items-container"})

#lugares=lugares[0]
print(len(lugares))

# get all spans
spans = lugares[0].find_all("span",{"class":"item"})

# iterate through each span
for span in spans:
    # get location text
    location = span.find("a").text

    # locationlink builder
    site = "www.vivanuncios.com.mx"
    link = span.find("a")["href"]
    locationlink = f"{site}{link}"   

    a = [location,locationlink]
    b.append(a)

print(b[0])

Result sample:

['Huixquilucan', 'www.vivanuncios.com.mx/s-renta-inmuebles/huixquilucan/v1c1098l10689p1']
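The loop in the question most likely fails because iterating over a Tag directly walks its children, which can include whitespace text nodes as well as tags, while find_all returns only the matching tags. The following is a minimal offline sketch of that difference. The HTML snippet is a made-up stand-in for the real page (its exact markup is an assumption), and the names container, text_nodes, tag_nodes and pairs are illustrative.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Hypothetical markup standing in for the real page; the site's HTML may differ.
html = """
<div class="items-container">
  <span class="item"><a href="/s-renta-inmuebles/huixquilucan/v1c1098l10689p1">Huixquilucan</a></span>
  <span class="item"><a href="/s-renta-inmuebles/naucalpan/v1c1098l10710p1">Naucalpan</a></span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", {"class": "items-container"})

# Iterating over a Tag yields its direct children, text nodes included.
children = list(container)
text_nodes = [c for c in children if isinstance(c, NavigableString)]
tag_nodes = [c for c in children if isinstance(c, Tag)]
print(f"children: {len(children)}, text nodes: {len(text_nodes)}, tags: {len(tag_nodes)}")

# find_all returns only the matching tags, so each result contains an <a>.
spans = container.find_all("span", {"class": "item"})
pairs = [[s.a.get_text(strip=True), s.a["href"]] for s in spans]
print(pairs)
```

The whitespace between the spans shows up as text nodes when looping over the container, and those nodes are not tags that can be searched for a span, which is why going through find_all first is the safer route.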

CodePudding user response:

There are several ways to do this; the best one depends on what you expect and what you want to do with the information afterwards.

First Option

If you only need the details of the first location, you can do the following:

lugar = soup.select_one('div.items-container a')   
b = [lugar.text, f'{i}{lugar["href"]}']

or

lugar = soup.select('div.items-container a')[0]
b = [lugar.text, f'{i}{lugar["href"]}']

Both select the first <a> in the <div> with class items-container.

Output

['Huixquilucan','https://www.vivanuncios.com.mx/s-renta-inmuebles/huixquilucan/v1c1098l10689p1']
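The two variants return the same element when a match exists, but they behave differently when nothing matches: select_one returns None, while indexing the empty result of select raises IndexError. A small offline sketch of both cases follows. The HTML snippet and the selector div.no-such-class are made up for illustration and are not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = """
<div class="items-container">
  <span class="item"><a href="/s-renta-inmuebles/huixquilucan/v1c1098l10689p1">Huixquilucan</a></span>
  <span class="item"><a href="/s-renta-inmuebles/naucalpan/v1c1098l10710p1">Naucalpan</a></span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Both expressions pick the first <a> inside div.items-container.
first = soup.select_one("div.items-container a")
also_first = soup.select("div.items-container a")[0]
same_element = first == also_first
print(same_element, first.text)

# When nothing matches, select_one gives None ...
missing = soup.select_one("div.no-such-class a")
print(missing)

# ... while select returns an empty list, so [0] raises IndexError.
try:
    soup.select("div.no-such-class a")[0]
    raised = False
except IndexError:
    raised = True
print(raised)
```

If the page layout might change, checking the select_one result for None before reading .text or ["href"] avoids an AttributeError or TypeError further down.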

Alternative

If you want all locations at once, build a list of dicts, so that later you only have to iterate over it and have all the information in one place:

[{'name':x.text, 'link':f'{i}{x["href"]}'} for x in soup.select('div.items-container a')]

Output

[{'name': 'Huixquilucan',
  'link': 'https://www.vivanuncios.com.mx/s-renta-inmuebles/huixquilucan/v1c1098l10689p1'},
 {'name': 'Naucalpan',
  'link': 'https://www.vivanuncios.com.mx/s-renta-inmuebles/naucalpan/v1c1098l10710p1'},
 {'name': 'Atizapán',
  'link': 'https://www.vivanuncios.com.mx/s-renta-inmuebles/atizapan/v1c1098l10662p1'},
 {'name': 'Metepec',
  'link': 'https://www.vivanuncios.com.mx/s-renta-inmuebles/metepec-edomex/v1c1098l10707p1'},...]
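As a variation on the list of dicts, the links can be built with urllib.parse.urljoin from the standard library instead of string concatenation, which also leaves an href untouched if it is already absolute. This is only a sketch under assumptions: the HTML snippet is invented (including the absolute href in the second entry), and the names base and places are illustrative.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

base = "https://www.vivanuncios.com.mx"

# Hypothetical markup; the second href is absolute on purpose.
html = """
<div class="items-container">
  <span class="item"><a href="/s-renta-inmuebles/huixquilucan/v1c1098l10689p1">Huixquilucan</a></span>
  <span class="item"><a href="https://www.vivanuncios.com.mx/s-renta-inmuebles/naucalpan/v1c1098l10710p1">Naucalpan</a></span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# a[href] skips anchors without an href attribute, so a["href"] cannot fail.
places = [
    {"name": a.get_text(strip=True), "link": urljoin(base, a["href"])}
    for a in soup.select("div.items-container a[href]")
]

for place in places:
    print(place["name"], "->", place["link"])
```

With relative hrefs the result is the same as f'{i}{x["href"]}', so the choice mainly matters if the page mixes relative and absolute links.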

Example (showing results of both)

from bs4 import BeautifulSoup
import requests

i="https://www.vivanuncios.com.mx"
url = "https://www.vivanuncios.com.mx/s-renta-inmuebles/estado-de-mexico/v1c1098l1014p1"

encabezado = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36",'Accept-Language': 'en-US, en;q=0.5'}

page =requests.get(url,headers=encabezado)
soup = BeautifulSoup(page.content,"html.parser")

lugar = soup.select_one('div.items-container a')
b = [lugar.text, f'{i}{lugar["href"]}']
print(f'First lugar:\n {b} \n')

# Alternative: collect all locations as a list of dicts

allLugaros = [{'name':x.text, 'link':f'{i}{x["href"]}'} for x in soup.select('div.items-container a')]

print(f'First lugar from lugaros (list of dict):\n {allLugaros[0]} \n')
print(f'All lugaros as list of dict:\n {allLugaros} \n')