python beautifulsoup duplicating results

Time:08-15

I'm trying to learn BeautifulSoup (and Python as a whole; I'm still pretty much a beginner) and am playing around with how to use it properly. I noticed that when I scrape the search results from the website I'm testing, each listing is printed 3 times.

Specifically, I'm trying to output the title, link, and price of each real estate property on the website. The price doesn't seem to duplicate, while the title and link do. I can't figure out whether it's because of my code or something with the website itself.

import requests
from bs4 import BeautifulSoup

userSearch = input('Input search: ')
link = 'https://www.lamudi.com.ph/buy/?q={}'.format(userSearch)
page = requests.get(link)
soup = BeautifulSoup(page.content, 'html.parser')

titleList = soup.find_all("a", title=True)
priceList = soup.find_all("span", class_="PriceSection-FirstPrice", text=True)

for (i,j) in zip(titleList, priceList):
    print(i['title'])
    print(i['href'])
    print(j.get_text())
    print("===============")

The output looks something like this; the prices don't match the listings because of the duplicated info:

Input Property search: manila
Suntrust Solana in  Ermita Manila  3 Bedroom Unit for Sale
https://www.lamudi.com.ph/suntrust-solana-in-ermita-manila-3-bedroom-unit-for-sale.html

                                        ₱ 6,672,197 
                                    
===============
Suntrust Solana in  Ermita Manila  3 Bedroom Unit for Sale
https://www.lamudi.com.ph/suntrust-solana-in-ermita-manila-3-bedroom-unit-for-sale.html

                                        ₱ 6,888,800 
                                    
===============
Suntrust Solana in  Ermita Manila  3 Bedroom Unit for Sale
https://www.lamudi.com.ph/suntrust-solana-in-ermita-manila-3-bedroom-unit-for-sale.html

                                        ₱ 168,000,000 
                                    
===============
3 Bedroom Unit For Sale in Suntrust Solana Manila- SOLA-01-21-H
https://www.lamudi.com.ph/3-bedroom-unit-for-sale-in-suntrust-solana-manila-sola-01-21-h.html

                                        ₱ 53,000,000 
                                    
===============
3 Bedroom Unit For Sale in Suntrust Solana Manila- SOLA-01-21-H
https://www.lamudi.com.ph/3-bedroom-unit-for-sale-in-suntrust-solana-manila-sola-01-21-h.html

                                        ₱ 53,000,000 
                                    
===============
3 Bedroom Unit For Sale in Suntrust Solana Manila- SOLA-01-21-H
https://www.lamudi.com.ph/3-bedroom-unit-for-sale-in-suntrust-solana-manila-sola-01-21-h.html

                                        ₱ 46,500,000 

CodePudding user response:

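You should iterate row by row. It's safer than using zip(), which pairs items purely by position, so the two lists fall out of step as soon as one of them contains extra or missing entries.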

To get all the titles, links, and prices, you can use the following example:

import requests
from bs4 import BeautifulSoup

userSearch = "manila"
link = "https://www.lamudi.com.ph/buy/?q={}".format(userSearch)

soup = BeautifulSoup(requests.get(link).content, "html.parser")


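# Each search result sits in its own ".ListingCell-row" container, so the
# title, link, and price below are all read from the same listing.
for row in soup.select(".ListingCell-row"):
    title = row.h2.get_text(strip=True)
    link = row.a["href"]
    # Fall back to ".PriceSection-NoPrice" for rows without a first price.
    price = row.select_one(
        ".PriceSection-FirstPrice, .PriceSection-NoPrice"
    ).get_text(strip=True)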
    print(title)
    print(link)
    print(price)
    print("=" * 80)

Prints:

Suntrust Solana in  Ermita Manila  3 Bedroom Unit for Sale
https://www.lamudi.com.ph/suntrust-solana-in-ermita-manila-3-bedroom-unit-for-sale.html
₱ 6,672,197
================================================================================
3 Bedroom Unit For Sale in Suntrust Solana Manila- SOLA-01-21-H
https://www.lamudi.com.ph/3-bedroom-unit-for-sale-in-suntrust-solana-manila-sola-01-21-h.html
₱ 6,888,800
================================================================================
Rare Sale!!Prime Commercial Property strategically in Paco, Manila
https://www.lamudi.com.ph/rare-sale-prime-commercial-property-strategically-in-paco-manila.html
₱ 168,000,000
================================================================================
A Luxurious Unit E Townhouse For Sale in A Peaceful Neighborhood in Paco Manila
https://www.lamudi.com.ph/a-luxurious-unit-e-townhouse-for-sale-in-a-peaceful-neighborhood-in-paco-manila.html
₱ 53,000,000
================================================================================
For Sale Luxurious 4-Bedroom Townhouse in a Peaceful Neighborhood in Paco Manila
https://www.lamudi.com.ph/for-sale-luxurious-4-bedroom-townhouse-in-a-peaceful-neighborhood-in-paco-manila.html
₱ 53,000,000
================================================================================

...
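
To see why the zip() version goes out of step, here is a small offline sketch. The HTML in SAMPLE_HTML is made up for illustration: it assumes each listing row contains three anchors with a title attribute (image, heading, and a details link) but only one price element, which would explain the triple output in the question. The real page's markup may differ, and parse_rows is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not copied from the real site):
# three <a title=...> tags per row, but only one price element.
SAMPLE_HTML = """
<div class="ListingCell-row">
  <a title="Condo A" href="/condo-a.html"><img alt=""></a>
  <h2><a title="Condo A" href="/condo-a.html">Condo A</a></h2>
  <a title="Condo A" href="/condo-a.html">Details</a>
  <span class="PriceSection-FirstPrice"> 1,000,000 </span>
</div>
<div class="ListingCell-row">
  <a title="House B" href="/house-b.html"><img alt=""></a>
  <h2><a title="House B" href="/house-b.html">House B</a></h2>
  <a title="House B" href="/house-b.html">Details</a>
  <span class="PriceSection-FirstPrice"> 2,500,000 </span>
</div>
<div class="ListingCell-row">
  <a title="Lot C" href="/lot-c.html"><img alt=""></a>
  <h2><a title="Lot C" href="/lot-c.html">Lot C</a></h2>
  <a title="Lot C" href="/lot-c.html">Details</a>
  <span class="PriceSection-NoPrice"> Contact agent </span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Page-wide lists, as in the question: 9 anchors but only 2 prices.
titles = soup.find_all("a", title=True)
prices = soup.find_all("span", class_="PriceSection-FirstPrice")

# zip() pairs by position and stops at the shorter list, so the second
# price ends up next to a repeated anchor from the first row.
zipped = [(a["title"], p.get_text(strip=True)) for a, p in zip(titles, prices)]


def parse_rows(soup):
    """Read title, link, and price from within each row container."""
    results = []
    for row in soup.select(".ListingCell-row"):
        price_tag = row.select_one(
            ".PriceSection-FirstPrice, .PriceSection-NoPrice"
        )
        results.append(
            {
                "title": row.h2.get_text(strip=True),
                "href": row.h2.a["href"],
                # Guard against rows with no price element at all.
                "price": price_tag.get_text(strip=True) if price_tag else None,
            }
        )
    return results


rows = parse_rows(soup)

print(len(titles), len(prices))
print(zipped)
for r in rows:
    print(r)
```

With this sample, zip() attaches the second row's price to a repeated "Condo A" anchor, while the row-wise loop keeps each price with its own listing and still reports the row that has no price.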