Pandas dataframe concat not adding data

Time:09-20

I am scraping some data with Python, but after retrieving it I am unable to add it to a DataFrame. I am not getting any errors, but my DataFrame stays empty after execution. Here is my code:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.airbnb.com/s/Honolulu--HI--United-States/homes?tab_id=home_tab&refinement_paths[]=/homes&flexible_trip_lengths[]=one_week&price_filter_input_type=0&query=Honolulu, HI&place_id=ChIJTUbDjDsYAHwRbJen81_1KEs&date_picker_type=calendar&checkin=2022-10-08&checkout=2022-10-09&source=structured_search_input_header&search_type=autocomplete_click'
page = requests.get (url, headers = {'User-agent': 'your bot 0.1'})
soup = BeautifulSoup(page.text, 'lxml')

df = pd.DataFrame({'Links': [''], 'Title': [''], 'Price' : [''], 'Rating': ['']})
postings = soup.findAll('div', class_= 'c4mnd7m dir dir-ltr')
for post in postings:
    try:
            title = post.find('div', class_ = 't1jojoys dir dir-ltr').text
            link = 'https://www.airbnb.com/' + post.find('a', class_ = 'ln2bl2p dir dir-ltr').get('href')
            price = post.find('span', class_ = 'a8jt5op dir dir-ltr').text
            rating = post.find('span', class_ = 'ru0q88m dir dir-ltr').text
            df.concat({'Links': link, 'Title': title, 'Price' : price, 'Rating': rating }, ignore_index = True)
    except:
         pass

print(df)

I can see values for the title, link, price, and rating variables, so I believe the issue lies with this line of code:

df.concat({'Links': link, 'Title': title, 'Price' : price, 'Rating': rating }, ignore_index = True)
    except:
        pass

Tried with df.append with no luck. Any help very much appreciated.

CodePudding user response:

Create a list of dictionaries L. Inside the loop, use L.append instead of concat, and at the end pass the list to the DataFrame constructor:

L = []
postings = soup.findAll('div', class_= 'c4mnd7m dir dir-ltr')
for post in postings:
    try:
            title = post.find('div', class_ = 't1jojoys dir dir-ltr').text
            link = 'https://www.airbnb.com/' + post.find('a', class_ = 'ln2bl2p dir dir-ltr').get('href')
            price = post.find('span', class_ = 'a8jt5op dir dir-ltr').text
            rating = post.find('span', class_ = 'ru0q88m dir dir-ltr').text
            L.append({'Links': link, 'Title': title, 'Price' : price, 'Rating': rating })
    except:
         pass

df = pd.DataFrame(L)

print(df)
                                                Links                   Title  \
0   https://www.airbnb.com//rooms/38698456?check_i...  Shared room in Waikiki   
1   https://www.airbnb.com//rooms/34749923?check_i...   Hotel room in Waikiki   
2   https://www.airbnb.com//rooms/49130811?check_i...   Hotel room in Waikiki   
3   https://www.airbnb.com//rooms/5493877974945511...  Hotel room in Honolulu   
4   https://www.airbnb.com//rooms/6097834619275602...        Condo in Waikiki   
5   https://www.airbnb.com//rooms/6485693240232899...        Condo in Waikiki   
6   https://www.airbnb.com//rooms/38086946?check_i...   Hotel room in Waikiki   
7   https://www.airbnb.com//rooms/6155887524457386...        Condo in Waikiki   
8   https://www.airbnb.com//rooms/43237326?check_i...   Hotel room in Waikiki   
9   https://www.airbnb.com//rooms/18854943?check_i...        Condo in Waikiki   
10  https://www.airbnb.com//rooms/26063949?check_i...   Hotel room in Waikiki   
11  https://www.airbnb.com//rooms/20911438?check_i...        Condo in Waikiki   
12  https://www.airbnb.com//rooms/5849552455018825...    Apartment in Waikiki   
13  https://www.airbnb.com//rooms/52469808?check_i...    Apartment in Waikiki   
14  https://www.airbnb.com//rooms/53892193?check_i...        Condo in Waikiki   
15  https://www.airbnb.com//rooms/51690647?check_i...        Condo in Waikiki   
16  https://www.airbnb.com//rooms/5958726599105704...   Hotel room in Waikiki   
17  https://www.airbnb.com//rooms/49712614?check_i...       Resort in Waikiki   
18  https://www.airbnb.com//rooms/5415238?check_in...        Condo in Waikiki   
19  https://www.airbnb.com//rooms/32816789?check_i...        Condo in Waikiki   

                              Price      Rating  
0     $58 per night, originally $75  4.67 (248)  
1   $199 per night, originally $219  4.38 (189)  
2                    $193 per night  4.61 (121)  
3                    $199 per night  4.25 (317)  
4   $264 per night, originally $287   4.73 (11)  
5                    $389 per night    4.83 (6)  
6                    $270 per night  4.53 (107)  
7   $310 per night, originally $331   4.92 (13)  
8   $223 per night, originally $243   4.5 (146)  
9                    $369 per night  4.81 (188)  
10                   $193 per night  4.57 (712)  
11                   $227 per night  4.64 (177)  
12                   $302 per night   4.96 (23)  
13                   $189 per night     3.0 (4)  
14  $262 per night, originally $279   4.88 (32)  
15  $240 per night, originally $255   4.83 (42)  
16                   $221 per night   4.58 (26)  
17                   $255 per night   4.73 (37)  
18                   $551 per night   4.9 (155)  
19                   $335 per night  4.83 (109)  
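
For anyone wondering why the original loop produced no error and no data, here is a minimal sketch of what is probably happening. A DataFrame has no concat method, so df.concat(...) raises AttributeError, and the bare except: pass hides it on every iteration. pd.concat is a module-level function that returns a new frame, so its result has to be assigned. The older DataFrame.append behaved the same way (it returned a new frame rather than modifying df) and was removed in pandas 2.0. The sample rows and example.com links below are made up for illustration.

```python
import pandas as pd

# Hypothetical rows standing in for the scraped values.
scraped = [
    {'Links': 'https://example.com/rooms/1', 'Title': 'Condo',
     'Price': '$100 per night', 'Rating': '4.5 (10)'},
    {'Links': 'https://example.com/rooms/2', 'Title': 'Hotel room',
     'Price': '$200 per night', 'Rating': '4.8 (20)'},
]

empty = pd.DataFrame(columns=['Links', 'Title', 'Price', 'Rating'])

# 1) DataFrame has no .concat method: the call raises AttributeError.
#    Catching it explicitly makes visible what "except: pass" was hiding.
errors = []
for row in scraped:
    try:
        empty.concat(row, ignore_index=True)
    except AttributeError as exc:
        errors.append(str(exc))

# 2) pd.concat takes a list of frames and returns a NEW frame,
#    so the result must be assigned to a name.
frames = [pd.DataFrame([row]) for row in scraped]
df = pd.concat(frames, ignore_index=True)

print(len(errors), "errors were raised by df.concat")
print(df)
```

Building one small frame per row is only there to show how pd.concat is called. The list-of-dictionaries approach above does less work and is usually the better choice inside a scraping loop.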

CodePudding user response:

try this:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.airbnb.com/s/Honolulu--HI--United-States/homes?tab_id=home_tab&refinement_paths[]=/homes&flexible_trip_lengths[]=one_week&price_filter_input_type=0&query=Honolulu, HI&place_id=ChIJTUbDjDsYAHwRbJen81_1KEs&date_picker_type=calendar&checkin=2022-10-08&checkout=2022-10-09&source=structured_search_input_header&search_type=autocomplete_click'
page = requests.get (url, headers = {'User-agent': 'your bot 0.1'})
soup = BeautifulSoup(page.text, 'lxml')

ROWS = []
#no need to use try/except since you will test each title/link/price/rating (if ... else None)
#this way, you will not lose a row because no rating or no price...

for post in soup.findAll('div', class_= 'c4mnd7m dir dir-ltr'):
    title = post.find('div', class_ = 't1jojoys dir dir-ltr').text if post.find('div', class_ = 't1jojoys dir dir-ltr') else None
    link = 'https://www.airbnb.com/' + post.find('a', class_ = 'ln2bl2p dir dir-ltr').get('href') if post.find('a', class_ = 'ln2bl2p dir dir-ltr') else None
    price = post.find('span', class_ = 'a8jt5op dir dir-ltr').text if post.find('span', class_ = 'a8jt5op dir dir-ltr') else None
    rating = post.find('span', class_ = 'ru0q88m dir dir-ltr').text if post.find('span', class_ = 'ru0q88m dir dir-ltr') else None
    row = [link, title, price, rating]
    ROWS.append(row)

df = pd.DataFrame(ROWS, columns=['link', 'title', 'price', 'rating'])

df
                                                link                   title                            price      rating
0  https://www.airbnb.com//rooms/38698456?check_i...  Shared room in Waikiki    $57 per night, originally $74  4.67 (248)
1  https://www.airbnb.com//rooms/6097834619275602...        Condo in Waikiki  $124 per night, originally $147   4.73 (11)
2  https://www.airbnb.com//rooms/6155887524457386...        Condo in Waikiki  $157 per night, originally $177   4.92 (13)
3  https://www.airbnb.com//rooms/18854943?check_i...        Condo in Waikiki                   $188 per night  4.81 (188)
4  https://www.airbnb.com//rooms/20911438?check_i...        Condo in Waikiki                   $151 per night  4.64 (177)
5  https://www.airbnb.com//rooms/6485693240232899...        Condo in Waikiki                   $229 per night    4.83 (6)
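
The "if ... else None" checks above call post.find twice per field. One way to avoid the repetition is a small helper, sketched here. The name text_or_none, the inline HTML, and its class names are all hypothetical and chosen only so the example runs without a network request. It uses the built-in html.parser instead of lxml for the same reason.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup; the second card deliberately has no rating.
html = """
<div class="card">
  <div class="title">Condo in Waikiki</div>
  <span class="price">$100 per night</span>
  <span class="rating">4.5 (10)</span>
</div>
<div class="card">
  <div class="title">Hotel room in Waikiki</div>
  <span class="price">$200 per night</span>
</div>
"""

def text_or_none(parent, name, class_):
    """Return the text of the first matching tag, or None if it is absent."""
    tag = parent.find(name, class_=class_)
    return tag.text if tag is not None else None

soup = BeautifulSoup(html, 'html.parser')

rows = []
for card in soup.find_all('div', class_='card'):
    # A missing field becomes None instead of dropping the whole row.
    rows.append({
        'title': text_or_none(card, 'div', 'title'),
        'price': text_or_none(card, 'span', 'price'),
        'rating': text_or_none(card, 'span', 'rating'),
    })

df = pd.DataFrame(rows)
print(df)
```

With this approach a listing without a rating still produces a row, and the missing value shows up as an empty entry in the rating column.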