I am scraping some data with Python, but after getting the data I am unable to add it to a DataFrame. I am not getting any errors, but my DataFrame keeps returning empty after execution. Here is my code:
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.airbnb.com/s/Honolulu--HI--United-States/homes?tab_id=home_tab&refinement_paths[]=/homes&flexible_trip_lengths[]=one_week&price_filter_input_type=0&query=Honolulu, HI&place_id=ChIJTUbDjDsYAHwRbJen81_1KEs&date_picker_type=calendar&checkin=2022-10-08&checkout=2022-10-09&source=structured_search_input_header&search_type=autocomplete_click'
page = requests.get(url, headers={'User-agent': 'your bot 0.1'})
soup = BeautifulSoup(page.text, 'lxml')
df = pd.DataFrame({'Links': [''], 'Title': [''], 'Price': [''], 'Rating': ['']})
postings = soup.findAll('div', class_='c4mnd7m dir dir-ltr')
for post in postings:
    try:
        title = post.find('div', class_='t1jojoys dir dir-ltr').text
        link = 'https://www.airbnb.com/' + post.find('a', class_='ln2bl2p dir dir-ltr').get('href')
        price = post.find('span', class_='a8jt5op dir dir-ltr').text
        rating = post.find('span', class_='ru0q88m dir dir-ltr').text
        df.concat({'Links': link, 'Title': title, 'Price': price, 'Rating': rating}, ignore_index=True)
    except:
        pass
print(df)
I can see values for the title, link, price, and rating variables, so I believe the issue lies with this line of code:
        df.concat({'Links': link, 'Title': title, 'Price': price, 'Rating': rating}, ignore_index=True)
    except:
        pass
I also tried df.append, with no luck. Any help is very much appreciated.
CodePudding user response:
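DataFrame has no concat method, so df.concat raises an AttributeError that the bare except silently swallows. Instead, create a list of dictionaries L, use L.append inside the loop, and pass the list to the DataFrame constructor at the end: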
L = []
postings = soup.findAll('div', class_='c4mnd7m dir dir-ltr')
for post in postings:
    try:
        title = post.find('div', class_='t1jojoys dir dir-ltr').text
        link = 'https://www.airbnb.com/' + post.find('a', class_='ln2bl2p dir dir-ltr').get('href')
        price = post.find('span', class_='a8jt5op dir dir-ltr').text
        rating = post.find('span', class_='ru0q88m dir dir-ltr').text
        # Collect one dictionary per listing
        L.append({'Links': link, 'Title': title, 'Price': price, 'Rating': rating})
    except (AttributeError, TypeError):
        # A field is missing for this listing (find returned None); skip it
        pass
# Build the DataFrame once, after the loop
df = pd.DataFrame(L)
print(df)
                                                Links                   Title \
 0  https://www.airbnb.com//rooms/38698456?check_i...  Shared room in Waikiki
 1  https://www.airbnb.com//rooms/34749923?check_i...   Hotel room in Waikiki
 2  https://www.airbnb.com//rooms/49130811?check_i...   Hotel room in Waikiki
 3  https://www.airbnb.com//rooms/5493877974945511...  Hotel room in Honolulu
 4  https://www.airbnb.com//rooms/6097834619275602...        Condo in Waikiki
 5  https://www.airbnb.com//rooms/6485693240232899...        Condo in Waikiki
 6  https://www.airbnb.com//rooms/38086946?check_i...   Hotel room in Waikiki
 7  https://www.airbnb.com//rooms/6155887524457386...        Condo in Waikiki
 8  https://www.airbnb.com//rooms/43237326?check_i...   Hotel room in Waikiki
 9  https://www.airbnb.com//rooms/18854943?check_i...        Condo in Waikiki
10  https://www.airbnb.com//rooms/26063949?check_i...   Hotel room in Waikiki
11  https://www.airbnb.com//rooms/20911438?check_i...        Condo in Waikiki
12  https://www.airbnb.com//rooms/5849552455018825...    Apartment in Waikiki
13  https://www.airbnb.com//rooms/52469808?check_i...    Apartment in Waikiki
14  https://www.airbnb.com//rooms/53892193?check_i...        Condo in Waikiki
15  https://www.airbnb.com//rooms/51690647?check_i...        Condo in Waikiki
16  https://www.airbnb.com//rooms/5958726599105704...   Hotel room in Waikiki
17  https://www.airbnb.com//rooms/49712614?check_i...       Resort in Waikiki
18  https://www.airbnb.com//rooms/5415238?check_in...        Condo in Waikiki
19  https://www.airbnb.com//rooms/32816789?check_i...        Condo in Waikiki

                              Price      Rating
 0    $58 per night, originally $75  4.67 (248)
 1  $199 per night, originally $219  4.38 (189)
 2                   $193 per night  4.61 (121)
 3                   $199 per night  4.25 (317)
 4  $264 per night, originally $287   4.73 (11)
 5                   $389 per night    4.83 (6)
 6                   $270 per night  4.53 (107)
 7  $310 per night, originally $331   4.92 (13)
 8  $223 per night, originally $243   4.5 (146)
 9                   $369 per night  4.81 (188)
10                   $193 per night  4.57 (712)
11                   $227 per night  4.64 (177)
12                   $302 per night   4.96 (23)
13                   $189 per night     3.0 (4)
14  $262 per night, originally $279   4.88 (32)
15  $240 per night, originally $255   4.83 (42)
16                   $221 per night   4.58 (26)
17                   $255 per night   4.73 (37)
18                   $551 per night   4.9 (155)
19                   $335 per night  4.83 (109)
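To see why the original loop produced nothing and raised no visible error, here is a minimal sketch that needs no network access. The row values are made up for illustration and stand in for one scraped listing. Catching the exception by name, rather than with a bare except, exposes the error that was being hidden:

```python
import pandas as pd

df = pd.DataFrame(columns=["Links", "Title", "Price", "Rating"])

# Hypothetical row standing in for one scraped listing (not real data).
row = {
    "Links": "https://www.airbnb.com/rooms/1",
    "Title": "Condo in Waikiki",
    "Price": "$100 per night",
    "Rating": "4.5 (10)",
}

# Catch the exception explicitly instead of using a bare `except: pass`.
try:
    df.concat(row, ignore_index=True)
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# The frame is unchanged because the call never succeeded.
print(len(df))

# Collect rows in a plain list and build the frame once at the end.
rows = [row]
fixed = pd.DataFrame(rows)
print(fixed.shape)
```

A similar trap applies to df.append in older pandas versions: it returned a new DataFrame instead of modifying df in place, so the result had to be assigned back, and the method was removed in pandas 2.0.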
CodePudding user response:
Try this:
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.airbnb.com/s/Honolulu--HI--United-States/homes?tab_id=home_tab&refinement_paths[]=/homes&flexible_trip_lengths[]=one_week&price_filter_input_type=0&query=Honolulu, HI&place_id=ChIJTUbDjDsYAHwRbJen81_1KEs&date_picker_type=calendar&checkin=2022-10-08&checkout=2022-10-09&source=structured_search_input_header&search_type=autocomplete_click'
page = requests.get(url, headers={'User-agent': 'your bot 0.1'})
soup = BeautifulSoup(page.text, 'lxml')
ROWS = []
# No try/except is needed: each field is checked individually (... if ... else None),
# so a listing is not dropped just because its rating or price is missing.
for post in soup.findAll('div', class_='c4mnd7m dir dir-ltr'):
    title_tag = post.find('div', class_='t1jojoys dir dir-ltr')
    link_tag = post.find('a', class_='ln2bl2p dir dir-ltr')
    price_tag = post.find('span', class_='a8jt5op dir dir-ltr')
    rating_tag = post.find('span', class_='ru0q88m dir dir-ltr')
    title = title_tag.text if title_tag else None
    link = 'https://www.airbnb.com/' + link_tag.get('href') if link_tag else None
    price = price_tag.text if price_tag else None
    rating = rating_tag.text if rating_tag else None
    row = [link, title, price, rating]
    ROWS.append(row)
df = pd.DataFrame(ROWS, columns=['link', 'title', 'price', 'rating'])
df
                                                link                   title                            price      rating
0  https://www.airbnb.com//rooms/38698456?check_i...  Shared room in Waikiki    $57 per night, originally $74  4.67 (248)
1  https://www.airbnb.com//rooms/6097834619275602...        Condo in Waikiki  $124 per night, originally $147   4.73 (11)
2  https://www.airbnb.com//rooms/6155887524457386...        Condo in Waikiki  $157 per night, originally $177   4.92 (13)
3  https://www.airbnb.com//rooms/18854943?check_i...        Condo in Waikiki                   $188 per night  4.81 (188)
4  https://www.airbnb.com//rooms/20911438?check_i...        Condo in Waikiki                   $151 per night  4.64 (177)
5  https://www.airbnb.com//rooms/6485693240232899...        Condo in Waikiki                   $229 per night    4.83 (6)
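The per-field check can also be factored into a small helper, which may be easier to read than repeating each find call. The sketch below is only an illustration: the inline HTML, the class names (card, title, link, price, rating) and the helper text_or_none are all hypothetical stand-ins, not Airbnb's real markup. It also uses urljoin from the standard library, which avoids the double slash visible in the links above.

```python
from urllib.parse import urljoin

import pandas as pd
from bs4 import BeautifulSoup

BASE_URL = "https://www.airbnb.com/"

# Hypothetical markup: the second card deliberately has no rating.
HTML = """
<div class="card">
  <div class="title">Condo in Waikiki</div>
  <a class="link" href="/rooms/1">view</a>
  <span class="price">$100 per night</span>
  <span class="rating">4.5 (10)</span>
</div>
<div class="card">
  <div class="title">Hotel room in Waikiki</div>
  <a class="link" href="/rooms/2">view</a>
  <span class="price">$200 per night</span>
</div>
"""


def text_or_none(parent, tag, css_class):
    """Return the stripped text of the first matching child, or None if absent."""
    node = parent.find(tag, class_=css_class)
    return node.get_text(strip=True) if node else None


soup = BeautifulSoup(HTML, "html.parser")
rows = []
for card in soup.find_all("div", class_="card"):
    anchor = card.find("a", class_="link")
    rows.append({
        # urljoin resolves the relative href without doubling the slash
        "link": urljoin(BASE_URL, anchor["href"]) if anchor else None,
        "title": text_or_none(card, "div", "title"),
        "price": text_or_none(card, "span", "price"),
        "rating": text_or_none(card, "span", "rating"),
    })

df = pd.DataFrame(rows)
print(df)
```

With this layout, a listing that lacks a rating still produces a row, with a missing value in that column rather than being skipped.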