I want to merge 2 pandas data frames. I am working on the Airbnb dataset with listings and reviews data frame for a city.
listings_df = {'id':[1,2,3,4],
'host_name':['adasd','dasda', 'asdasd']}
reviews_df = {'listing_id':[1,1,2,2,3],
'reviews':['review_1', 'review_1', 'review_1', 'review_2', 'review_2','review_3']}
I am using a nested for-loop to append reviews in listings_df
.
for i, l_row in listings_df.iterrows():
review_list = []
for j, r_row in reviews_df.iterrows():
if r_row['listing_id'] == l_row['id']:
review_list.append(r_row['comments'])
listings_df.at[i, 'reviews'] = review_list
This becomes very time-consuming as both datasets have 100K rows in them. Is there a smarter way to do it?
CodePudding user response:
It looks like you want first to groupby.agg
, then perform a left merge:
out = listing_df.merge(reviews_df.groupby('listing_id').agg(list),
left_on='id', right_index=True, how='left')
Output:
id host_name reviews
0 1 adasd [review_1, review_1]
1 2 dasda [review_2, review_2]
2 3 asdasd [review_3]
3 4 xyz NaN