Home > Mobile >  Merge Pandas dataframe
Merge Pandas dataframe

Time:12-09

I want to merge 2 pandas data frames. I am working on the Airbnb dataset with listings and reviews data frame for a city.

listings_df = {'id':[1,2,3,4],
              'host_name':['adasd','dasda', 'asdasd']}
reviews_df = {'listing_id':[1,1,2,2,3],
              'reviews':['review_1', 'review_1', 'review_1', 'review_2', 'review_2','review_3']}

I am using a nested for-loop to append reviews in listings_df.

for i, l_row in listings_df.iterrows():
    review_list = []
    for j, r_row in reviews_df.iterrows():
        if r_row['listing_id'] == l_row['id']:
            review_list.append(r_row['comments'])
    listings_df.at[i, 'reviews'] = review_list

This becomes very time-consuming as both datasets have 100K rows in them. Is there a smarter way to do it?

CodePudding user response:

It looks like you want first to groupby.agg, then perform a left merge:

out = listing_df.merge(reviews_df.groupby('listing_id').agg(list),
                       left_on='id', right_index=True, how='left')

Output:


   id host_name               reviews
0   1     adasd  [review_1, review_1]
1   2     dasda  [review_2, review_2]
2   3    asdasd            [review_3]
3   4       xyz                   NaN
  • Related