Home > Net >  pandas repeat ids based on matching column value and index
pandas repeat ids based on matching column value and index

Time:05-24

I have two dataframes like as below

row_id,var_name,score
1,revenue,10
1,cnt_days,5
1,age,15
2,revenue,11
2,cnt_days,3
2,age,25

df1 = pd.read_clipboard(sep=',')

unique_key,status,country,marital_status     
 123,passed,UK,Single
 456,failed,USA,Married
 789,passed,KOREA,Single

df2 = pd.read_clipboard(sep=',')

I would like to do the below

a) Match df1 and df2 based on df1['row_id'] and df2.index

b) For matching records, attach unique_key to df1

So, I tried the below

pd.concat([df1, df2[['unique_key']].reset_index(drop=True)], axis=1)

but the above doesn't work for repeating row_ids. It only matches for first occurrence of row_ids.

How can I do this for repeat occurrence of row_ids in big data dataframe?

I expect my output to be like as below

enter image description here

CodePudding user response:

Because match by indices in df2 is possible use Series.map by Series - df2['unique_key']:

df1['unique_key'] = df1['row_id'].map(df2['unique_key'])
print (df1)
   row_id  var_name  score  unique_key
0       1   revenue     10         456
1       1  cnt_days      5         456
2       1       age     15         456
3       2   revenue     11         789
4       2  cnt_days      3         789
5       2       age     25         789

Or left join with only column unique_key with parameters left_on and right_index:

df = df1.merge(df2['unique_key'], left_on='row_id', right_index=True, how='left')
print (df)
   row_id  var_name  score  unique_key
0       1   revenue     10         456
1       1  cnt_days      5         456
2       1       age     15         456
3       2   revenue     11         789
4       2  cnt_days      3         789
5       2       age     25         789

CodePudding user response:

df1['unique_key'] = df1.merge(df2, right_index=True, left_on='row_id')['unique_key']
print(df1)

   row_id  var_name  score  unique_key
0       1   revenue     10         456
1       1  cnt_days      5         456
2       1       age     15         456
3       2   revenue     11         789
4       2  cnt_days      3         789
5       2       age     25         789
  • Related