Home > Blockchain >  Pandas map many to one instead of merge without dropping duplicates?
Pandas map many to one instead of merge without dropping duplicates?

Time:07-18

I have two dataframes like as below

data_df = pd.DataFrame({'person_id': ['[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]'],
             'company': ['a','a','a','a','a','a','a'],
             'dept_access':['a1','a1','a1','a1','a2','a2','a2']})

key_df = pd.DataFrame({'p_id': ['[email protected]','[email protected]','[email protected]'],
             'company': ['a','a','a'],
             'location':['UK','USA','KOREA']})

My objective is to do the below

a) Attach location column from key df to data df

So, I tried the merge option like below

data_df.merge(key_df,left_on='person_id',right_on='p_id',how='left')

But this results in more records than original data_df because of duplicates in merging column.

So, therefore, I would like to use map approach as I know my key_df will have one unique_key for each user. So, I was trying something like below

s = key_df.set_index(['p_id'])['location']
data_df['location'] = data_df[('person_id')].map(s)

but this doesn't work as well.

I expect my output to have 4 columns as below

enter image description here

CodePudding user response:

In your data second solution working well, in real data is possible remove duplicates first:

s = key_df.drop_duplicates('p_id').set_index(['p_id'])['location']
data_df['location'] = data_df[('person_id')].map(s)
print (data_df)

       person_id company dept_access location
0  [email protected]       a          a1       UK
1  [email protected]       a          a1       UK
2  [email protected]       a          a1       UK
3  [email protected]       a          a1      NaN
4  [email protected]       a          a2      NaN
5  [email protected]       a          a2    KOREA
6  [email protected]       a          a2    KOREA
  • Related