I have two dataframes, as shown below:
import pandas as pd

data_df = pd.DataFrame({
    'p_id': ['[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]'],
    'company': ['a','b','c','d','e','f','g'],
    'dept_access': ['a1','a1','a1','a1','a2','a2','a2']})
key_df = pd.DataFrame({
    'p_id': ['[email protected]','[email protected]','[email protected]'],
    'company': ['a','c','b'],
    'location': ['UK','USA','KOREA']})
I would like to do the following:
a) Attach the location column from key_df to data_df based on two fields, p_id and company.
So, I tried the following:
loc = key_df.drop_duplicates(['p_id','company']).set_index(['p_id','company'])['location']
data_df['location'] = data_df[['p_id','company']].map(loc)
But this resulted in an error like the one below:
KeyError: "None of [Index(['p_id','company'], dtype='object')] are in the [columns]"
How can I map based on multiple index columns? I would prefer not to use merge.
CodePudding user response:
merge handles this case directly, so for comparison here is a left join on both columns:
data_df.merge(key_df, on=['p_id', 'company'], how="left")
                p_id company dept_access location
0  [email protected]       a          a1       UK
1  [email protected]       b          a1      NaN
2  [email protected]       c          a1      NaN
3  [email protected]       d          a1      NaN
4  [email protected]       e          a2      NaN
5  [email protected]       f          a2      NaN
6  [email protected]       g          a2      NaN
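One caveat with the merge route: if key_df holds the same (p_id, company) pair more than once, a left join repeats the matching rows of data_df. A minimal sketch of guarding against that is below. The addresses are hypothetical placeholders (the real ones are obscured in the post), and the duplicated key row is an assumption made only to show the effect.

```python
import pandas as pd

# Hypothetical placeholder data; the duplicated key in key_df is deliberate.
data_df = pd.DataFrame({
    'p_id': ['u1@example.com', 'u2@example.com'],
    'company': ['a', 'b'],
    'dept_access': ['a1', 'a1'],
})
key_df = pd.DataFrame({
    'p_id': ['u1@example.com', 'u1@example.com'],
    'company': ['a', 'a'],
    'location': ['UK', 'KOREA'],
})

idx = ['p_id', 'company']

# Keep the first location per key so each data_df row matches at most once.
unique_keys = key_df.drop_duplicates(idx)

# validate='many_to_one' raises MergeError if the right-hand keys are not unique.
merged = data_df.merge(unique_keys, on=idx, how='left', validate='many_to_one')
print(merged)
```

With validate set, a forgotten drop_duplicates fails loudly instead of silently adding rows.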
To avoid merge, you can instead build a MultiIndex from the two columns and map it through a Series indexed the same way:
idx = ['p_id', 'company']
data_df.assign(location=data_df.set_index(idx).index.map(key_df.set_index(idx)['location']))
                p_id company dept_access location
0  [email protected]       a          a1       UK
1  [email protected]       b          a1      NaN
2  [email protected]       c          a1      NaN
3  [email protected]       d          a1      NaN
4  [email protected]       e          a2      NaN
5  [email protected]       f          a2      NaN
6  [email protected]       g          a2      NaN
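For reference, here is a self-contained sketch of the index-mapping approach that keeps the drop_duplicates step from the question, since mapping through a Series whose index has repeated keys raises an error. The addresses are hypothetical placeholders (the real ones are obscured in the post), and pd.MultiIndex.from_frame is used here as one possible alternative to set_index(idx).index.

```python
import pandas as pd

# Hypothetical placeholder data standing in for the obscured addresses.
data_df = pd.DataFrame({
    'p_id': ['u1@example.com', 'u2@example.com', 'u3@example.com'],
    'company': ['a', 'b', 'c'],
    'dept_access': ['a1', 'a1', 'a2'],
})
key_df = pd.DataFrame({
    'p_id': ['u1@example.com', 'u3@example.com', 'u1@example.com'],
    'company': ['a', 'c', 'a'],
    'location': ['UK', 'USA', 'KOREA'],
})

idx = ['p_id', 'company']

# Lookup Series keyed by (p_id, company); the index must be unique for map.
loc = key_df.drop_duplicates(idx).set_index(idx)['location']

# Build one (p_id, company) key per row of data_df and look each up in loc.
keys = pd.MultiIndex.from_frame(data_df[idx])
data_df['location'] = keys.map(loc)
print(data_df)
```

Rows whose key pair is missing from key_df end up with NaN in location, the same as with a left merge.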