I have a df, for example -
df = pd.DataFrame({'name': ['name1', 'name2', 'name3', 'name4'],
'age': [21, 23, 24, 28],
'occupation': ['data scientist', 'doctor', 'data analyst', 'engineer'],
'knowledge':['python', 'medical','sql','c '],
})
and another df -
df2 = pd.DataFrame({'occupation': ['data scientist', 'data analyst'],
'knowledge':['5', '4'],
})
I want to replace the knowledge values of the first DF with the knowledge values of the second, but only for the rows which are the same. making the first DF look like that:
df = pd.DataFrame({'name': ['name1', 'name2', 'name3', 'name4'],
'age': [21, 23, 24, 28],
'occupation': ['data scientist', 'doctor', 'data analyst', 'engineer'],
'knowledge':['5', 'medical','4','c '],
})
I tried to do stuff with replace, but it didn't work...
CodePudding user response:
You may try this:
occ_know_dict = df2.set_index('occupation').to_dict()['knowledge']
df['knowledge'] = df[['knowledge','occupation']].apply(
lambda row: occ_know_dict[row['occupation']] if row['occupation'] in occ_know_dict else row['knowledge'], axis=1)
CodePudding user response:
You can try map
the corresponding knowledge
column which shares the same occupation
of df2
to df1
then update
the value to df
.
df['knowledge'].update(df['occupation'].map(df2.set_index('occupation')['knowledge']))
Note that update
happens inplace.
print(df)
name age occupation knowledge
0 name1 21 data scientist 5
1 name2 23 doctor medical
2 name3 24 data analyst 4
3 name4 28 engineer c