Home > front end >  Replace certain values of one column, with different values from a different df, pandas
Replace certain values of one column, with different values from a different df, pandas

Time:04-19

I have a df, for example -

df = pd.DataFrame({'name': ['name1', 'name2', 'name3', 'name4'],
                   'age': [21, 23, 24, 28],
                   'occupation': ['data scientist',  'doctor',  'data analyst', 'engineer'],
                   'knowledge':['python', 'medical','sql','c  '],
                  })

and another df -

df2 = pd.DataFrame({'occupation': ['data scientist', 'data analyst'],
                   'knowledge':['5', '4'],
                  })

I want to replace the knowledge values of the first DF with the knowledge values of the second, but only for the rows which are the same. making the first DF look like that:

df = pd.DataFrame({'name': ['name1', 'name2', 'name3', 'name4'],
                   'age': [21, 23, 24, 28],
                   'occupation': ['data scientist',  'doctor',  'data analyst', 'engineer'],
                   'knowledge':['5', 'medical','4','c  '],
                  })

I tried to do stuff with replace, but it didn't work...

CodePudding user response:

You may try this:

occ_know_dict = df2.set_index('occupation').to_dict()['knowledge']

df['knowledge'] = df[['knowledge','occupation']].apply(
        lambda row: occ_know_dict[row['occupation']] if row['occupation'] in occ_know_dict else row['knowledge'], axis=1)

CodePudding user response:

You can try map the corresponding knowledge column which shares the same occupation of df2 to df1 then update the value to df.

df['knowledge'].update(df['occupation'].map(df2.set_index('occupation')['knowledge']))

Note that update happens inplace.

print(df)

    name  age      occupation knowledge
0  name1   21  data scientist         5
1  name2   23          doctor   medical
2  name3   24    data analyst         4
3  name4   28        engineer       c  
  • Related