Pandas replace columns by merging another dataframe-CodePudding

I have a dataframe df1 looks like this:

     id  A   B
0    1   10  5
1    1   11  6
2    2   10  7
3    2   11  8

And another dataframe df2:

     id  A
0    1   3
1    2   4

Now I want to replace A column in df1 with the value of A in df2 based on id, so the result should look like this:

     id  A   B
0    1   3   5
1    1   3   6
2    2   4   7
3    2   4   8

There's a way that I can drop column A in df1 first and merge df2 to df1 on id like df1 = df1.drop(['A'], axis=1).merge(df2, how='left', on='id'), but if there're like 10 columns in df2, it will be pretty hard. Is there a more elegant way to do so?

CodePudding user response：

here is one way to do it, by making use of pd.update. However, it requires to set the index on the id, so it can match the two df

df.set_index('id', inplace=True)
df2.set_index('id', inplace=True)
df.update(df2)
df['A'] = df['A'].astype(int)  # value by default was of type float
df.reset_index()

    id  A   B
0   1   3   5
1   1   3   6
2   2   4   7
3   2   4   8

CodePudding user response：

Merge just the id column from df to df2, and then combine_first it to the original DataFrame:

df = df[['id']].merge(df2).combine_first(df)
print(df)

Output:

   A  B  id
0  3  5   1
1  3  6   1
2  4  7   2
3  4  8   2

CodePudding user response：

instead of dropping and merging, set the column based on a list built by iterating through df2

tmp.csv

id,A,B
1,3,5
1,3,6
2,4,7
2,4,8

tmp2.csv

id,A
1,3
2,4

code

import pandas as pd

df1 = pd.read_csv('tmp.csv')
df2 = pd.read_csv('tmp2.csv')

print('df1\n', df1)
print('df2\n', df2)

df1["A"] = [df2.loc[df2['id'] == row['id']].iloc[0]['A'] for i,row in df1.iterrows()]
print('result\n', df1)

output

df1
    id  A  B
0   1  3  5
1   1  3  6
2   2  4  7
3   2  4  8
df2
    id  A
0   1  3
1   2  4
result
    id  A  B
0   1  3  5
1   1  3  6
2   2  4  7
3   2  4  8

breakdown of whats happening

we set the "A" col of df1 by using list comprehension

df1["A"] = [ ... ]

the content of that list is based on iterating through df1 (to get the id we need to correlate to df2's "A" col)

... for i,row in df1.iterrows() ]

note that row['id'] will refer to the id in df1 we want to get a value for from df2

so now we get that value from df2

... df2.loc[df2['id'] == row['id']].iloc[0]['A'] ...

the reason we iloc[0] is because doing df2.loc[df2['id']==row['id]] will return a data frame. realistically, this data frame will only ever have 1 row in it

all together that gives us the 1 liner of list comprehension for setting some column based on how a value in that row correlates to another value in a separate df

df1["A"] = [df2.loc[df2['id'] == row['id']].iloc[0]['A'] for i,row in df1.iterrows()]