I have a dataframe df1
looks like this:
id A B
0 1 10 5
1 1 11 6
2 2 10 7
3 2 11 8
And another dataframe df2
:
id A
0 1 3
1 2 4
Now I want to replace A
column in df1
with the value of A
in df2
based on id
, so the result should look like this:
id A B
0 1 3 5
1 1 3 6
2 2 4 7
3 2 4 8
There's a way that I can drop column A
in df1
first and merge df2
to df1
on id
like df1 = df1.drop(['A'], axis=1).merge(df2, how='left', on='id')
, but if there're like 10 columns in df2
, it will be pretty hard. Is there a more elegant way to do so?
CodePudding user response:
here is one way to do it, by making use of pd.update. However, it requires to set the index on the id, so it can match the two df
df.set_index('id', inplace=True)
df2.set_index('id', inplace=True)
df.update(df2)
df['A'] = df['A'].astype(int) # value by default was of type float
df.reset_index()
id A B
0 1 3 5
1 1 3 6
2 2 4 7
3 2 4 8
CodePudding user response:
Merge just the id
column from df
to df2
, and then combine_first
it to the original DataFrame:
df = df[['id']].merge(df2).combine_first(df)
print(df)
Output:
A B id
0 3 5 1
1 3 6 1
2 4 7 2
3 4 8 2
CodePudding user response:
instead of dropping and merging, set the column based on a list built by iterating through df2
tmp.csv
id,A,B
1,3,5
1,3,6
2,4,7
2,4,8
tmp2.csv
id,A
1,3
2,4
code
import pandas as pd
df1 = pd.read_csv('tmp.csv')
df2 = pd.read_csv('tmp2.csv')
print('df1\n', df1)
print('df2\n', df2)
df1["A"] = [df2.loc[df2['id'] == row['id']].iloc[0]['A'] for i,row in df1.iterrows()]
print('result\n', df1)
output
df1
id A B
0 1 3 5
1 1 3 6
2 2 4 7
3 2 4 8
df2
id A
0 1 3
1 2 4
result
id A B
0 1 3 5
1 1 3 6
2 2 4 7
3 2 4 8
breakdown of whats happening
we set the "A" col of df1 by using list comprehension
df1["A"] = [ ... ]
the content of that list is based on iterating through df1 (to get the id we need to correlate to df2's "A" col)
... for i,row in df1.iterrows() ]
note that row['id'] will refer to the id in df1 we want to get a value for from df2
so now we get that value from df2
... df2.loc[df2['id'] == row['id']].iloc[0]['A'] ...
the reason we iloc[0] is because doing df2.loc[df2['id']==row['id]]
will return a data frame.
realistically, this data frame will only ever have 1 row in it
all together that gives us the 1 liner of list comprehension for setting some column based on how a value in that row correlates to another value in a separate df
df1["A"] = [df2.loc[df2['id'] == row['id']].iloc[0]['A'] for i,row in df1.iterrows()]