I am trying to overwrite specific rows and columns from one dataframe with a second dataframe rows and columns. I can't give the actual data but I will use a proxy here.
Here is an example and what I have tried:
df1
UID B C D
0 X14 cat red One
1 X26 cat blue Two
2 X99 cat pink One
3 X54 cat pink One
df2
UID B C EX2
0 X14 dog blue coat
1 X88 rat green jacket
2 X99 bat red glasses
3 X29 bat red shoes
What I want to do here is overwrite column B
and C
in df1
with the values in df2
based upon UID
. Therefore in this example X88
and X29
from df2
would not appear in df2
. Also column D
would not be affected and EX2
not
The outcome would looks as such:
df1
UID B C D
0 X14 dog blue One
1 X26 cat blue Two
2 X99 bat red One
3 X54 cat pink One
I looked at this solution : Pandas merge two dataframe and overwrite rows However this appears to only update null values whereas I want an overwrite.
My attempt looked this like:
df = df1.merge(df2.filter(['B', 'C']), on=['B', 'C'], how='left')
For my data this actually doesn't seem to overwrite anything. Please could someone explain why this would not work?
Thanks
CodePudding user response:
One approach could be as follows:
- First, use
df.set_index
to make columnUID
your index (inplace
). - Next, use
df.update
with parameteroverwrite
set toTrue
(also useset_index
here for the "other" df:df2
). This will overwrite all the columns that the two dfs have in common (i.e.B
andC
) based on index matches (i.e. nowUID
). - Finally, restore the standard index using
df.reset_index
.
df1.set_index('UID', inplace=True)
df1.update(df2.set_index('UID'), overwrite=True)
df1.reset_index(inplace=True)
print(df1)
UID B C D
0 X14 dog blue One
1 X26 cat blue Two
2 X99 bat red One
3 X54 cat pink One
CodePudding user response:
You can approach this by using reindex_like
and combine_first
.
Try this :
out = (
df2.set_index("UID")
.reindex_like(df1.set_index("UID"))
.combine_first(df1.set_index("UID"))
.reset_index()
)
# Output :
print(out)
UID B C D
0 X14 dog blue One
1 X26 cat blue Two
2 X99 bat red One
3 X54 cat pink One
CodePudding user response:
Using Update function
df1.set_index('UID', inplace=True)
df2.set_index('UID', inplace=True)
df1.update(df2)
df1.reset_index(inplace=True)
print(df1)
Output
UID B C D
0 X14 dog blue One
1 X26 cat blue Two
2 X99 bat red One
3 X54 cat pink One