I have two Dataframes. The first one, "df1", looks as follows:
df1:
b1 b2 b3 b4
a1 0 0 0 0
a2 0 0 0 0
a3 0 0 0 0
a4 0 0 0 0
Note, a (indices) and b (name of the columns) are numerical values.
I want to fill this dataframe with values from another one ("df2"), whereby where the value should go in df1 is contained in df2.
df2 looks something like this:
df2:
namecol1 namecol2 namecol3
indexrow1 b1 a1 x1
indexrow2 b1 a3 x2
indexrow3 b1 a4 x3
indexrow4 b3 a2 y1
indexrow5 b3 a4 y2
Note, x and y are arbitrary numerical values.
Now, I want to overwrite the 0's in df1 with the values from namecol3 at the positions contained in namecol1 (=b values) and namecol2 (=a values) in the following manner:
df1:
b1 b2 b3 b4
a1 x1 0 0 0
a2 0 0 y1 0
a3 x2 0 0 0
a4 x3 0 y2 0
How do I do that? Plus, the real dataframes can have several million datapoints, so a solution that's also performance oriented would be nice.
Any help is much appreciated!
CodePudding user response:
A simple method would be to pivot
and combine_first
:
df3 = (df2
.pivot('namecol2', 'namecol1', 'namecol3')
.combine_first(df1)
.rename_axis(index=None, columns=None)
)
or, for in place modification, use update
:
df1.update(df2.pivot('namecol2', 'namecol1', 'namecol3'))
output:
b1 b2 b3 b4
a1 x1 0 0 0
a2 0 0 y1 0
a3 x2 0 0 0
a4 x3 0 y2 0
CodePudding user response:
You can use pivot
:
vals = df2.pivot('namecol2','namecol1', 'namecol3')
df[vals.notna()] = vals
Output:
b1 b2 b3 b4
a1 x1 0 0 0
a2 0 0 y1 0
a3 x2 0 0 0
a4 x3 0 y2 0