I have two dataframes as shown below:
df1 =
A B C D E F
timestamp
2022-04-08 10:07:00 40 50 NaN 50 NaN NaN
2022-04-08 10:07:01 70 80 80 10 NaN NaN
2022-04-08 10:07:02 200 220 NaN 10 10 NaN
df2 =
A_1 B_1
C 10 10
D 20 10
E 30 20
F 5 15
As you can see, the columns
C,D,E and F of df1
are index
of df2
. I want to perform an operation as follows:
For each row in df1
, the values in columns C D E and F are checked. If there is a value present in one of the columns, the name of the column is noted. Then the corresponding values for columns A and B
are taken. These values are added with the values in A_1 and B_1 of df2
for that particular index (which was noted earlier) and this sum is updated back in df1
. The below table might explain the problem better. The expected output is as follows:
df1 =
A B C D E F Sum_C Sum_D Sum_E Sum_F
timestamp
2022-04-08 10:07:00 40 50 NaN 50 NaN NaN NaN 120 NaN NaN
2022-04-08 10:07:01 70 80 80 10 NaN NaN 170 180 NaN NaN
2022-04-08 10:07:02 200 220 NaN 10 10 NaN NaN 450 470 NaN
Here,
In the first row of df1
, there is a value for column D
. The values from column A and B
are 40 and 50. Now, since there is a value present in column D
of df1
, the index D of df2
is selected, the corresponding A_1 and B_1 is noted which in this case is 20 and 10.
All of these values (A B A_1 B_1) is added resulting in a value of 120. This is added in Sum_D
column of df1
. The rest of the values (Sum_C
,Sum_E
,Sum_F
) are NaN
.
Would appreciate if there is an easier way to do this.
Thanks in advance!
CodePudding user response:
IIUC, you can use:
d = df1[df2.index].notna()
out = (df1
.join(d.mul(df2.sum(1)) # add sum from df2
.add(df1[['A', 'B']].sum(1), axis=0) # add data from A/B
.where(d) # keep only original non-NAN
.add_prefix('Sum_')
)
)
output:
A B C D E F Sum_C Sum_D Sum_E Sum_F
timestamp
2022-04-08 10:07:00 40 50 NaN 50 NaN NaN NaN 120 NaN NaN
2022-04-08 10:07:01 70 80 80.0 10 NaN NaN 170.0 180 NaN NaN
2022-04-08 10:07:02 200 220 NaN 10 10.0 NaN NaN 450 470.0 NaN