Home > front end >  Pandas : Performing an operation using two dataframes
Pandas : Performing an operation using two dataframes

Time:05-11

I have two dataframes as shown below:

df1 = 

                        A     B     C      D       E       F 
timestamp               
2022-04-08 10:07:00     40    50    NaN    50     NaN    NaN
2022-04-08 10:07:01     70    80    80     10     NaN    NaN
2022-04-08 10:07:02     200   220   NaN    10     10     NaN


df2 = 

             A_1       B_1    
                
C            10        10
D            20        10   
E            30        20 
F            5         15

As you can see, the columns C,D,E and F of df1 are index of df2. I want to perform an operation as follows: For each row in df1, the values in columns C D E and F are checked. If there is a value present in one of the columns, the name of the column is noted. Then the corresponding values for columns A and B are taken. These values are added with the values in A_1 and B_1 of df2 for that particular index (which was noted earlier) and this sum is updated back in df1. The below table might explain the problem better. The expected output is as follows:

df1 = 

                        A     B     C      D       E       F    Sum_C  Sum_D   Sum_E   Sum_F
timestamp               
2022-04-08 10:07:00     40    50    NaN    50     NaN    NaN    NaN    120     NaN     NaN
2022-04-08 10:07:01     70    80    80     10     NaN    NaN   170     180     NaN     NaN
2022-04-08 10:07:02     200   220   NaN    10     10     NaN    NaN    450     470     NaN

Here,

In the first row of df1, there is a value for column D. The values from column A and B are 40 and 50. Now, since there is a value present in column D of df1, the index D of df2 is selected, the corresponding A_1 and B_1 is noted which in this case is 20 and 10. All of these values (A B A_1 B_1) is added resulting in a value of 120. This is added in Sum_D column of df1. The rest of the values (Sum_C,Sum_E,Sum_F) are NaN.

Would appreciate if there is an easier way to do this.

Thanks in advance!

CodePudding user response:

IIUC, you can use:

d = df1[df2.index].notna()
out = (df1
       .join(d.mul(df2.sum(1)) # add sum from df2
              .add(df1[['A', 'B']].sum(1), axis=0) # add data from A/B
              .where(d)                            # keep only original non-NAN
              .add_prefix('Sum_')
            )
       )

output:

                       A    B     C   D     E   F  Sum_C  Sum_D  Sum_E  Sum_F
timestamp                                                                    
2022-04-08 10:07:00   40   50   NaN  50   NaN NaN    NaN    120    NaN    NaN
2022-04-08 10:07:01   70   80  80.0  10   NaN NaN  170.0    180    NaN    NaN
2022-04-08 10:07:02  200  220   NaN  10  10.0 NaN    NaN    450  470.0    NaN
  • Related