Home > Blockchain >  Group by group calculation of new dataframe column with multiple calculations
Group by group calculation of new dataframe column with multiple calculations

Time:11-30

I have the following data frame.

example_df <- data.frame("Group_name" = c("Group 1", "Group 1", "Group 2", "Group 2", "Group 2"),
                         "Logical_variable" = as.logical(c(F,T,T,F,F)), 
                         "Numeric_variable" = as.numeric(c(1.5e-3, 1, 1, 4e-4, 3e-6)))

From this data frame I want to make a new column example_df$new_col that uses different calculations based on the Logical_variable and does these calculations on a group by group basis.

If the logical is false I need the numeric_value in that row divided by sum of numeric variables that have a false logical value (on a group by group basis) to be returned to the new column. So for Group 1 (first row) it would be 1.5e-3/1.5e-3 resulting in 1. For row 4 (Group 2) you would get 4e-4/(4e-4 3e-6) resulting in 0.9925558 and for row 5 3e-6/(3e-6 4e-4) resulting in 0.007444169.

For the true logical values I need 1/(1 false logical values for each group) to be returned to the new column. So for row 2 which is in Group 1 it would be 1/(1 1.5e-3) = 0.9985022 and for row 3 it would be 1/(1 4e-4 3e-6) = 0.9995972.

So the resulting column would be example_df$new_col <- c(1, 0.9985022, 0.9995972, 0.9925558, 0.007444169)

What would be the best way to get to this result? For my actual application there may be many groups and so doing these calculations group by group might be important.

CodePudding user response:

With data.table, working the logic as a mathematical expression:

library(data.table)

setDT(example_df)[, new_col := (Logical_variable   (1 - Logical_variable)*Numeric_variable)/(Logical_variable   sum(Numeric_variable[!Logical_variable])), Group_name]

#>    Group_name Logical_variable Numeric_variable     new_col
#> 1:    Group 1            FALSE          1.5e-03 1.000000000
#> 2:    Group 1             TRUE          1.0e 00 0.998502247
#> 3:    Group 2             TRUE          1.0e 00 0.999597162
#> 4:    Group 2            FALSE          4.0e-04 0.992555831
#> 5:    Group 2            FALSE          3.0e-06 0.007444169
  •  Tags:  
  • r
  • Related