Home > Software engineering >  R data table average with condition
R data table average with condition

Time:02-03

I want to calculate the average of several variables but capping and flooring its values.

For example in mtcars, calculate the average of drat and wt variables, but capping any value above 3. So if drat is 3.90, for the means only 3 is used.

First row would be (3 2.62)/2 instead of (3.9 2.62)/2.

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

This is uncapped. How would I cap it to 3? Is there any way to make it with rowMeans?

mtcars = data.table(mtcars)
mtcars[,max_dart_wt := rowMeans(.SD), .SDcols = c("drat","wt")]

CodePudding user response:

Here is smth quick and dirty:

rowMeans_p <- function(DT, p = 3) {
  tmp <- lapply(DT, \(x) pmin(x, p)) |> 
    do.call(what = " ")
  tmp / ncol(DT)
}

mtcars[, max_dart_wt  := rowMeans_p(.SD), .SDcols = c("drat", "wt")]


head(mtcars)
#      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb max_dart_wt
#    <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>       <num>
# 1:  21.0     6   160   110  3.90 2.620 16.46     0     1     4     4      2.8100
# 2:  21.0     6   160   110  3.90 2.875 17.02     0     1     4     4      2.9375
# 3:  22.8     4   108    93  3.85 2.320 18.61     1     1     4     1      2.6600
# 4:  21.4     6   258   110  3.08 3.215 19.44     1     0     3     1      3.0000
# 5:  18.7     8   360   175  3.15 3.440 17.02     0     0     3     2      3.0000
# 6:  18.1     6   225   105  2.76 3.460 20.22     1     0     3     1      2.8800

CodePudding user response:

I realise the fastest way is to create an index and do the operations per "id" if you have few columns.

mtcars$idx = 1:nrow(mtcars)
mtcars[,max_dart_wt := mean(c(min(drat,3),min(wt,3)),na.rm=TRUE), by = "idx"]
  • Related