I want to calculate the average of several variables but capping and flooring its values.
For example in mtcars, calculate the average of drat and wt variables, but capping any value above 3. So if drat is 3.90, for the means only 3 is used.
First row would be (3 2.62)/2 instead of (3.9 2.62)/2.
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
This is uncapped. How would I cap it to 3? Is there any way to make it with rowMeans?
mtcars = data.table(mtcars)
mtcars[,max_dart_wt := rowMeans(.SD), .SDcols = c("drat","wt")]
CodePudding user response:
Here is smth quick and dirty:
rowMeans_p <- function(DT, p = 3) {
tmp <- lapply(DT, \(x) pmin(x, p)) |>
do.call(what = " ")
tmp / ncol(DT)
}
mtcars[, max_dart_wt := rowMeans_p(.SD), .SDcols = c("drat", "wt")]
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb max_dart_wt
# <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
# 1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 2.8100
# 2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 2.9375
# 3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 2.6600
# 4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 3.0000
# 5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 3.0000
# 6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 2.8800
CodePudding user response:
I realise the fastest way is to create an index and do the operations per "id" if you have few columns.
mtcars$idx = 1:nrow(mtcars)
mtcars[,max_dart_wt := mean(c(min(drat,3),min(wt,3)),na.rm=TRUE), by = "idx"]