I have a dataframe that looks like this:
data <- as.data.frame(cbind('01-01-2018' = c(1.2,3.1,0.7,-0.3,2.0), '02-01-2018' = c(-0.1, 2.4, 4.9,-3.3,-2.7), '03-01-2018' = c(3.4, -2.6, -1.8, 0.1, 0.3)))
  01-01-2018 02-01-2018 03-01-2018
1        1.2       -0.1        3.4
2        3.1        2.4       -2.6
3        0.7        4.9       -1.8
4       -0.3       -3.3        0.1
5        2.0       -2.7        0.3
I want to calculate the row means considering only the columns that exceed the total row mean.
data$mn <- apply(data, 1, mean)
  01-01-2018 02-01-2018 03-01-2018         mn
1        1.2       -0.1        3.4  1.5000000
2        3.1        2.4       -2.6  0.9666667
3        0.7        4.9       -1.8  1.2666667
4       -0.3       -3.3        0.1 -1.1666667
5        2.0       -2.7        0.3 -0.1333333
In other words, for each row, I want to calculate the average of the values that exceed data$mn.
My last attempt was:
data$mintensity <- apply(data, 1, function(x) mean(x[x > data$mn]) )
but it was unsuccessful.
CodePudding user response:
Just subset each row to the values that exceed that row's own mean, using only the columns listed in w, before calculating the mean.
w <- c("01-01-2018", "02-01-2018", "03-01-2018") ## define columns
apply(data[, w], 1, function(x) mean(x[x > mean(x)]))
# [1] 3.40 2.75 4.90 -0.10 1.15
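As a minimal sketch, the result can be stored as a new column, here using the mintensity name from the question. The data is rebuilt with data.frame(..., check.names = FALSE) only to keep the example self-contained. Restricting the calculation to the columns in w keeps a previously added mn column out of it.

```r
# Rebuild the example data so this block runs on its own
data <- data.frame(
  `01-01-2018` = c(1.2, 3.1, 0.7, -0.3, 2.0),
  `02-01-2018` = c(-0.1, 2.4, 4.9, -3.3, -2.7),
  `03-01-2018` = c(3.4, -2.6, -1.8, 0.1, 0.3),
  check.names = FALSE
)
w <- c("01-01-2018", "02-01-2018", "03-01-2018")  # value columns only

# Row mean over the value columns, as in the question
data$mn <- rowMeans(data[, w])

# Mean of the values in each row that exceed that row's mean;
# data[, w] excludes mn, so it cannot distort the result
data$mintensity <- apply(data[, w], 1, function(x) mean(x[x > mean(x)]))
```

The attempt in the question most likely failed for two reasons: x was compared against the whole data$mn vector rather than a single row mean, and apply(data, ...) also passed the mn column in as a value.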
Another way is to replace data points that don't exceed the row means with NAs before calculating rowMeans. This is about 30 times faster.
rowMeans(replace(data[, w], data[, w] <= rowMeans(data[, w]), NA), na.rm=TRUE)
# [1] 3.40 2.75 4.90 -0.10 1.15
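To check that both approaches agree, and to compare their speed on your own machine, something like the following could be used. The data frame big and its size are made up for illustration, and the timings will vary with hardware and data size.

```r
set.seed(1)
# Hypothetical larger data set: 10,000 rows, 3 numeric columns
big <- as.data.frame(matrix(rnorm(3e4), ncol = 3))

# Row-by-row version
res_apply <- apply(big, 1, function(x) mean(x[x > mean(x)]))

# Vectorised version: mask values at or below the row mean, then average
res_vec <- rowMeans(replace(big, big <= rowMeans(big), NA), na.rm = TRUE)

# Both should give the same numbers (names are dropped before comparing)
agree <- isTRUE(all.equal(unname(res_apply), unname(res_vec)))

# Rough timings; use a benchmarking package for anything more careful
t_apply <- system.time(apply(big, 1, function(x) mean(x[x > mean(x)])))
t_vec <- system.time(rowMeans(replace(big, big <= rowMeans(big), NA), na.rm = TRUE))
```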
Data:
data <- structure(list(`01-01-2018` = c(1.2, 3.1, 0.7, -0.3, 2), `02-01-2018` = c(-0.1,
2.4, 4.9, -3.3, -2.7), `03-01-2018` = c(3.4, -2.6, -1.8, 0.1,
0.3)), class = "data.frame", row.names = c(NA, -5L))