How to calculate row mean from selected columns

Time:12-02

I have a dataframe that looks like this:

data <- as.data.frame(cbind('01-01-2018' = c(1.2,3.1,0.7,-0.3,2.0), '02-01-2018' = c(-0.1, 2.4, 4.9,-3.3,-2.7), '03-01-2018' = c(3.4, -2.6, -1.8, 0.1, 0.3)))

  01-01-2018  02-01-2018  03-01-2018
1      1.2       -0.1        3.4
2      3.1        2.4       -2.6
3      0.7        4.9       -1.8
4     -0.3       -3.3        0.1
5      2.0       -2.7        0.3

I want to calculate the row means using only the values that exceed the overall row mean.

data$mn <- apply(data, 1, mean) 

  01-01-2018 02-01-2018 03-01-2018         mn
1        1.2       -0.1        3.4  1.5000000
2        3.1        2.4       -2.6  0.9666667
3        0.7        4.9       -1.8  1.2666667
4       -0.3       -3.3        0.1 -1.1666667
5        2.0       -2.7        0.3 -0.1333333

In other words, for each row, I want to calculate the average of the values that exceed data$mn.

My last attempt was:

data$mintensity <- apply(data, 1, function(x) mean(x[x > data$mn]) ) 

but it was unsuccessful.

Answer:

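The attempt fails because x > data$mn compares each row against the entire mn column rather than that row's own mean, and because data already contains mn as a fourth column. Instead, restrict the data to the date columns w and, within each row, keep only the values greater than that row's mean before averaging.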

w <- c("01-01-2018", "02-01-2018", "03-01-2018")  ## define columns

apply(data[, w], 1, function(x) mean(x[x > mean(x)]))
# [1]  3.40  2.75  4.90 -0.10  1.15
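
As a minimal sketch of how the result could be stored back in the data frame: the column name mintensity comes from the question, while date_cols is a name introduced here for illustration. Computing from the date columns only keeps the helper column mn out of the calculation.

```r
# Example data from the question
data <- data.frame(
  `01-01-2018` = c(1.2, 3.1, 0.7, -0.3, 2.0),
  `02-01-2018` = c(-0.1, 2.4, 4.9, -3.3, -2.7),
  `03-01-2018` = c(3.4, -2.6, -1.8, 0.1, 0.3),
  check.names = FALSE  # keep the date-like column names unchanged
)

# Hypothetical helper name: the columns that hold the measurements
date_cols <- c("01-01-2018", "02-01-2018", "03-01-2018")

# Overall row mean, computed from the date columns only
data$mn <- rowMeans(data[, date_cols])

# Mean of the values in each row that exceed that row's own mean
data$mintensity <- apply(data[, date_cols], 1, function(x) mean(x[x > mean(x)]))

print(data)
```

Note that if every value in a row is identical, no value exceeds the row mean and mean() of the empty selection returns NaN for that row.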

Another way is to replace the values that don't exceed the row mean with NA before calling rowMeans with na.rm=TRUE. This is about 30 times faster.

rowMeans(replace(data[, w], data[, w] <= rowMeans(data[, w]), NA), na.rm=TRUE)  ## use only columns w, so an added mn column is ignored
# [1]  3.40  2.75  4.90 -0.10  1.15
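
The actual speed difference will depend on the data size and the machine. A rough way to check it, and to confirm that both approaches agree, might look like the sketch below. It uses simulated data, and the names big, slow, fast, t_apply and t_vec are hypothetical.

```r
set.seed(1)

# Simulated data: 10,000 rows and 3 columns of random values (an assumption for illustration)
big <- as.data.frame(matrix(rnorm(3e4), ncol = 3))

# Row-by-row approach: one function call per row
t_apply <- system.time(
  slow <- apply(big, 1, function(x) mean(x[x > mean(x)]))
)

# Vectorized approach: mask values at or below the row mean, then average the rest
t_vec <- system.time(
  fast <- rowMeans(replace(big, big <= rowMeans(big), NA), na.rm = TRUE)
)

# Both approaches should agree up to floating-point tolerance
stopifnot(isTRUE(all.equal(unname(slow), unname(fast))))

# Elapsed seconds for each approach
print(c(apply = t_apply[["elapsed"]], vectorized = t_vec[["elapsed"]]))
```

For more stable timings, the measurement could be repeated several times or run on a larger data set.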

Data:

data <- structure(list(`01-01-2018` = c(1.2, 3.1, 0.7, -0.3, 2), `02-01-2018` = c(-0.1, 
2.4, 4.9, -3.3, -2.7), `03-01-2018` = c(3.4, -2.6, -1.8, 0.1, 
0.3)), class = "data.frame", row.names = c(NA, -5L))