How can I calculate the mean of every n vectors (columns) in a data frame and create a new data frame with the results?

I expect to get: column 1: mean(V1, V2), column 2: mean(V3, V4), column 3: mean(V5, V6), and so forth.

Data:

df <- data.frame(v1=1:6,V2=7:12,V3=13:18,v4=19:24,v5=25:30,v6=31:36)

CodePudding user response：

You may try,

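dummy <- data.frame(
  v1 = 1:10,
  v2 = 1:10,
  v3 = 1:10,
  v4 = 1:10,
  v5 = 1:10,
  v6 = 1:10
)

# Row-wise mean over each consecutive block of n columns
nvec_mean <- function(df, n) {
  # Check divisibility first; otherwise matrix() would recycle indices with only a warning
  if (ncol(df) %% n != 0) {
    stop("ncol(df) must be a multiple of n")
  }
  # Each row of m holds the column indices of one block
  m <- matrix(seq_len(ncol(df)), ncol = n, byrow = TRUE)
  res <- c()
  for (i in seq_len(nrow(m))) {
    v <- rowMeans(df[, m[i, ], drop = FALSE])  # drop = FALSE keeps n = 1 working
    res <- cbind(res, v)
  }
  colnames(res) <- seq_len(nrow(m))
  res
}
nvec_mean(dummy, 3)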

       1  2
 [1,]  1  1
 [2,]  2  2
 [3,]  3  3
 [4,]  4  4
 [5,]  5  5
 [6,]  6  6
 [7,]  7  7
 [8,]  8  8
 [9,]  9  9
[10,] 10 10

If you don't want rowMeans, or the result is not what you wanted, please let me know.

Simple(?) version

df <- data.frame(v1=1:6,V2=7:12,V3=13:18,v4=19:24,v5=25:30,v6=31:36)
n <- 2

res <- c()
m <- matrix(1:ncol(df), ncol = n, byrow = TRUE)  # use n rather than a hard-coded 2
for (i in 1:nrow(m)){
  v <- rowMeans(df[,m[i,]])
  res <- cbind(res, v)
}
res

     v  v  v
[1,] 4 16 28
[2,] 5 17 29
[3,] 6 18 30
[4,] 7 19 31
[5,] 8 20 32
[6,] 9 21 33
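
Since the question asks for a new data frame, here is a rough sketch of turning the result into a data.frame with readable column names. The naming scheme (joining the source column names with "_") and the names idx and res_df are just my assumptions for illustration; adjust them as needed.

```r
df <- data.frame(v1 = 1:6, V2 = 7:12, V3 = 13:18, v4 = 19:24, v5 = 25:30, v6 = 31:36)
n <- 2

# Each row of idx holds the column indices of one block of n columns
idx <- matrix(seq_len(ncol(df)), ncol = n, byrow = TRUE)

# One column of row means per block; sapply() binds them into a matrix
res <- sapply(seq_len(nrow(idx)), function(i) rowMeans(df[, idx[i, ], drop = FALSE]))

# Convert to a data.frame and name each column after the columns it averages
# (assumed naming scheme, not part of the question)
res_df <- as.data.frame(res)
names(res_df) <- apply(idx, 1, function(cols) paste(names(df)[cols], collapse = "_"))
res_df
```

The values are the same as in the matrix above; only the container and the column names change.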

CodePudding user response：

Here is a base R option:

n <- 2 # Mean across every n = 2 columns
do.call(cbind, lapply(seq(1, ncol(df), by = n), function(idx) rowMeans(df[c(idx, idx + 1)])))
#     [,1] [,2] [,3]
#[1,]    4   16   28
#[2,]    5   17   29
#[3,]    6   18   30
#[4,]    7   19   31
#[5,]    8   20   32
#[6,]    9   21   33

This returns a matrix rather than a data.frame (which makes more sense here since you're dealing with "all-numeric" data).

Explanation: The idea is a non-overlapping sliding window approach. seq(1, ncol(df), by = n) creates the start indices of the column windows (here: 1, 3, 5). We then loop over those indices idx and calculate the row means of df[c(idx, idx + 1)]. lapply returns a list of these vectors, which we then cbind into a matrix.

As a minor modification, you can also predefine a data.frame with the right dimensions and then skip the do.call(cbind, ...) step by letting R implicitly convert the list to data.frame columns.

out <- data.frame(matrix(NA, ncol = ncol(df) / n, nrow = nrow(df)))
out[] <- lapply(seq(1, ncol(df), by = n), function(idx) rowMeans(df[c(idx, idx + 1)]))
out
#  X1 X2 X3
#1  4 16 28
#2  5 17 29
#3  6 18 30
#4  7 19 31
#5  8 20 32
#6  9 21 33
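
Note that the window in both snippets is written out for two columns (idx and idx + 1), so changing n alone would not widen it. A possible generalisation, assuming ncol(df) is a multiple of n, is to index the whole window as idx:(idx + n - 1). The function name window_means is hypothetical and only used for this sketch.

```r
# Hypothetical helper (not from the answer above): row means over every n columns
window_means <- function(df, n) {
  # Assumption: the number of columns divides evenly into windows of size n
  stopifnot(ncol(df) %% n == 0)
  starts <- seq(1, ncol(df), by = n)  # first column index of each window
  do.call(cbind, lapply(starts, function(idx) rowMeans(df[idx:(idx + n - 1)])))
}

df <- data.frame(v1 = 1:6, V2 = 7:12, V3 = 13:18, v4 = 19:24, v5 = 25:30, v6 = 31:36)
res3 <- window_means(df, 3)
res3  # 6 x 2 matrix: first column 7..12, second column 25..30
```

With n = 2 this gives the same values as the matrix shown at the top of this answer.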