Aggregating every n columns in dplyr

Time:08-20

I have a data frame like this:

  A1_A A1_B A1_C A1_D B1_A B1_B B1_C B1_D C1_A C1_B C1_C C1_D 
1 0.86  0.9 0.75 0.65 0.12 0.35 0.45 0.44  0.2  0.4  0.6  0.7
2 ...
3 ...

I am trying to compute the mean of every 2 columns for each row using dplyr. The expected output is:

  A1_A A1_C  B1_A  B1_C C1_A C1_C
1 0.88 0.70 0.235 0.445  0.3 0.65
2 ...
3 ...

I understand that rowSums() in base R can do something similar. Is there a way to do this in dplyr?

Although n = 2 in this example, n varies with my actual data. I need a general method that aggregates each row over every block of n consecutive columns.

Thanks!

Example for n=3

df <- structure(list(A1_A = 0.86, A1_B = 0.9, A1_C = 0.75, A1_D = 0.65,
                     A1_E = 0.6, A1_F = 0.65, B1_A = 0.12, B1_B = 0.35,
                     B1_C = 0.45, B1_D = 0.44, B1_E = 0.5, B1_F = 0.55,
                     C1_A = 0.2, C1_B = 0.4, C1_C = 0.6, C1_D = 0.7,
                     C1_E = 0.75, C1_F = 0.8), class = "data.frame", row.names = "1")

Output:

#   A1_A A1_D B1_A B1_D C1_A C1_D 
# 1 0.84 0.63 0.31  0.5  0.4 0.75

Answer:

An easy way is to add the odd-numbered and even-numbered columns together and divide the result by 2.

# Pair each odd-numbered column with the even-numbered column after it and average them
(df[seq(1, ncol(df), 2)] + df[seq(2, ncol(df), 2)]) / 2

#   A1_A A1_C  B1_A  B1_C C1_A C1_C
# 1 0.88  0.7 0.235 0.445  0.3 0.65
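The odd/even trick above can be extended to any n that divides the number of columns: take the 1st, 2nd, ..., nth column of every block as n data frames of identical shape, add them, and divide by n. This is a minimal sketch that assumes all columns are numeric and ncol(df) is a multiple of n; the helper name `mean_every_n_reduce` is hypothetical, and `df` is the n = 3 example from the question.

```r
# n = 3 example data from the question
df <- structure(list(A1_A = 0.86, A1_B = 0.9, A1_C = 0.75, A1_D = 0.65,
                     A1_E = 0.6, A1_F = 0.65, B1_A = 0.12, B1_B = 0.35,
                     B1_C = 0.45, B1_D = 0.44, B1_E = 0.5, B1_F = 0.55,
                     C1_A = 0.2, C1_B = 0.4, C1_C = 0.6, C1_D = 0.7,
                     C1_E = 0.75, C1_F = 0.8), class = "data.frame", row.names = "1")

# Hypothetical helper: row-wise mean of every block of n consecutive columns.
# Assumes all columns are numeric and ncol(df) is a multiple of n.
mean_every_n_reduce <- function(df, n) {
  stopifnot(ncol(df) %% n == 0)
  # Offset k = 0 selects the 1st column of each block, k = 1 the 2nd, and so on,
  # giving n data frames with identical shape.
  slices <- lapply(0:(n - 1), function(k) df[seq(1 + k, ncol(df), by = n)])
  # `+` on equal-shaped data frames is element-wise, and the result keeps the
  # column names of the first slice (A1_A, A1_D, ...).
  Reduce(`+`, slices) / n
}

mean_every_n_reduce(df, 3)
```

For the question's data this should give roughly 0.837, 0.633, 0.307, 0.497, 0.4 and 0.75 under A1_A, A1_D, B1_A, B1_D, C1_A and C1_D, matching the expected n = 3 output. Because it works column-wise on whole data frames, the same call handles any number of rows.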

Generalization
  • n = 3
n <- 3
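# Average columns x, ..., x + n - 1 for each block start x (`\(x)` is R >= 4.1 shorthand for function(x))
unname(sapply(seq(1, ncol(df), n), \(x) rowMeans(df[x:(x + n - 1)])))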

# [1] 0.8366667 0.6333333 0.3066667 0.4966667 0.4000000 0.7500000
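Since the question asks for dplyr specifically, here is one possible dplyr sketch of the same generalization. It assumes dplyr >= 1.1.0 (for `pick()`), numeric columns, and ncol(df) being a multiple of n; `starts` and `blocks` are illustrative names. Each block of n columns is averaged with `rowMeans()` inside `transmute()`, the result is named after the block's first column, and the one-column pieces are combined with `bind_cols()`.

```r
library(dplyr)

# n = 3 example data from the question
df <- structure(list(A1_A = 0.86, A1_B = 0.9, A1_C = 0.75, A1_D = 0.65,
                     A1_E = 0.6, A1_F = 0.65, B1_A = 0.12, B1_B = 0.35,
                     B1_C = 0.45, B1_D = 0.44, B1_E = 0.5, B1_F = 0.55,
                     C1_A = 0.2, C1_B = 0.4, C1_C = 0.6, C1_D = 0.7,
                     C1_E = 0.75, C1_F = 0.8), class = "data.frame", row.names = "1")

n <- 3
stopifnot(ncol(df) %% n == 0)
# Index of the first column in each block of n: 1, 4, 7, ...
starts <- seq(1, ncol(df), by = n)

blocks <- lapply(starts, function(i) {
  cols <- names(df)[i:(i + n - 1)]
  # pick() returns the selected columns; rowMeans() averages them per row.
  # The new column is named after the first column of the block.
  transmute(df, !!cols[1] := rowMeans(pick(all_of(cols))))
})

# Combine the one-column results side by side (all have nrow(df) rows).
out <- bind_cols(blocks)
out
```

This keeps the column names that the unnamed `sapply()` result drops. For very wide data the base R versions may be cheaper, since this builds one small data frame per block before binding them.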