Calculate mean of every 2nd column in a dataframe-CodePudding

I would like to calculate the row-wise means of every second column in my dataset, meaning: Average of columns A and B, C and D, E and F. My data look like this:

|A|B|C|D|E|F|
|-|-|-|-|-|-|
|0|1|1|1|0|1|
|0|0|1|1|0|0|
|1|1|0|0|0|1|
|0|1|1|1|1|1|
|1|1|1|1|0|1|

As a condition I want to include that both values should be greater 0 to compute the mean:

data$meanAB <-if_else(A > 0 & B > 0, rowMeans(data[,1:2]), 0)

I manage to do this for two columns, but I would like a solution with new columns added to my dataframe with the rowwise means of every 2 columns. I want to end up with a table like this:

|A|B|C|D|E|F|meanAB|meanCD|meanEF|
|-|-|-|-|-|-|-|-|-|
|0|1|1|1|0|1|0|1|0|
|0|0|1|1|0|0|0|1|0|
|1|1|0|0|0|1|1|0|0|
|0|1|1|1|1|1|0|1|1|
|1|1|1|1|0|1|1|1|0|
|0|1|1|1|0|1|0|1|0|
|0|0|1|1|0|0|0|1|0|
|1|1|0|0|0|1|1|0|0|
|0|1|1|1|1|1|0|1|1|
|1|1|1|1|0|1|1|1|0|

Thanks in advance!

CodePudding user response：

Base R option using split.default -

cbind(df, sapply(split.default(df, ceiling(seq_along(df)/2)), function(x) {
  ifelse(x[1] > 0 & x[2] > 0, rowMeans(x), 0)
}))

#  A B C D E F 1 2 3
#1 0 1 1 1 0 1 0 1 0
#2 0 0 1 1 0 0 0 1 0
#3 1 1 0 0 0 1 1 0 0
#4 0 1 1 1 1 1 0 1 1
#5 1 1 1 1 0 1 1 1 0

where column 1 is mean of A & B, column 2 is mean of C & D and so on.

CodePudding user response：

Here is a way. It uses a cumsum trick to get groups of columns, two by two. Then it loops through the split data and computes the row means. Finally, it combines the output with the original input data.

cs <- cumsum(seq_len(ncol(data)) %% 2)
res <- lapply(split(as.list(data), cs), \(x){
  rowMeans(as.data.frame(x))
})
res <- do.call(cbind, res)
colnames(res) <- paste0("mean", tapply(names(data), cs, paste, collapse = ""))
cbind(data, res)
#  A B C D E F meanAB meanCD meanEF
#1 0 1 1 1 0 1    0.5      1    0.5
#2 0 0 1 1 0 0    0.0      1    0.0
#3 1 1 0 0 0 1    1.0      0    0.5
#4 0 1 1 1 1 1    0.5      1    1.0
#5 1 1 1 1 0 1    1.0      1    0.5

Data in `dput` format

data <-
structure(list(A = c(0L, 0L, 1L, 0L, 1L), B = c(1L, 0L, 1L, 1L, 
1L), C = c(1L, 1L, 0L, 1L, 1L), D = c(1L, 1L, 0L, 1L, 1L), E = c(0L, 
0L, 0L, 1L, 0L), F = c(1L, 0L, 1L, 1L, 1L)), row.names = c(NA, 
-5L), class = "data.frame")

CodePudding user response：

A tidyversesolution would be. To me this is very short and neat.

library(dplyr)
#> 
#> Attache Paket: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

data <-
  structure(
    list(
      A = c(0L, 0L, 1L, 0L, 1L),
      B = c(1L, 0L, 1L, 1L, 1L),
      C = c(1L, 1L, 0L, 1L, 1L),
      D = c(1L, 1L, 0L, 1L, 1L),
      E = c(0L, 0L, 0L, 1L, 0L),
      F = c(1L, 0L, 1L, 1L, 1L)
    ),
    row.names = c(NA, -5L),
    class = "data.frame"
  )

data %>%
  rowwise() %>%
  mutate(meanAB = mean(c(A, B)),
         meanCD = mean(c(C, D)),
         meanEF = mean(c(E, F)))
#> # A tibble: 5 x 9
#> # Rowwise: 
#>       A     B     C     D     E     F meanAB meanCD meanEF
#>   <int> <int> <int> <int> <int> <int>  <dbl>  <dbl>  <dbl>
#> 1     0     1     1     1     0     1    0.5      1    0.5
#> 2     0     0     1     1     0     0    0        1    0  
#> 3     1     1     0     0     0     1    1        0    0.5
#> 4     0     1     1     1     1     1    0.5      1    1  
#> 5     1     1     1     1     0     1    1        1    0.5

^{Created on 2021-10-27 by the reprex package (v2.0.1)}

Data in dput format

Data in `dput` format