I would like to calculate the row-wise means of every second column in my dataset, meaning: Average of columns A and B, C and D, E and F. My data look like this:
|A|B|C|D|E|F|
|-|-|-|-|-|-|
|0|1|1|1|0|1|
|0|0|1|1|0|0|
|1|1|0|0|0|1|
|0|1|1|1|1|1|
|1|1|1|1|0|1|
As a condition I want to include that both values should be greater 0 to compute the mean:
data$meanAB <-if_else(A > 0 & B > 0, rowMeans(data[,1:2]), 0)
I manage to do this for two columns, but I would like a solution with new columns added to my dataframe with the rowwise means of every 2 columns. I want to end up with a table like this:
|A|B|C|D|E|F|meanAB|meanCD|meanEF|
|-|-|-|-|-|-|-|-|-|
|0|1|1|1|0|1|0|1|0|
|0|0|1|1|0|0|0|1|0|
|1|1|0|0|0|1|1|0|0|
|0|1|1|1|1|1|0|1|1|
|1|1|1|1|0|1|1|1|0|
|0|1|1|1|0|1|0|1|0|
|0|0|1|1|0|0|0|1|0|
|1|1|0|0|0|1|1|0|0|
|0|1|1|1|1|1|0|1|1|
|1|1|1|1|0|1|1|1|0|
Thanks in advance!
CodePudding user response:
Base R option using split.default
-
cbind(df, sapply(split.default(df, ceiling(seq_along(df)/2)), function(x) {
ifelse(x[1] > 0 & x[2] > 0, rowMeans(x), 0)
}))
# A B C D E F 1 2 3
#1 0 1 1 1 0 1 0 1 0
#2 0 0 1 1 0 0 0 1 0
#3 1 1 0 0 0 1 1 0 0
#4 0 1 1 1 1 1 0 1 1
#5 1 1 1 1 0 1 1 1 0
where column 1 is mean of A & B, column 2 is mean of C & D and so on.
CodePudding user response:
Here is a way. It uses a cumsum
trick to get groups of columns, two by two. Then it loops through the split data and computes the row means. Finally, it combines the output with the original input data.
cs <- cumsum(seq_len(ncol(data)) %% 2)
res <- lapply(split(as.list(data), cs), \(x){
rowMeans(as.data.frame(x))
})
res <- do.call(cbind, res)
colnames(res) <- paste0("mean", tapply(names(data), cs, paste, collapse = ""))
cbind(data, res)
# A B C D E F meanAB meanCD meanEF
#1 0 1 1 1 0 1 0.5 1 0.5
#2 0 0 1 1 0 0 0.0 1 0.0
#3 1 1 0 0 0 1 1.0 0 0.5
#4 0 1 1 1 1 1 0.5 1 1.0
#5 1 1 1 1 0 1 1.0 1 0.5
Data in dput
format
data <-
structure(list(A = c(0L, 0L, 1L, 0L, 1L), B = c(1L, 0L, 1L, 1L,
1L), C = c(1L, 1L, 0L, 1L, 1L), D = c(1L, 1L, 0L, 1L, 1L), E = c(0L,
0L, 0L, 1L, 0L), F = c(1L, 0L, 1L, 1L, 1L)), row.names = c(NA,
-5L), class = "data.frame")
CodePudding user response:
A tidyverse
solution would be. To me this is very short and neat.
library(dplyr)
#>
#> Attache Paket: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
data <-
structure(
list(
A = c(0L, 0L, 1L, 0L, 1L),
B = c(1L, 0L, 1L, 1L, 1L),
C = c(1L, 1L, 0L, 1L, 1L),
D = c(1L, 1L, 0L, 1L, 1L),
E = c(0L, 0L, 0L, 1L, 0L),
F = c(1L, 0L, 1L, 1L, 1L)
),
row.names = c(NA, -5L),
class = "data.frame"
)
data %>%
rowwise() %>%
mutate(meanAB = mean(c(A, B)),
meanCD = mean(c(C, D)),
meanEF = mean(c(E, F)))
#> # A tibble: 5 x 9
#> # Rowwise:
#> A B C D E F meanAB meanCD meanEF
#> <int> <int> <int> <int> <int> <int> <dbl> <dbl> <dbl>
#> 1 0 1 1 1 0 1 0.5 1 0.5
#> 2 0 0 1 1 0 0 0 1 0
#> 3 1 1 0 0 0 1 1 0 0.5
#> 4 0 1 1 1 1 1 0.5 1 1
#> 5 1 1 1 1 0 1 1 1 0.5
Created on 2021-10-27 by the reprex package (v2.0.1)