Rowwise average over an increasing number of columns using a for loop inside mutate (dplyr, R)

I want to do something like this:

library(dplyr)

a <- data.frame(A=c(1,5,9),
                B=c(2,6,10),
                C=c(3,7,11),
                D=c(4,8,12))

a <- a %>% rowwise()
a <- a %>% mutate(mean(c_across(1:2)))
a <- a %>% mutate(mean(c_across(1:3)))
a <- a %>% mutate(mean(c_across(1:4)))

This gives:

A   B   C   D  mean(c_across(1:2)) mean(c_across(1:3)) mean(c_across(1:4))
1   2   3   4                  1.5                   2                 2.5
5   6   7   8                  5.5                   6                 6.5
9   10  11  12                 9.5                  10                10.5

I would like to get the same results using a for loop. I tried this:

a <- data.frame(A=c(1,5,9),
                B=c(2,6,10),
                C=c(3,7,11),
                D=c(4,8,12))

a <- a %>% rowwise()
for(i in 2:4){
  a <- a %>% mutate(mean(c_across(1:i)))
}

But it only shows the result for the last value, i = 4:

A   B   C   D  mean(c_across(1:i))
1   2   3   4                  2.5
5   6   7   8                  6.5
9   10  11  12                10.5

Can anyone explain what is happening? Whenever I use a for loop with dplyr, I immediately feel like I am doing something wrong. Is there a better approach?

Answer:

You can use purrr::reduce() (or base::Reduce()) to do the iteration.

library(tidyverse)

reduce(2:4, ~ mutate(.x, !!paste0("col1to", .y) := mean(c_across(1:.y))), .init = rowwise(a))

# A tibble: 3 x 7
# Rowwise: 
      A     B     C     D col1to2 col1to3 col1to4
  <dbl> <dbl> <dbl> <dbl>   <dbl>   <dbl>   <dbl>
1     1     2     3     4     1.5       2     2.5
2     5     6     7     8     5.5       6     6.5
3     9    10    11    12     9.5      10    10.5
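
To see how reduce() threads the result through the steps, here is a toy sketch that uses strings instead of a data frame (an illustration only, not part of the answer): `.x` is the result so far, starting from `.init`, and `.y` is the next element of 2:4. In the real call, `!!paste0(...) :=` injects the computed string as the name of the new column, so each step adds a differently named column.

```r
library(purrr)

# .x is the accumulated result (starts at .init), .y is the next element of 2:4
out <- reduce(2:4, ~ paste0(.x, " -> col1to", .y), .init = "a")
out
# [1] "a -> col1to2 -> col1to3 -> col1to4"
```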

base::Reduce() version (the \(x, y) lambda shorthand needs R >= 4.1.0):

Reduce(\(x, y) mutate(x, !!paste0("col1to", y) := mean(c_across(1:y))), 2:4, init = rowwise(a))
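
If you don't need rowwise() at all, Reduce() can also build running column sums, which are vectorised over rows. This is a base R sketch that assumes every column is numeric. The col1to* names simply mirror the answer above.

```r
a <- data.frame(A = c(1, 5, 9), B = c(2, 6, 10), C = c(3, 7, 11), D = c(4, 8, 12))

# Running sums over whole columns: A, A+B, A+B+C, A+B+C+D
cum_sums <- Reduce(`+`, a, accumulate = TRUE)

# Divide the k-th running sum by k to get the running mean, then drop the
# first entry (the mean of column A alone is just A)
cum_means <- Map(`/`, cum_sums, seq_along(cum_sums))[-1]
names(cum_means) <- paste0("col1to", 2:ncol(a))

res <- cbind(a, cum_means)
res
```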

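To fix your for loop, you need to give each new column a different name. When the expression passed to mutate() is unnamed, the new column is named after the text of the expression. That text is "mean(c_across(1:i))" on every iteration because i is not substituted into the name, so each new column overwrites the previous one.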

b <- rowwise(a)
for(i in 2:4) {
  b <- b %>% mutate(!!paste0("col1to", i) := mean(c_across(1:i)))
}

b
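
The same loop also works without rowwise() if each step computes the row means with rowMeans(), which handles all rows in one vectorised call instead of one row at a time. This is a sketch that assumes dplyr >= 1.1.0, where pick() was added. The col1to* names follow the pattern used above.

```r
library(dplyr)

a <- data.frame(A = c(1, 5, 9), B = c(2, 6, 10), C = c(3, 7, 11), D = c(4, 8, 12))

b <- a
for (i in 2:4) {
  # pick(1:i) returns columns 1..i for all rows as a data frame;
  # rowMeans() averages each row of that selection in one call
  b <- b %>% mutate(!!paste0("col1to", i) := rowMeans(pick(1:i)))
}
b
```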

Another option, using dplyr::cummean() and tidyr::unnest_wider():

a %>%
  rowwise() %>%
  mutate(mean = list(cummean(c_across(1:4))[-1])) %>%
  unnest_wider(mean, names_sep = "_")
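
The [-1] is needed because cummean() returns the running mean starting from the first element, and the "mean" of column A alone is just A. Here it is applied to the first row of `a`:

```r
library(dplyr)

# Running mean of row 1: mean(1), mean(1:2), mean(1:3), mean(1:4)
row1 <- cummean(c(1, 2, 3, 4))
row1
# [1] 1.0 1.5 2.0 2.5

# Drop the first element to keep only the means over columns 1:2, 1:3 and 1:4
row1[-1]
# [1] 1.5 2.0 2.5
```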

Answer:

Using data.table:

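library(data.table)

# setDT() converts `a` to a data.table by reference, so `a` itself gains the new columns;
# transpose(.SD) gives one vector per row, and the outer transpose() turns the results back into columns
setDT(a)[
  , 
  paste0("col", seq_len(ncol(a)-1)) :=  
    transpose(lapply(transpose(.SD), function(x) dplyr::cummean(x)[-1]))
]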
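
If the double transpose() feels opaque, a loop over rowMeans(.SD) with .SDcols is an alternative data.table sketch (my own variant, not part of the answer above). It works on a fresh table named dt, so it does not depend on `a` having already been modified by setDT().

```r
library(data.table)

dt <- data.table(A = c(1, 5, 9), B = c(2, 6, 10), C = c(3, 7, 11), D = c(4, 8, 12))
orig <- names(dt)

for (i in 2:length(orig)) {
  cols <- orig[seq_len(i)]
  # := adds the column by reference; .SD holds only the first i original columns
  dt[, paste0("col1to", i) := rowMeans(.SD), .SDcols = cols]
}
dt
```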

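Using base R only (cummean() comes from dplyr, so the running mean is written as cumsum(x) / seq_along(x)):

# apply() works row by row; t() turns the result back into one row per original row
cbind(a, mean = t(apply(a, 1, function(x) (cumsum(x) / seq_along(x))[-1])))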

Answer:

Here is another tidyverse option, which also uses purrr. We iterate over the column names with map(), select the range from the first column up to each name, and take the row means of that selection. Then we name the new columns and bind them back to the original data frame. I use names(a)[-1] instead of hard-coded column names so that the code also works for other data frames.

library(tidyverse)

names(a)[-1] %>% 
  map(~ a %>% 
        select(names(a)[1]:.x) %>% 
        rowMeans(.)) %>%
  set_names(paste0("mean_", names(a)[1], "_", names(a)[-1])) %>%
  bind_cols(a, .)

Output

  A  B  C  D mean_A_B mean_A_C mean_A_D
1 1  2  3  4      1.5        2      2.5
2 5  6  7  8      5.5        6      6.5
3 9 10 11 12      9.5       10     10.5