Calculate deviations from group means for each column-CodePudding

I am working on a big dataset with a lot of rows and columns. It has approximately the following structure:

myData <- structure(list(ID = c("a", "b", "c", "d", "e"),
group = c("x", "y", "x", "x", "y"),
var1 = c(0L, 0L, 2L, 0L, 0L), var2 = c(0, 
0, 1, 0, 1), var3 = c(1.032, 0, 0, 0, 0)), row.names = c(NA, 
5L), class = "data.frame")

> myData
  ID group var1 var2  var3
1  a     x    0    0 1.032
2  b     y    0    0 0.000
3  c     x    2    1 0.000
4  d     x    0    0 0.000
5  e     y    0    1 0.000

I need to create two variables for each variable varN: varN_mean, containing the mean of each group, and varN_dev, containing the deviation from the mean of the individual observation from each group. The final data should look approximately like this:

> myData
  ID group var1_mean  var1_dev var2_mean var2_dev var3_mean var3_dev
1  a     x     0.667   -0.667     0.333    -0.333     0.344    0.688
2  b     y         0        0       0.5      -0.5         0        0
3  c     x     0.667    1.333     0.333     0.667     0.344   -0.344
4  d     x     0.667   -0.667     0.333    -0.333     0.344   -0.344
5  e     y         0        0       0.5       0.5         0        0

The order of the columns is not important.

I got until here (with help from a colleague) with dplyr:

library(dplyr)

means_myData <- 
    myData %>%
      group_by(group) %>%
      summarise(
        ID,
        across(starts_with("var"),
          .fns=~mean(.x),
          .names="{col}_mean")
      ) %>%
      ungroup()

But now I am struggling to compute the deviations. I think I should something like this:

means_myData %>%
  full_join(myData, by = c("ID", "group")) %>%
  summarise(
    ID,
    across(matches("^var._mean$"), .fns-~.x - ...)
  )

but I am not sure how to match the columns with their respective means in the expression.

How can I create the variables automatically for the deviations and compute the deviations from the mean for each group?

Thanks!

CodePudding user response：

Use across's list option to compute both variables at once:

library(dplyr)
myData %>% 
  group_by(group) %>% 
  mutate(across(starts_with("var"), 
                list(mean = mean, dev = ~ .x - mean(.x)), 
                .names = "{.col}_{.fn}"),
         .keep = "unused")

  ID    group var1_mean var1_dev var2_mean var2_dev var3_mean var3_dev
  <chr> <chr>     <dbl>    <dbl>     <dbl>    <dbl>     <dbl>    <dbl>
1 a     x         0.667   -0.667     0.333   -0.333     0.344    0.688
2 b     y         0        0         0.5     -0.5       0        0    
3 c     x         0.667    1.33      0.333    0.667     0.344   -0.344
4 d     x         0.667   -0.667     0.333   -0.333     0.344   -0.344
5 e     y         0        0         0.5      0.5       0        0