I am working on a big dataset with a lot of rows and columns. It has approximately the following structure:
myData <- structure(list(ID = c("a", "b", "c", "d", "e"),
group = c("x", "y", "x", "x", "y"),
var1 = c(0L, 0L, 2L, 0L, 0L), var2 = c(0,
0, 1, 0, 1), var3 = c(1.032, 0, 0, 0, 0)), row.names = c(NA,
5L), class = "data.frame")
> myData
ID group var1 var2 var3
1 a x 0 0 1.032
2 b y 0 0 0.000
3 c x 2 1 0.000
4 d x 0 0 0.000
5 e y 0 1 0.000
I need to create two variables for each variable varN
: varN_mean
, containing the mean of each group
, and varN_dev
, containing the deviation from the mean of the individual observation from each group. The final data should look approximately like this:
> myData
ID group var1_mean var1_dev var2_mean var2_dev var3_mean var3_dev
1 a x 0.667 -0.667 0.333 -0.333 0.344 0.688
2 b y 0 0 0.5 -0.5 0 0
3 c x 0.667 1.333 0.333 0.667 0.344 -0.344
4 d x 0.667 -0.667 0.333 -0.333 0.344 -0.344
5 e y 0 0 0.5 0.5 0 0
The order of the columns is not important.
I got until here (with help from a colleague) with dplyr
:
library(dplyr)
means_myData <-
myData %>%
group_by(group) %>%
summarise(
ID,
across(starts_with("var"),
.fns=~mean(.x),
.names="{col}_mean")
) %>%
ungroup()
But now I am struggling to compute the deviations. I think I should something like this:
means_myData %>%
full_join(myData, by = c("ID", "group")) %>%
summarise(
ID,
across(matches("^var._mean$"), .fns-~.x - ...)
)
but I am not sure how to match the columns with their respective means in the expression.
How can I create the variables automatically for the deviations and compute the deviations from the mean for each group?
Thanks!
CodePudding user response:
Use across
's list
option to compute both variables at once:
library(dplyr)
myData %>%
group_by(group) %>%
mutate(across(starts_with("var"),
list(mean = mean, dev = ~ .x - mean(.x)),
.names = "{.col}_{.fn}"),
.keep = "unused")
ID group var1_mean var1_dev var2_mean var2_dev var3_mean var3_dev
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a x 0.667 -0.667 0.333 -0.333 0.344 0.688
2 b y 0 0 0.5 -0.5 0 0
3 c x 0.667 1.33 0.333 0.667 0.344 -0.344
4 d x 0.667 -0.667 0.333 -0.333 0.344 -0.344
5 e y 0 0 0.5 0.5 0 0