Paste element of a vector into dplyr function

I have the following dataset:

df_x <- data.frame(year = c(2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002),
             a = c(7, 3, 5),
             b = c(5, 8, 1),
             c = c(8, 4, 3))

and this vector:

v <- c("a", "b", "c")

Now I want to create a new dataset and summarise a, b, and c by creating new variables (y_a, y_b, and y_c) that calculate the mean of each variable grouped by year.

The code for doing this is the following:

y <- df_x %>% group_by(year) %>%  dplyr::summarise(y_a = mean(a, na.rm = TRUE),
                y_b = mean(b, na.rm = TRUE),
                y_c = mean(c, na.rm = TRUE))

However, I want to use the vector v to look up each variable name and paste it into the summarise function:

y <- df_x %>% group_by(year) %>%  dplyr::summarise(as.name(paste0("y_", v[1])) = mean(as.name(v[1]), na.rm = TRUE),
                                                   as.name(paste0("y_", v[2])) = mean(as.name(v[2]), na.rm = TRUE),
                                                   as.name(paste0("y_", v[3])) = mean(as.name(v[3]), na.rm = TRUE))

Doing so, I receive the following error message:

Error: unexpected '=' in "y <- df_x %>% group_by(year) %>%  dplyr::summarise(as.name(paste0("y_", v[1])) ="

How can I paste the value of a vector in this summarise function so that it works?

CodePudding user response：

To define a new variable whose name is computed on the left-hand side, you need := instead of =. Because you build the name with paste0, you also need !! to inject the resulting string so that it is used as the column name. To access an existing column in dplyr through a string stored in a variable, the .data pronoun is the easiest way, e.g. .data[[v[1]]].

library(dplyr)

df_x <- data.frame(year = c(2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002),
                   a = c(7, 3, 5),
                   b = c(5, 8, 1),
                   c = c(8, 4, 3))

v <- c("a", "b", "c")

df_x %>% group_by(year) %>% 
  dplyr::summarise(!!paste0("y_", v[1]) := mean(.data[[v[1]]], na.rm = TRUE),
                   !!paste0("y_", v[2]) := mean(.data[[v[2]]], na.rm = TRUE),
                   !!paste0("y_", v[3]) := mean(.data[[v[3]]], na.rm = TRUE))
#> # A tibble: 3 × 4
#>    year   y_a   y_b   y_c
#>   <dbl> <dbl> <dbl> <dbl>
#> 1  2000     5  4.67     5
#> 2  2001     5  4.67     5
#> 3  2002     5  4.67     5

Created on 2022-12-21 by the reprex package (v1.0.0)
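
If v can grow, writing one line per element gets tedious. As a possible alternative (not part of the answer above), here is a minimal sketch that handles the whole vector at once with across(), assuming dplyr 1.0.0 or later. The "y_{.col}" pattern passed to .names builds the new column names, and all_of(v) selects the columns named in v.

```r
library(dplyr)

# Same example data as in the question; a, b and c are recycled to 9 rows
df_x <- data.frame(year = c(2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002),
                   a = c(7, 3, 5),
                   b = c(5, 8, 1),
                   c = c(8, 4, 3))

v <- c("a", "b", "c")

# Apply mean() to every column named in v and prefix each result with "y_"
y <- df_x %>%
  group_by(year) %>%
  summarise(across(all_of(v), ~ mean(.x, na.rm = TRUE), .names = "y_{.col}"))

print(y)
```

This should give the columns year, y_a, y_b and y_c without indexing v by hand.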

CodePudding user response：

Here is a one-liner in base R (the \(i) lambda shorthand needs R 4.1 or later):

aggregate(. ~ year, cbind.data.frame(year = df_x$year, df_x[v]), FUN = \(i)mean(i, na.rm = TRUE))

  year a        b c
1 2000 5 4.666667 5
2 2001 5 4.666667 5
3 2002 5 4.666667 5
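
Note that the result keeps the original column names a, b and c. If you want the y_ prefix from the question, one way is to rename afterwards. This is only a sketch of a variant of the call above: it uses the data frame method of aggregate() with by = instead of the formula interface, and res is just an illustrative name.

```r
# Same example data as in the question
df_x <- data.frame(year = c(2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002),
                   a = c(7, 3, 5),
                   b = c(5, 8, 1),
                   c = c(8, 4, 3))

v <- c("a", "b", "c")

# Group means of the columns named in v; na.rm is passed through to mean()
res <- aggregate(df_x[v], by = list(year = df_x$year), FUN = mean, na.rm = TRUE)

# Prefix every summarised column (everything except year) with "y_"
names(res)[-1] <- paste0("y_", v)

print(res)
```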