I have the following dataset:
df_x <- data.frame(year = c(2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002),
a = c(7, 3, 5),
b = c(5, 8, 1),
c = c(8, 4, 3))
and this vector:
v <- c("a", "b", "c")
Now I want to create a new dataset and summarise a, b, and c by creating new variables (y_a, y_b, and y_c) that calculate the mean of each variable grouped by year.
The code for doing this is the following:
y <- df_x %>% group_by(year) %>% dplyr::summarise(y_a = mean(a, na.rm = TRUE),
y_b = mean(b, na.rm = TRUE),
y_c = mean(c, na.rm = TRUE))
However, I want to use the vector v to read each variable name from it and paste it into the summarise function:
y <- df_x %>% group_by(year) %>% dplyr::summarise(as.name(paste0("y_", v[1])) = mean(as.name(v[1]), na.rm = TRUE),
as.name(paste0("y_", v[2])) = mean(as.name(v[2]), na.rm = TRUE),
as.name(paste0("y_", v[3])) = mean(as.name(v[3]), na.rm = TRUE))
Doing so, I receive the following error message:
Error: unexpected '=' in "y <- df_x %>% group_by(year) %>% dplyr::summarise(as.name(paste0("y_", v[1])) ="
How can I paste the value of a vector in this summarise function so that it works?
CodePudding user response:
To define a new variable on the left-hand side from a computed name, you need := instead of =. Because the name is built with paste0, you need !! to inject the resulting string so that it is evaluated and used as the column name. To access existing columns in dplyr with a string stored in a variable, using .data is the easiest way.
library(dplyr)
df_x <- data.frame(year = c(2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002),
a = c(7, 3, 5),
b = c(5, 8, 1),
c = c(8, 4, 3))
v <- c("a", "b", "c")
df_x %>% group_by(year) %>%
  # each line uses its own element of v for both the new name and the source column
  dplyr::summarise(!!paste0("y_", v[1]) := mean(.data[[v[1]]], na.rm = TRUE),
                   !!paste0("y_", v[2]) := mean(.data[[v[2]]], na.rm = TRUE),
                   !!paste0("y_", v[3]) := mean(.data[[v[3]]], na.rm = TRUE))
#> # A tibble: 3 × 4
#>    year   y_a   y_b   y_c
#>   <dbl> <dbl> <dbl> <dbl>
#> 1  2000     5  4.67     5
#> 2  2001     5  4.67     5
#> 3  2002     5  4.67     5
Created on 2022-12-21 by the reprex package (v1.0.0)
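If v can grow, writing one line per element gets tedious. A minimal sketch of an alternative, assuming dplyr 1.0.0 or later is available: across() together with all_of() selects the columns named in v, and the .names argument builds the y_ prefix. This is a different technique from the !! and := approach above, not a replacement for understanding it.

```r
library(dplyr)

df_x <- data.frame(year = c(2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002),
                   a = c(7, 3, 5),
                   b = c(5, 8, 1),
                   c = c(8, 4, 3))
v <- c("a", "b", "c")

y <- df_x %>%
  group_by(year) %>%
  # all_of(v) selects the columns whose names are stored in v;
  # .names = "y_{.col}" prefixes each summarised column with "y_"
  summarise(across(all_of(v), ~ mean(.x, na.rm = TRUE), .names = "y_{.col}"))

y
```

The result has the columns year, y_a, y_b, and y_c, and adding a name to v is enough to summarise another column.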
CodePudding user response:
Here is a one-liner in base R:
aggregate(. ~ year, cbind.data.frame(year = df_x$year, df_x[v]), FUN = \(i)mean(i, na.rm = TRUE))
  year a        b c
1 2000 5 4.666667 5
2 2001 5 4.666667 5
3 2002 5 4.666667 5
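The one-liner keeps the original column names a, b, and c. If the y_ prefix from the question is wanted, a possible sketch (the name res is only illustrative) is to use the data-frame interface of aggregate and rename the summarised columns afterwards:

```r
df_x <- data.frame(year = c(2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002),
                   a = c(7, 3, 5),
                   b = c(5, 8, 1),
                   c = c(8, 4, 3))
v <- c("a", "b", "c")

# Group means of the columns named in v; na.rm = TRUE is passed on to mean()
res <- aggregate(df_x[v], by = list(year = df_x$year), FUN = mean, na.rm = TRUE)

# The first column is the grouping variable; prefix the rest with "y_"
names(res)[-1] <- paste0("y_", v)

res
```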