I've been doing some really repetitive data analysis tasks (with lots more to come), so I'm looking to automate as much of it as I can. I'm pretty new to R, especially to writing functions, and I don't have a great understanding of how functions work under the hood. Basically, I am trying to write a function that uses dplyr to create a summary table (which I will eventually turn into a graph). Here's how I picture it working (I promise my actual tasks are complicated enough to make it worth wrapping in a function):
library(dplyr)
library(ggplot2)  # provides the mpg data set

make_table <- function(df, group.by) {
  table <- df %>%
    group_by(.data[[group.by]]) %>%
    summarize(hwy = mean(hwy))
  return(table)
}

make_table(mpg, "manufacturer")              # works
make_table(mpg, c("manufacturer", "model"))  # fails
The second call fails: .data[[ ]] accepts a single string, not a character vector of several column names. So next, I tried this:
make_table_2 <- function(df, group.by) {
  table <- df %>%
    group_by(
      for (i in group.by) {
        return(.data[[ i ]])
      }) %>%
    summarize(hwy = mean(hwy))
  return(table)
}

make_table_2(mpg, c("manufacturer", "model"))
That throws me this error:
Error in `group_by()`:
! Problem adding computed columns.
Caused by error in `mutate()`:
! Problem while computing `..1 = for (... in NULL) NULL`.
Caused by error in `.data[["b"]]`:
! Column `b` not found in `.data`.
Could someone help me understand what I'm doing wrong here, or what might work better? I suspect the problem is that .data doesn't work inside a for loop, but I'm not sure how I would get around that. Thank you so much for any help! Apologies if someone else has already asked this question; I couldn't find it, but of course that doesn't mean it's not out there.
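One problem with make_table_2 is independent of .data. In R, a for loop is evaluated only for its side effects and its value is always NULL, so even a loop whose body worked could not hand several columns back to group_by(). The error message above shows group_by() treating the whole for (...) expression as a single computed grouping column named ..1. A quick base-R check (the vector cols is just an illustrative stand-in for group.by):

```r
cols <- c("manufacturer", "model")

# The value of a for loop is always NULL, whatever the body computes
loop_value <- for (col in cols) toupper(col)
print(is.null(loop_value))  # TRUE

# lapply() instead returns one result per element, collected in a list
mapped <- lapply(cols, toupper)
print(unlist(mapped))       # "MANUFACTURER" "MODEL"
```

This is why per-element results have to be collected by something like lapply() (or, inside dplyr verbs, a selection helper) rather than a for loop.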
p.s. This is my first time posting on Stack Overflow, so I would appreciate any constructive criticism on how to ask a better question!
Answer:
More in line with the tidyverse "way", I'd use quasiquotation (tidy evaluation) and pass the column names unquoted:
library(dplyr)
library(ggplot2)  # provides the mpg data set

make_table <- function(df, group.by) {
  df %>%
    group_by(across({{ group.by }})) %>%
    summarize(hwy = mean(hwy), .groups = "drop")
}

make_table(mpg, manufacturer)
make_table(mpg, c(manufacturer, model))
You need across() here because you are passing a vector of (unquoted) column names. Instead of the curly-curly operator {{ }}, you can also go the enquo() and !! way. The group_by() call then becomes group_by(across(!!enquo(group.by))).