I'm working on an R Shiny app and am currently having trouble with dplyr's group_by() function. I have two defined functions:
gather_info: finds the category with the highest/lowest mean value
paste_info: calls gather_info and returns the corresponding category and value
The purpose is to return a string that - given a data frame and categorical variable - states the highest- and lowest-performing category and value of said category.
Calling gather_info with the appropriate arguments works as expected. However, paste_info consistently returns:
Error in `group_by()`:
! Must group by variables found in `.data`.
✖ Column `grp.col` is not found.
Here's a reproducible example, where the desired output of paste_info is "Given your data, your best performing group is Cat1 scoring 90% and your worst performing group is Cat2 scoring 20%.":
library(dplyr)
library(stringr)

gather_info <- function(df, grp.col, maxm) {
  df |>
    mutate_if(
      .predicate = function(x) is.character(x),
      .funs = function(x) str_to_title(x)
    ) |>
    group_by({{ grp.col }}) |>
    summarize(percentage = round(mean(value, na.rm = TRUE) * 100, 2)) |>
    arrange(desc(percentage)) %>% # magrittr pipe, needed for the `.` placeholder below
    {if (maxm) head(., 1) else tail(., 1)}
}
paste_info <- function(df, grp.col) {
  high_df <- gather_info(df, grp.col, maxm = TRUE)
  low_df <- gather_info(df, grp.col, maxm = FALSE)
  paste0("Given your data, your best performing group is ",
         high_df |> pull(grp.col), " scoring ", high_df$percentage, "%",
         " and your worst performing group is ",
         low_df |> pull(grp.col), " scoring ", low_df$percentage, "%.")
}
df <- data.frame(
  category = c('cat1', 'cat1', 'cat2', 'cat2', 'cat2', 'cat3', 'cat3'),
  value = c(1, 0.8, 0.2, 0.3, 0.1, 0.5, 0.5)
)
# returns category, value with highest mean value
gather_info(df, category, maxm=TRUE)
# returns category, value with lowest mean value
gather_info(df, category, maxm=FALSE)
# does not work
paste_info(df, category)
Any help is much appreciated. Thank you!
Answer:
The issue is that inside paste_info you have to use {{ to pass the grouping column grp.col to gather_info, as well as when you call pull. This is for the same reason you have to use {{ in group_by inside gather_info.
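The forwarding pattern can be seen in isolation with a minimal sketch. The helper names (count_by, wrapper_forwarded, wrapper_bare) and the toy data frame are made up for illustration and are not part of the code above; the sketch only assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical inner function: groups by whatever column the caller passes.
count_by <- function(df, col) {
  df |>
    group_by({{ col }}) |>
    summarize(n = n())
}

# Forwards the argument with {{ }}, so count_by() sees the caller's column.
wrapper_forwarded <- function(df, col) count_by(df, {{ col }})

# Forwards the bare argument name, so count_by() tries to group by `col`.
wrapper_bare <- function(df, col) count_by(df, col)

toy <- data.frame(category = c("a", "a", "b"))

# Works: one row per category.
res_ok <- wrapper_forwarded(toy, category)

# Fails: `toy` has no column named `col`; the error is caught for inspection.
res_bad <- tryCatch(wrapper_bare(toy, category), error = function(e) e)
```

Here wrapper_bare plays the role of the original paste_info, and wrapper_forwarded the role of the fixed version.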
In some sense {{ translates e.g. gather_info(df, {{ grp.col }}, maxm = TRUE) to gather_info(df, category, maxm = TRUE), i.e. you pass category to gather_info. Without {{ the column name stored in grp.col will not be "injected" into the expression or function call. Hence, gather_info will take grp.col as is and interpret it as the name of the grouping column. But as there is no column named grp.col in your data, you get an error.
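To see what actually gets captured in each case, here is a small sketch using rlang's enquo() and as_label(). The function names (label_direct, label_forwarded, label_bare) are hypothetical and exist only for this illustration.

```r
library(rlang)

# Hypothetical helper: returns the captured argument expression as a string.
label_direct <- function(col) as_label(enquo(col))

# Forwards with {{ }}: the caller's expression is injected.
label_forwarded <- function(col) label_direct({{ col }})

# Forwards the bare name: the literal symbol `col` is captured instead.
label_bare <- function(col) label_direct(col)

label_forwarded(category)
#> [1] "category"
label_bare(category)
#> [1] "col"
```

The second result mirrors the error message in the question: the inner function receives the wrapper's argument name rather than the column the user supplied.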
For more info on why {{ is needed, see "What is data-masking and why do I need {{?".
library(dplyr)

paste_info <- function(df, grp.col) {
  high_df <- gather_info(df, {{ grp.col }}, maxm = TRUE)
  low_df <- gather_info(df, {{ grp.col }}, maxm = FALSE)
  paste0(
    "Given your data, your best performing group is ",
    high_df |> pull({{ grp.col }}), " scoring ", high_df$percentage, "%",
    " and your worst performing group is ",
    low_df |> pull({{ grp.col }}), " scoring ", low_df$percentage, "%."
  )
}

paste_info(df, category)
#> [1] "Given your data, your best performing group is Cat1 scoring 90% and your worst performing group is Cat2 scoring 20%."