I want to write a function that uses dplyr functions (for example) and lets me simply pass the name of the variable I want to use.
For example, suppose I have the following data frame:
library(dplyr)
set.seed(20)
df <- data.frame(
  country = sample(LETTERS[1:3], 17, replace = TRUE)
)
df
country
1 B
2 C
3 C
4 B
5 A
6 B
7 A
8 B
9 B
10 A
11 C
12 C
13 A
14 C
15 A
16 A
17 A
And I want to create a function which can take the variable name and then perform a simple operation with it:
my_summary <- function(database, variable) {
  database %>%
    group_by(variable) %>%
    summarise(n = n())
}
However, when I run this function, it does not capture the variable I pass in:
> my_summary(df,country)
Error in `group_by()`:
! Must group by variables found in `.data`.
✖ Column `variable` is not found.
I have seen several examples but none of them keeps the variable's name in the output. The output should look something like this (with the variable name intact):
# A tibble: 3 × 2
country n
<chr> <int>
1 A 7
2 B 5
3 C 5
CodePudding user response:
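Try the code below, which uses the rlang embrace operator {{ variable }} inside the function. It forwards the column the caller names to group_by(), so the grouping column keeps its original name in the output: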
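library(dplyr)  # provides %>%, group_by(), summarise(); {{ }} works without attaching rlang

my_summary <- function(database, variable) {
  database %>%
    group_by({{ variable }}) %>%  # embrace the argument so the caller's column is used
    summarise(n = n())
}

my_summary(df, country)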
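If the column name arrives as a character string rather than a bare name (for instance when it is read from a config or a loop), the embrace operator is not the right tool. A minimal sketch of one alternative, assuming dplyr 1.0.0 or later, is to select the grouping column with across(all_of(...)). The function name my_summary_chr and the toy data frame are hypothetical and only serve the illustration:

```r
library(dplyr)

# Hypothetical variant: `variable` is a character string such as "country"
my_summary_chr <- function(database, variable) {
  database %>%
    group_by(across(all_of(variable))) %>%  # select the column by its string name
    summarise(n = n())
}

# Small hand-made data frame so the counts are easy to check by eye
toy <- data.frame(country = c("A", "B", "A", "C", "A"))
result <- my_summary_chr(toy, "country")
result
```

As with the embraced version, the grouping column keeps its original name (country) in the result. all_of() also raises an error if the named column does not exist, which tends to surface typos early.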