How to create a function with database variables keeping its names in R

Time:01-15

I want to create a function that uses dplyr functions (for example) and to which I can pass just the name of the variable I want to use.

Say, for example, I have the following data frame:

library(dplyr)

set.seed(20)
df <- data.frame(
      country = sample(LETTERS[1:3], 17, replace = TRUE))

> df
   country
1        B
2        C
3        C
4        B
5        A
6        B
7        A
8        B
9        B
10       A
11       C
12       C
13       A
14       C
15       A
16       A
17       A

I want to create a function that takes the variable name and performs a simple operation with it (here, counting the rows in each group):

my_summary <- function(database,variable) {
  database %>% 
    group_by(variable) %>% 
    summarise(n = n())
}

However, when I call this function, it does not capture the variable: `group_by()` looks for a column literally named `variable`.

> my_summary(df,country)
Error in `group_by()`:
! Must group by variables found in `.data`.
✖ Column `variable` is not found.
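The error comes from dplyr's data masking: inside `group_by()`, the bare name `variable` is treated as a column of the data, not as the function argument. A minimal sketch with a hypothetical data frame `df2` (not from the question) that happens to contain a column literally named `variable` makes this visible: the call no longer fails, but it silently groups by the wrong column.

```r
library(dplyr)

# The question's function, unchanged
my_summary <- function(database, variable) {
  database %>%
    group_by(variable) %>%
    summarise(n = n())
}

# Hypothetical data frame with a column literally named `variable`
df2 <- data.frame(variable = c("x", "y", "x"),
                  country  = c("A", "B", "C"))

# No error this time, but group_by() used the `variable` column;
# the `country` argument is never evaluated (R's lazy evaluation)
res <- my_summary(df2, country)
res
```

The result has columns `variable` and `n`, which shows that the argument passed by the caller never reached `group_by()`.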

I have seen several examples, but none of them keeps the variable's name in the output. The output should look something like this (with the variable name intact):

# A tibble: 3 × 2
  country     n
  <chr>   <int>
1 A           7
2 B           5
3 C           5

Answer:

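Try the code below, which wraps `variable` in rlang's embracing operator `{{ }}` ("curly-curly") inside the function. `{{ variable }}` is replaced by the expression the caller wrote (here `country`), so `group_by()` groups by that column directly and the output column keeps its name.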

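library(dplyr)
library(rlang)  # optional: dplyr verbs already understand {{ }}

my_summary <- function(database, variable) {
  database %>% 
    # {{ variable }} passes the caller's bare column name (e.g. country) to group_by()
    group_by({{ variable }}) %>% 
    summarise(n = n())
}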


my_summary(df,country)
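For comparison, here is a sketch of a few equivalent spellings, assuming reasonably recent versions of dplyr and rlang. `{{ variable }}` is shorthand for capturing the argument with `enquo()` and unquoting it with `!!`. If the column name arrives as a string instead, `rlang::sym()` turns it into a symbol before unquoting, so the name is still kept. The names `my_summary_quo` and `my_summary_chr` are hypothetical, chosen for this illustration; `count()` is dplyr's one-step group-and-count.

```r
library(dplyr)

# Same data as in the question
set.seed(20)
df <- data.frame(country = sample(LETTERS[1:3], 17, replace = TRUE))

# 1. The answer's version: {{ }} forwards the bare column name
my_summary <- function(database, variable) {
  database %>%
    group_by({{ variable }}) %>%
    summarise(n = n())
}

# 2. Older, explicit spelling of the same idea: enquo() + !!
my_summary_quo <- function(database, variable) {
  variable <- rlang::enquo(variable)
  database %>%
    group_by(!!variable) %>%
    summarise(n = n())
}

# 3. Hypothetical variant taking the column name as a string, e.g. "country";
#    sym() converts the string to a symbol, so the output column is named after it
my_summary_chr <- function(database, var_name) {
  database %>%
    group_by(!!rlang::sym(var_name)) %>%
    summarise(n = n())
}

res_curly <- my_summary(df, country)
res_quo   <- my_summary_quo(df, country)
res_chr   <- my_summary_chr(df, "country")
res_count <- count(df, country)  # grouping and counting in one call
```

All four results have a `country` column followed by `n`. The string version is convenient when the column name comes from a character vector, for example when looping over `names(df)`.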