I would like to write a function that takes a data frame and one of its factor variables as input, and returns a data frame with the levels of that factor and the number of occurrences of each level.
Here is code that does this:
df <- data.frame(ID = sample(c("a", "b", "c", "d"), 20, rep=TRUE))
df %>% group_by(ID) %>% summarise(no_rows = length(ID)) %>% arrange(desc(no_rows))
But I don't know how to put this in a function, since the variable name (ID) is not quoted in the second line.
f <- function(df, var){
df %>% group_by(var) %>% summarise(no_rows = length(var)) %>% arrange(desc(no_rows))
}
f(df, ID) does not work, and I can't write f(df, "ID") either.
CodePudding user response:
Use dplyr::count with non-standard evaluation (see this SO post or the tidyverse documentation), combined with the argument sort = TRUE:
library(dplyr)
f <- function(df, var) df %>% count({{ var }}, name = "no_rows", sort = TRUE)
set.seed(1) # using seed for reproducibility
df <- data.frame(ID = sample(c("a", "b", "c", "d"), 20, rep=TRUE))
f(df, ID)
ID no_rows
1 b 7
2 a 6
3 c 6
4 d 1
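If the column name needs to be passed as a string, as in the f(df, "ID") call from the question, one possible approach is to convert the string to a symbol before unquoting it. The following is only a minimal sketch: the name f_chr is hypothetical and not part of the original answer, and it assumes rlang is installed (it is a dependency of dplyr).

```r
library(dplyr)

# Hypothetical variant of f() that takes the column name as a string.
f_chr <- function(df, var) {
  # rlang::sym() turns the string into a symbol; !! unquotes it inside count()
  df %>% count(!!rlang::sym(var), name = "no_rows", sort = TRUE)
}

set.seed(1) # same seed as above, so the data matches
df <- data.frame(ID = sample(c("a", "b", "c", "d"), 20, replace = TRUE))
res <- f_chr(df, "ID")
print(res)
```

The embrace operator {{ var }} expects a bare column name, so the string case needs this extra conversion step.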
CodePudding user response:
f <- function(df,var){
var <- enquo(var)
df %>% group_by(!!var) %>% summarise(no_rows = length(!!var)) %>% arrange(desc(no_rows))
}
Update the function as shown above: capture the argument with enquo() and unquote it with !!. Then call it:
f(df, ID)
Output:
ID no_rows
<chr> <int>
1 a 6
2 d 6
3 b 4
4 c 4
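The counts here differ from those in the other answer, most likely because the sampled data was generated without the same seed. As a rough check, the sketch below runs both versions on the same seeded data and compares the results column by column. The names f_embrace and f_enquo are hypothetical labels for the two functions in this thread.

```r
library(dplyr)

# Version using the embrace operator and count()
f_embrace <- function(df, var) {
  df %>% count({{ var }}, name = "no_rows", sort = TRUE)
}

# Version using enquo() and !!
f_enquo <- function(df, var) {
  var <- enquo(var)
  df %>%
    group_by(!!var) %>%
    summarise(no_rows = length(!!var)) %>%
    arrange(desc(no_rows))
}

set.seed(1)
df <- data.frame(ID = sample(c("a", "b", "c", "d"), 20, replace = TRUE))

a <- f_embrace(df, ID) # returns a data.frame
b <- f_enquo(df, ID)   # returns a tibble

# Compare the columns rather than the objects, since the classes differ
same <- identical(a$ID, b$ID) && identical(a$no_rows, b$no_rows)
print(same)
```

The only visible difference is the return type: count() on a plain data frame returns a data frame, while the group_by()/summarise() pipeline returns a tibble, which is why the second output prints the <chr> and <int> column types.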