I have this dataset:
df <- data.frame( raca = c("Nel","Nel","Nel", "Nel","Angus","Angus","Angus","Angus"),
marmo = c(350, 320, 330, 400, 800, 820, 450, NA))
and I would like to compute descriptive statistics for each group. I used this code:
library(dplyr)
library(tidyr)
df %>%
  group_by(raca) %>%
  dplyr::summarise(across(1, ~ data.frame(
    Média = round(mean(., na.rm = TRUE), digits = 2),
    N = length(.),
    DP = round(sd(., na.rm = TRUE), digits = 2),
    Min = min(., na.rm = TRUE),
    Max = max(., na.rm = TRUE),
    `Coef Variação` = round(sd(., na.rm = TRUE) / mean(., na.rm = TRUE) * 100, digits = 2)
  ))) %>%
  pivot_longer(-raca) %>% arrange(name, raca)
and it worked well. But I would like to turn it into a function, so I tried this code:
desc_function <- function(a,b, c) { a %>%
group_by(a[,b]) %>%
dplyr::summarise(across(a[,c],~data.frame(Média =round(mean(.,na.rm=TRUE,digits=2),digits = 2),
N = length(.),
DP = round(sd(.,na.rm=TRUE),digits = 2),
Min = min(.,na.rm=TRUE),
Max = max(.,na.rm=TRUE),
`Coef Variação` = round(sd(., na.rm=TRUE)/mean(.,na.rm=TRUE)*100,digits=2)))) %>%
pivot_longer(a[,b]) %>% arrange(name,a[,b])}
desc_function(df, "raca", "marmo")
But I got this error:
Error: Problem with summarise() input ..1.
i ..1 = across(...).
x Selections can't have missing values.
i The error occurred in group 1: a[, b] = "Angus".
Run rlang::last_error() to see where the error occurred.
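For context, a minimal sketch of why a[,c] breaks here (it reuses the df from above; the rest is illustration). Inside the function, a[,c] evaluates to the values of the column rather than its name, so across() receives numbers, including an NA, and tries to treat them as column positions. A character name wrapped in the tidyselect helper all_of() is what the selection expects:

```r
library(dplyr)

df <- data.frame(
  raca  = c("Nel", "Nel", "Nel", "Nel", "Angus", "Angus", "Angus", "Angus"),
  marmo = c(350, 320, 330, 400, 800, 820, 450, NA)
)

# a[, c] inside the function gives the column's values, not its name
vals <- df[, "marmo"]
print(vals)  # numeric values, including an NA

# across() would read those numbers as column positions; the NA is what
# triggers "Selections can't have missing values".
# Selecting by name with all_of() works as intended:
res <- df %>%
  group_by(raca) %>%
  summarise(across(all_of("marmo"), ~ mean(.x, na.rm = TRUE)))
print(res)
```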
Answer:
I agree with shafee that it is worth reading up on how to program with dplyr, since passing column names into a function works slightly differently.
Here's how you would do it (adapting your code directly):
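desc_function <- function(a, b, c) {
  a %>%
    group_by(.data[[b]]) %>%                       # data-masking verb: use .data[[ ]]
    dplyr::summarise(across(all_of(c), ~ data.frame(   # selection: use all_of()
      Média = round(mean(., na.rm = TRUE), digits = 2),
      N = length(.),
      DP = round(sd(., na.rm = TRUE), digits = 2),
      Min = min(., na.rm = TRUE),
      Max = max(., na.rm = TRUE),
      `Coef Variação` = round(sd(., na.rm = TRUE) / mean(., na.rm = TRUE) * 100, digits = 2)
    ))) %>%
    pivot_longer(-all_of(b)) %>%
    arrange(name, .data[[b]])
}
desc_function(df, "raca", "marmo")
Note the use of .data[[b]] to refer to a column whose name is stored as a string in a function argument. In selection contexts such as across() and pivot_longer(), the string is wrapped in all_of() instead, since recent tidyselect versions deprecate .data inside selections.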
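A stripped-down sketch of the same string-based pattern with a single statistic, assuming dplyr 1.0 or later. It uses .data[[ ]] in the data-masking verb group_by() and all_of() in the selection inside across(). The helper name mean_by is hypothetical:

```r
library(dplyr)

df <- data.frame(
  raca  = c("Nel", "Nel", "Nel", "Nel", "Angus", "Angus", "Angus", "Angus"),
  marmo = c(350, 320, 330, 400, 800, 820, 450, NA)
)

# Hypothetical helper: group_var and value_var are column names given as strings
mean_by <- function(data, group_var, value_var) {
  data %>%
    group_by(.data[[group_var]]) %>%          # data-masking: .data[[ ]]
    summarise(across(all_of(value_var),       # tidyselect: all_of()
                     ~ mean(.x, na.rm = TRUE)))
}

out <- mean_by(df, "raca", "marmo")
print(out)
```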
Alternatively, pass the variables unquoted (not as strings), like so:
desc_function <- function(a, b, c) {
  a %>%
    group_by({{b}}) %>%
    dplyr::summarise(across({{c}}, ~ data.frame(
      Média = round(mean(., na.rm = TRUE), digits = 2),
      N = length(.),
      DP = round(sd(., na.rm = TRUE), digits = 2),
      Min = min(., na.rm = TRUE),
      Max = max(., na.rm = TRUE),
      `Coef Variação` = round(sd(., na.rm = TRUE) / mean(., na.rm = TRUE) * 100, digits = 2)
    ))) %>%
    pivot_longer(-{{b}}) %>%
    arrange(name, {{b}})
}
desc_function(df, raca, marmo)
This time {{b}} and {{c}} (curly-curly) forward the unquoted column names.
As mentioned, all of this is documented in https://dplyr.tidyverse.org/articles/programming.html
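One practical advantage of the {{ }} version, sketched below with a hypothetical helper: because the argument is forwarded as an expression, the caller can pass any tidyselect expression to across(), not only a bare column name. This assumes dplyr 1.0 or later, where where() is available:

```r
library(dplyr)

df <- data.frame(
  raca  = c("Nel", "Nel", "Nel", "Nel", "Angus", "Angus", "Angus", "Angus"),
  marmo = c(350, 320, 330, 400, 800, 820, 450, NA)
)

# Hypothetical helper: grp and cols are passed unquoted and forwarded with {{ }}
mean_by <- function(data, grp, cols) {
  data %>%
    group_by({{ grp }}) %>%
    summarise(across({{ cols }}, ~ mean(.x, na.rm = TRUE)))
}

# A bare column name works ...
by_name <- mean_by(df, raca, marmo)
# ... and so does a tidyselect expression, e.g. every numeric column
by_type <- mean_by(df, raca, where(is.numeric))
print(by_type)
```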
Answer:
Your first problem, for which you are getting an error, is that the code
function(df, "raca", "marmo")
is used to define a function, not to call one. To call it, use the name you assigned the function to, like this:
desc_function(df, "raca", "marmo")
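A tiny sketch of the difference between defining and calling a function in R (the name double_it is hypothetical):

```r
# Defining: function(...) creates a function object, which is bound to a name
double_it <- function(x) {
  x * 2
}

# Calling: use the bound name followed by the argument values
result <- double_it(21)
print(result)

# Writing function(21) would not call double_it: function() only declares
# argument names, so R rejects a value like 21 in that position.
```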
Answer:
If you want to write a function that contains tidyverse verbs (i.e. dplyr functions), the usual approach to writing functions will not work, because these tidy functions use NSE
(non-standard evaluation). The easiest workaround is to use curly-curly {{ }}
from the rlang package. Simply wrap the function arguments that you will use as columns in dplyr functions with curly-curly. So in your case the function would be:
library(dplyr)
library(tidyr)
library(rlang)
df <- data.frame(
raca = c("Nel", "Nel", "Nel", "Nel", "Angus", "Angus", "Angus", "Angus"),
marmo = c(350, 320, 330, 400, 800, 820, 450, NA)
)
df %>%
group_by(raca) %>%
dplyr::summarise(across(1, ~ data.frame(
Média = round(mean(., na.rm = TRUE), digits = 2),
N = length(.),
DP = round(sd(., na.rm = TRUE), digits = 2),
Min = min(., na.rm = TRUE),
Max = max(., na.rm = TRUE),
`Coef Variação` = round(sd(., na.rm = TRUE) / mean(., na.rm = TRUE) * 100, digits = 2)
))) %>%
pivot_longer(-raca) %>%
arrange(name, raca)
#> # A tibble: 2 × 3
#> raca name value$Média $N $DP $Min $Max $Coef.Variação
#> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 Angus marmo 690 4 208. 450 820 30.2
#> 2 Nel marmo 350 4 35.6 320 400 10.2
desc_function <- function(dat, grp_var, var) {
  # use the data passed in (dat), not the global df
  dat %>%
    group_by({{ grp_var }}) %>%
    dplyr::summarise(across({{ var }}, ~ data.frame(
      Média = round(mean(., na.rm = TRUE), digits = 2),
      N = length(.),
      DP = round(sd(., na.rm = TRUE), digits = 2),
      Min = min(., na.rm = TRUE),
      Max = max(., na.rm = TRUE),
      `Coef Variação` = round(sd(., na.rm = TRUE) / mean(., na.rm = TRUE) * 100, digits = 2)
    ))) %>%
    pivot_longer(-{{ grp_var }}) %>%
    arrange(name, {{ grp_var }})
}
desc_function(df, grp_var = raca, var = marmo)
#> # A tibble: 2 × 3
#> raca name value$Média $N $DP $Min $Max $Coef.Variação
#> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 Angus marmo 690 4 208. 450 820 30.2
#> 2 Nel marmo 350 4 35.6 320 400 10.2
Created on 2022-07-07 by the reprex package (v2.0.1)
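As background to the curly-curly suggestion: {{ x }} is rlang's shorthand for capturing an argument with enquo() and injecting it with !!. A minimal sketch comparing both spellings on the same df (the helper names are hypothetical):

```r
library(dplyr)
library(rlang)

df <- data.frame(
  raca  = c("Nel", "Nel", "Nel", "Nel", "Angus", "Angus", "Angus", "Angus"),
  marmo = c(350, 320, 330, 400, 800, 820, 450, NA)
)

# Shorthand: {{ }} captures the caller's expression and injects it in one step
n_by_curly <- function(data, grp) {
  data %>% count({{ grp }})
}

# Long form: capture with enquo(), then inject with !!
n_by_enquo <- function(data, grp) {
  grp <- enquo(grp)
  data %>% count(!!grp)
}

a <- n_by_curly(df, raca)
b <- n_by_enquo(df, raca)
print(a)
```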
Answer:
Here is one way to pass the variables as character strings for dynamic input, using !!sym(x):
library(dplyr)
library(tidyr)
df <- data.frame( raca = c("Nel","Nel","Nel", "Nel","Angus","Angus","Angus","Angus"),
marmo = c(350, 320, 330, 400, 800, 820, 450, NA))
desc_function <- function(a, b, c) {
  a %>%
    group_by(!!sym(b)) %>%                  # sym() turns the string into a symbol, !! injects it
    dplyr::summarise(across(!!sym(c), ~ data.frame(
      Média = round(mean(., na.rm = TRUE), digits = 2),
      N = length(.),
      DP = round(sd(., na.rm = TRUE), digits = 2),
      Min = min(., na.rm = TRUE),
      Max = max(., na.rm = TRUE),
      `Coef Variação` = round(sd(., na.rm = TRUE) / mean(., na.rm = TRUE) * 100, digits = 2)
    ))) %>%
    pivot_longer(!!sym(c)) %>%
    arrange(name, !!sym(b))
}
desc_function(df, "raca", "marmo")
#> # A tibble: 2 x 3
#> raca name value$Média $N $DP $Min $Max $Coef.Variação
#> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 Angus marmo 690 4 208. 450 820 30.2
#> 2 Nel marmo 350 4 35.6 320 400 10.2
Created on 2022-07-07 by the reprex package (v2.0.1)
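In all of these versions the statistics end up in a packed data-frame column (printed as value$Média, value$N and so on). If ordinary flat columns are more convenient, tidyr's unpack() can spread them out. A sketch with a reduced set of statistics, assuming tidyr 1.0 or later:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  raca  = c("Nel", "Nel", "Nel", "Nel", "Angus", "Angus", "Angus", "Angus"),
  marmo = c(350, 320, 330, 400, 800, 820, 450, NA)
)

stats <- df %>%
  group_by(raca) %>%
  summarise(across(marmo, ~ data.frame(
    Média = round(mean(.x, na.rm = TRUE), 2),
    N     = length(.x),
    DP    = round(sd(.x, na.rm = TRUE), 2)
  ))) %>%
  pivot_longer(-raca) %>%
  arrange(name, raca)

# value is a packed data-frame column; unpack() turns it into ordinary columns
flat <- unpack(stats, value)
print(flat)
```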
For further information, see the chapter on quasiquotation in Hadley Wickham's e-book Advanced R: https://adv-r.hadley.nz/quasiquotation.html
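The same quasiquotation idea extends to several grouping columns at once: rlang's syms() converts a character vector into a list of symbols, and !!! splices them all into the call. A sketch on the built-in mtcars data, with a hypothetical helper name:

```r
library(dplyr)
library(rlang)

# Hypothetical helper: group_vars is a character vector of any length
count_by <- function(data, group_vars) {
  data %>%
    group_by(!!!syms(group_vars)) %>%   # syms() builds the symbols, !!! splices them
    summarise(n = n(), .groups = "drop")
}

res <- count_by(mtcars, c("cyl", "am"))
print(res)
```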