Error in an R function based on dataset information

I have this dataset:

df <- data.frame( raca = c("Nel","Nel","Nel", "Nel","Angus","Angus","Angus","Angus"),
                  marmo = c(350, 320, 330, 400, 800, 820, 450, NA))

and I would like to compute descriptive statistics for it. I used this code:

df %>%
  group_by(raca) %>%
  dplyr::summarise(across(1,~data.frame(Média =round(mean(.,na.rm=TRUE,digits=2),digits = 2),
                                                    N = length(.),
                                                    DP = round(sd(.,na.rm=TRUE),digits = 2),
                                                    Min = min(.,na.rm=TRUE),
                                                    Max = max(.,na.rm=TRUE),
                                                    `Coef Variação` = round(sd(., na.rm=TRUE)/mean(.,na.rm=TRUE)*100,digits=2)))) %>%
  pivot_longer(-raca) %>% arrange(name,raca)

and it worked well. But I would like to turn it into a function, so I tried this code:

desc_function <- function(a,b, c)   { a %>%
    group_by(a[,b]) %>%
    dplyr::summarise(across(a[,c],~data.frame(Média =round(mean(.,na.rm=TRUE,digits=2),digits = 2),
                                              N = length(.),
                                              DP = round(sd(.,na.rm=TRUE),digits = 2),
                                              Min = min(.,na.rm=TRUE),
                                              Max = max(.,na.rm=TRUE),
                                              `Coef Variação` = round(sd(., na.rm=TRUE)/mean(.,na.rm=TRUE)*100,digits=2)))) %>%
    pivot_longer(a[,b]) %>% arrange(name,a[,b])}


desc_function(df, "raca", "marmo")

But I got this error:

 Error: Problem with summarise() input ..1.
i ..1 = across(...).
x Selections can't have missing values.
i The error occurred in group 1: a[, b] = "Angus".
Run rlang::last_error() to see where the error occurred.

CodePudding user response：

I agree with shafee that it is worth reading up on how to program with dplyr, because it works slightly differently.

Here's how you would do it (adapting your code directly):

desc_function <- function(a, b, c) {
  a %>%
    group_by(.data[[b]]) %>%
    # all_of() is used in the column-selection arguments because recent
    # tidyselect versions deprecate .data[[ ]] there
    dplyr::summarise(across(all_of(c), ~ data.frame(
      Média = round(mean(., na.rm = TRUE), digits = 2),
      N = length(.),
      DP = round(sd(., na.rm = TRUE), digits = 2),
      Min = min(., na.rm = TRUE),
      Max = max(., na.rm = TRUE),
      `Coef Variação` = round(sd(., na.rm = TRUE) / mean(., na.rm = TRUE) * 100, digits = 2)
    ))) %>%
    pivot_longer(-all_of(b)) %>%
    arrange(name, .data[[b]])
}


desc_function(df, "raca", "marmo")

Note the use of .data[[b]] to refer to a column whose name is held in a string inside the function.
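
As a further illustration of the .data[[ ]] pattern, here is a minimal sketch that computes the same statistics as plain columns instead of a nested data frame. The helper name desc_flat and the output column names are assumptions made for this example; they are not part of the original code.

```r
library(dplyr)

df <- data.frame(
  raca  = c("Nel", "Nel", "Nel", "Nel", "Angus", "Angus", "Angus", "Angus"),
  marmo = c(350, 320, 330, 400, 800, 820, 450, NA)
)

# Hypothetical helper: `grp` and `var` are column names passed as strings
desc_flat <- function(data, grp, var) {
  data %>%
    group_by(.data[[grp]]) %>%
    summarise(
      mean_val = round(mean(.data[[var]], na.rm = TRUE), 2),
      n_obs    = n(),  # group size, NA values included
      sd_val   = round(sd(.data[[var]], na.rm = TRUE), 2),
      min_val  = min(.data[[var]], na.rm = TRUE),
      max_val  = max(.data[[var]], na.rm = TRUE),
      cv_pct   = round(sd(.data[[var]], na.rm = TRUE) /
                         mean(.data[[var]], na.rm = TRUE) * 100, 2)
    )
}

res <- desc_flat(df, "raca", "marmo")
res
```

Because every statistic is an ordinary column here, no pivot_longer() step is needed afterwards.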

Alternatively, pass the variables unquoted (not as strings), like so:

desc_function <- function(a,b, c)   { a %>%
    group_by({{b}}) %>%
    dplyr::summarise(across({{c}},~data.frame(Média =round(mean(.,na.rm=TRUE,digits=2),digits = 2),
                                              N = length(.),
                                              DP = round(sd(.,na.rm=TRUE),digits = 2),
                                              Min = min(.,na.rm=TRUE),
                                              Max = max(.,na.rm=TRUE),
                                              `Coef Variação` = round(sd(., na.rm=TRUE)/mean(.,na.rm=TRUE)*100,digits=2)))) %>%
    pivot_longer(-{{b}}) %>% arrange(name,{{b}})}


desc_function(df, raca, marmo)

This time the arguments are embraced with {{b}} and {{c}}.

All of this, as mentioned, is documented in https://dplyr.tidyverse.org/articles/programming.html

CodePudding user response：

Your first problem, for which you are getting an error, is that the code

function(df, "raca", "marmo")

is used to define a function, not to call one. You should instead use the name you gave your function, like this:

desc_function(df, "raca", "marmo")

CodePudding user response：

If you want to write a function that wraps tidyverse verbs (such as group_by() or summarise()), the usual way of writing functions will not work, because these verbs use non-standard evaluation (NSE). The easiest workaround is the curly-curly operator {{ }} from the rlang package: simply wrap the function arguments that refer to columns in {{ }} wherever they are used inside dplyr functions. In your case the function would be:

library(dplyr)
library(tidyr)
library(rlang)

df <- data.frame(
    raca = c("Nel", "Nel", "Nel", "Nel", "Angus", "Angus", "Angus", "Angus"),
    marmo = c(350, 320, 330, 400, 800, 820, 450, NA)
)

df %>%
    group_by(raca) %>%
    dplyr::summarise(across(1, ~ data.frame(
        Média = round(mean(., na.rm = TRUE, digits = 2), digits = 2),
        N = length(.),
        DP = round(sd(., na.rm = TRUE), digits = 2),
        Min = min(., na.rm = TRUE),
        Max = max(., na.rm = TRUE),
        `Coef Variação` = round(sd(., na.rm = TRUE) / mean(., na.rm = TRUE) * 100, digits = 2)
    ))) %>%
    pivot_longer(-raca) %>%
    arrange(name, raca)
#> # A tibble: 2 × 3
#>   raca  name  value$Média    $N   $DP  $Min  $Max $Coef.Variação
#>   <chr> <chr>       <dbl> <int> <dbl> <dbl> <dbl>          <dbl>
#> 1 Angus marmo         690     4 208.    450   820           30.2
#> 2 Nel   marmo         350     4  35.6   320   400           10.2


desc_function <- function(dat, grp_var, var) {
    # {{ }} lets the caller pass bare (unquoted) column names
    dat %>%
        group_by({{ grp_var }}) %>%
        dplyr::summarise(across({{ var }}, ~ data.frame(
            Média = round(mean(., na.rm = TRUE), digits = 2),
            N = length(.),
            DP = round(sd(., na.rm = TRUE), digits = 2),
            Min = min(., na.rm = TRUE),
            Max = max(., na.rm = TRUE),
            `Coef Variação` = round(sd(., na.rm = TRUE) / mean(., na.rm = TRUE) * 100, digits = 2)
        ))) %>%
        pivot_longer(-{{ grp_var }}) %>%
        arrange(name, {{ grp_var }})
}


desc_function(df, grp_var = raca, var = marmo)

#> # A tibble: 2 × 3
#>   raca  name  value$Média    $N   $DP  $Min  $Max $Coef.Variação
#>   <chr> <chr>       <dbl> <int> <dbl> <dbl> <dbl>          <dbl>
#> 1 Angus marmo         690     4 208.    450   820           30.2
#> 2 Nel   marmo         350     4  35.6   320   400           10.2

Created on 2022-07-07 by the reprex package (v2.0.1)
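
To make the NSE point concrete, the sketch below reproduces why the question's across(a[, c], ...) fails: a[, c] is evaluated as ordinary R code and yields the column's values (including NA), which across() then tries to interpret as a column selection. The use of tryCatch() and all_of() here is only an illustration; it is not code from the question.

```r
library(dplyr)

df <- data.frame(
  raca  = c("Nel", "Nel", "Nel", "Nel", "Angus", "Angus", "Angus", "Angus"),
  marmo = c(350, 320, 330, 400, 800, 820, 450, NA)
)

# df[, "marmo"] is a plain numeric vector of values, not a column reference,
# so using it as a selection raises an error that tryCatch() captures here
attempt <- tryCatch(
  df %>%
    group_by(raca) %>%
    summarise(across(df[, "marmo"], ~ mean(.x, na.rm = TRUE))),
  error = function(e) e
)
inherits(attempt, "error")

# Selecting the column by its name works
ok <- df %>%
  group_by(raca) %>%
  summarise(across(all_of("marmo"), ~ mean(.x, na.rm = TRUE)))
ok
```

The exact wording of the error message may differ between package versions, so the sketch only checks that an error is raised.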

CodePudding user response：

Here is one way to pass the variables as character strings for dynamic input, using !!sym(x):

library(dplyr)
library(tidyr)

df <- data.frame( raca = c("Nel","Nel","Nel", "Nel","Angus","Angus","Angus","Angus"),
                  marmo = c(350, 320, 330, 400, 800, 820, 450, NA))


desc_function <- function(a, b, c)   { 
  a %>%
    group_by(!!sym(b)) %>%
    dplyr::summarise(across(matches(c), ~data.frame(Média =round(mean(.,na.rm=TRUE,digits=2),digits = 2),
                                           N = length(.),
                                           DP = round(sd(.,na.rm=TRUE),digits = 2),
                                           Min = min(.,na.rm=TRUE),
                                           Max = max(.,na.rm=TRUE),
                                           `Coef Variação` = round(sd(., na.rm=TRUE)/mean(.,na.rm=TRUE)*100,digits=2)))) %>%
    pivot_longer(!!sym(c)) %>% arrange(name, !!sym(b))
}

desc_function(df, "raca", "marmo")
#> # A tibble: 2 x 3
#>   raca  name  value$Média    $N   $DP  $Min  $Max $Coef.Variação
#>   <chr> <chr>       <dbl> <int> <dbl> <dbl> <dbl>          <dbl>
#> 1 Angus marmo         690     4 208.    450   820           30.2
#> 2 Nel   marmo         350     4  35.6   320   400           10.2

Created on 2022-07-07 by the reprex package (v2.0.1)
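
One caveat about matches(c) in the function above: matches() treats its argument as a regular expression, so it can pick up more columns than intended. The sketch below uses a made-up extra column, marmo_kg, to show the difference from all_of(), which selects exactly the named column.

```r
library(dplyr)

wide <- data.frame(
  raca     = c("Nel", "Angus"),
  marmo    = c(350, 800),
  marmo_kg = c(1.2, 2.5)  # hypothetical extra column, for illustration only
)

# Regex match: every column whose name contains "marmo"
by_regex <- names(select(wide, matches("marmo")))

# Exact match: only the column named "marmo"
by_name <- names(select(wide, all_of("marmo")))

by_regex
by_name
```

With a single matching column, as in the question's data, both selections give the same result.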

For further information, see the quasiquotation chapter of Advanced R by Hadley Wickham: https://adv-r.hadley.nz/quasiquotation.html