How do I use the sapply() function on only the quantitative variables in a dataset?

Time:02-04

I am trying to compute the mean, median, minimum, maximum, and standard deviation for each of the quantitative variables in a dataset that also contains some categorical variables. I know that na.rm = TRUE has to be used, but an error keeps occurring.

sapply(data, function(x) c("Stand dev" = sd(x, na.rm = TRUE),
                           "Mean" = mean(x, na.rm = TRUE),
                           "Median" = median(x, na.rm = TRUE),
                           "Minimum" = min(x, na.rm = TRUE),
                           "Maximum" = max(x, na.rm = TRUE)))

The output (warnings rather than an error):

Warning: NAs introduced by coercion
Warning: argument is not numeric or logical: returning NA
Warning: NAs introduced by coercion
Warning: argument is not numeric or logical: returning NA
Warning: NAs introduced by coercion
Warning: argument is not numeric or logical: returning NA

CodePudding user response:

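Here is an example using mtcars and purrr's map_if(), which applies the function only to the columns that satisfy is.numeric. The first row of the result holds the means and the second row the medians.

library(tidyverse)  # provides map_if(), as_tibble(), unnest() and %>%
df <- do.call(cbind, map_if(mtcars, is.numeric, ~ list(mean(.x), median(.x)))) %>%
  as_tibble() %>% unnest(cols = everything())
df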

Created on 2023-02-03 with reprex v2.0.2

# A tibble: 2 × 11
    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  20.1  6.19  231.  147.  3.60  3.22  17.8 0.438 0.406  3.69  2.81
2  19.2  6     196.  123   3.70  3.32  17.7 0     0      4     2   
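
One caveat with map_if(): elements that fail the predicate are returned unchanged rather than dropped, and mtcars happens to be entirely numeric. For a data set that mixes numeric and categorical columns, a minimal sketch (assuming purrr is installed; the data frame `dat` and its columns are hypothetical) could filter with keep() first and pass na.rm = TRUE explicitly:

```r
library(purrr)

# Hypothetical mixed data frame: two numeric columns with NAs, one character column.
dat <- data.frame(
  x = c(1, 2, NA),
  y = c(4, NA, 6),
  g = c("a", "b", "c")
)

# keep() drops the columns that are not numeric, so mean() never sees `g`.
num_means <- dat |>
  keep(is.numeric) |>
  map_dbl(\(col) mean(col, na.rm = TRUE))

num_means
```

The same pattern works for any single summary function; several statistics per column need a function that returns a vector, as in the sapply approach.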

CodePudding user response:

If you want a solution based on sapply, you can use this:

# The inner sapply() builds a logical vector marking the numeric columns;
# indexing iris with it drops the categorical Species column.
sapply(iris[sapply(iris, is.numeric)],
       function(x) c("Stand dev" = sd(x, na.rm = TRUE),
                     "Mean" = mean(x, na.rm = TRUE),
                     "Median" = median(x, na.rm = TRUE),
                     "Minimum" = min(x, na.rm = TRUE),
                     "Maximum" = max(x, na.rm = TRUE)))
#>           Sepal.Length Sepal.Width Petal.Length Petal.Width
#> Stand dev    0.8280661   0.4358663     1.765298   0.7622377
#> Mean         5.8433333   3.0573333     3.758000   1.1993333
#> Median       5.8000000   3.0000000     4.350000   1.3000000
#> Minimum      4.3000000   2.0000000     1.000000   0.1000000
#> Maximum      7.9000000   4.4000000     6.900000   2.5000000

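
The warnings in the question most likely come from the non-numeric columns, not from missing values: mean() returns NA with "argument is not numeric or logical: returning NA" when it is given a character or factor vector, so na.rm = TRUE alone cannot help. A minimal sketch on a data set that actually contains both NAs and a categorical column (the data frame `dat` and the helper `summarise_col` are made up for illustration):

```r
# Hypothetical mixed data frame: two numeric columns with NAs, one character column.
dat <- data.frame(
  height = c(1.62, 1.75, NA, 1.80),
  weight = c(60, NA, 72, 81),
  group  = c("a", "b", "a", "b")
)

# Logical vector marking the numeric columns; vapply() guarantees the result type.
is_num <- vapply(dat, is.numeric, logical(1))

# Five summary statistics for one numeric vector, ignoring NAs.
summarise_col <- function(x) {
  c("Stand dev" = sd(x, na.rm = TRUE),
    "Mean"      = mean(x, na.rm = TRUE),
    "Median"    = median(x, na.rm = TRUE),
    "Minimum"   = min(x, na.rm = TRUE),
    "Maximum"   = max(x, na.rm = TRUE))
}

# Only the numeric columns reach summarise_col(), so no coercion warnings occur.
stats <- sapply(dat[is_num], summarise_col)
stats
```

The result is a matrix with one row per statistic and one column per numeric variable, so a single value can be read with, for example, stats["Mean", "weight"].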

However, as noted above, a solution using dplyr may be more convenient:

library(dplyr)
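# Passing na.rm through across()'s `...` is deprecated as of dplyr 1.1.0,
# so each function is wrapped in a lambda instead.
iris |>
  summarise(across(where(is.numeric),
                   list(`Stand dev` = \(x) sd(x, na.rm = TRUE),
                        Mean = \(x) mean(x, na.rm = TRUE),
                        Median = \(x) median(x, na.rm = TRUE),
                        Minimum = \(x) min(x, na.rm = TRUE),
                        Maximum = \(x) max(x, na.rm = TRUE))))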
#>   Sepal.Length_Stand dev Sepal.Length_Mean Sepal.Length_Median
#> 1              0.8280661          5.843333                 5.8
#>   Sepal.Length_Minimum Sepal.Length_Maximum Sepal.Width_Stand dev
#> 1                  4.3                  7.9             0.4358663
#>   Sepal.Width_Mean Sepal.Width_Median Sepal.Width_Minimum Sepal.Width_Maximum
#> 1         3.057333                  3                   2                 4.4
#>   Petal.Length_Stand dev Petal.Length_Mean Petal.Length_Median
#> 1               1.765298             3.758                4.35
#>   Petal.Length_Minimum Petal.Length_Maximum Petal.Width_Stand dev
#> 1                    1                  6.9             0.7622377
#>   Petal.Width_Mean Petal.Width_Median Petal.Width_Minimum Petal.Width_Maximum
#> 1         1.199333                1.3                 0.1                 2.5

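
The one-row result is wide and hard to read. If one row per variable is preferred, a possible sketch (assuming tidyr is installed, and assuming none of the original column names contains an underscore, since "_" is used to split the names) reshapes it with pivot_longer(). The short statistic names here are chosen for illustration:

```r
library(dplyr)
library(tidyr)

stats_long <- iris |>
  summarise(across(where(is.numeric),
                   list(sd     = \(x) sd(x, na.rm = TRUE),
                        mean   = \(x) mean(x, na.rm = TRUE),
                        median = \(x) median(x, na.rm = TRUE),
                        min    = \(x) min(x, na.rm = TRUE),
                        max    = \(x) max(x, na.rm = TRUE)))) |>
  # Column names look like "Sepal.Length_sd": the part before "_" becomes
  # the `variable` column, the part after it becomes a column of its own.
  pivot_longer(everything(),
               names_to = c("variable", ".value"),
               names_sep = "_")

stats_long
```

This gives a tibble with one row per numeric variable and one column per statistic.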
