I am trying to compute the mean, median, min, max, and standard deviation for each of the quantitative variables in a dataset that also contains some categorical variables. I know na.rm = TRUE has to be used, but an error keeps occurring.
sapply(data, function(x) c("Stand dev" = sd(x, na.rm = TRUE),
                           "Mean" = mean(x, na.rm = TRUE),
                           "Median" = median(x, na.rm = TRUE),
                           "Minimum" = min(x, na.rm = TRUE),
                           "Maximum" = max(x, na.rm = TRUE)))
The error:
Warning: NAs introduced by coercion
Warning: argument is not numeric or logical: returning NA
Warning: NAs introduced by coercion
Warning: argument is not numeric or logical: returning NA
Warning: NAs introduced by coercion
Warning: argument is not numeric or logical: returning NA
CodePudding user response:
Please check an example with mtcars and map_if, as below:
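library(tidyverse)  # provides map_if(), as_tibble(), unnest() and %>%
df <- do.call(cbind, map_if(mtcars, is.numeric, ~ list(mean(.x), median(.x)))) %>%
  as_tibble() %>% unnest(cols = everything())
df  # row 1 holds the means, row 2 the medians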
Created on 2023-02-03 with reprex v2.0.2
# A tibble: 2 × 11
    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  20.1  6.19  231.  147.  3.60  3.22  17.8 0.438 0.406  3.69  2.81
2  19.2  6     196.  123   3.70  3.32  17.7 0     0      4     2
CodePudding user response:
If you want a solution based on sapply, you can use this:
# Keep only the numeric columns, then compute each statistic per column
sapply(iris[sapply(iris, is.numeric)],
       function(x) c("Stand dev" = sd(x, na.rm = TRUE),
                     "Mean" = mean(x, na.rm = TRUE),
                     "Median" = median(x, na.rm = TRUE),
                     "Minimum" = min(x, na.rm = TRUE),
                     "Maximum" = max(x, na.rm = TRUE)))
#>           Sepal.Length Sepal.Width Petal.Length Petal.Width
#> Stand dev    0.8280661   0.4358663     1.765298   0.7622377
#> Mean         5.8433333   3.0573333     3.758000   1.1993333
#> Median       5.8000000   3.0000000     4.350000   1.3000000
#> Minimum      4.3000000   2.0000000     1.000000   0.1000000
#> Maximum      7.9000000   4.4000000     6.900000   2.5000000
Created on 2023-02-03 with reprex v2.0.2
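The warnings in the question most likely come from the non-numeric columns: mean() returns NA with "argument is not numeric or logical" when given character data, and sd() tries to coerce the values to numbers first, which produces "NAs introduced by coercion". na.rm = TRUE does not help in either case. A minimal sketch of this, using a made-up data frame (toy, height and group are hypothetical names, not from the question):

```r
# Hypothetical toy data: one numeric column with an NA, one character column
toy <- data.frame(height = c(1.5, NA, 1.8),
                  group  = c("a", "b", "a"),
                  stringsAsFactors = FALSE)

# mean() on a character column warns and returns NA, whatever na.rm is set to
bad <- suppressWarnings(mean(toy$group, na.rm = TRUE))

# Dropping the non-numeric columns first avoids the warnings entirely
num_cols <- toy[sapply(toy, is.numeric)]
stats <- sapply(num_cols, function(x) c(Mean = mean(x, na.rm = TRUE),
                                        Maximum = max(x, na.rm = TRUE)))
```

Here na.rm = TRUE only takes care of the missing value inside height; filtering with is.numeric is what removes the warnings.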
However, as noted above, a solution using dplyr may be more convenient:
library(dplyr)
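# Passing na.rm through across()'s ... is deprecated since dplyr 1.1.0,
# so each function is wrapped in a lambda that sets na.rm itself
iris |>
  summarise(across(where(is.numeric),
                   list(`Stand dev` = \(x) sd(x, na.rm = TRUE),
                        Mean = \(x) mean(x, na.rm = TRUE),
                        Median = \(x) median(x, na.rm = TRUE),
                        Minimum = \(x) min(x, na.rm = TRUE),
                        Maximum = \(x) max(x, na.rm = TRUE))))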
#>   Sepal.Length_Stand dev Sepal.Length_Mean Sepal.Length_Median
#> 1              0.8280661          5.843333                 5.8
#>   Sepal.Length_Minimum Sepal.Length_Maximum Sepal.Width_Stand dev
#> 1                  4.3                  7.9             0.4358663
#>   Sepal.Width_Mean Sepal.Width_Median Sepal.Width_Minimum Sepal.Width_Maximum
#> 1         3.057333                  3                   2                 4.4
#>   Petal.Length_Stand dev Petal.Length_Mean Petal.Length_Median
#> 1               1.765298             3.758                4.35
#>   Petal.Length_Minimum Petal.Length_Maximum Petal.Width_Stand dev
#> 1                    1                  6.9             0.7622377
#>   Petal.Width_Mean Petal.Width_Median Petal.Width_Minimum Petal.Width_Maximum
#> 1         1.199333                1.3                 0.1                 2.5
Created on 2023-02-03 with reprex v2.0.2
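The one-row result above gets wide quickly. If one row per variable is easier to read, a possible base R sketch is to transpose the sapply result; the name summary_tbl and the shortened label SD are my own choices here, not part of the answers above:

```r
# Numeric columns only, as in the sapply answer
num <- iris[sapply(iris, is.numeric)]

# sapply() returns statistics in rows; t() flips it to one row per variable
summary_tbl <- as.data.frame(t(sapply(num, function(x) c(
  SD      = sd(x, na.rm = TRUE),
  Mean    = mean(x, na.rm = TRUE),
  Median  = median(x, na.rm = TRUE),
  Minimum = min(x, na.rm = TRUE),
  Maximum = max(x, na.rm = TRUE)
))))

summary_tbl
```

Each row name is a variable and each column a statistic, so a single value can be looked up with, for example, summary_tbl["Sepal.Length", "Mean"].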