I am switching from dplyr to data.table and have not yet found a data.table way to build a table of summary statistics from a user-defined function. Until now I have done this with dplyr, using the code below. Is there a neat way to get the same result with data.table?
library(dplyr)
library(mlbench)
data(BostonHousing)
df <- BostonHousing
fun_stats <- function(x) {
  # Return the statistics as a named list; each name becomes a column
  list(
    min = min(x, na.rm = TRUE),
    max = max(x, na.rm = TRUE),
    mean = mean(x, na.rm = TRUE)
  )
}
stats <- df %>%
  select_if(is.numeric) %>%
  purrr::map(fun_stats) %>%
  bind_rows(., .id = "var") %>%
  mutate(across(where(is.numeric)))
Answer: Yes. Apply the function to each numeric column through .SD, then stack the per-column results with rbindlist():
library(data.table)
library(mlbench)
data(BostonHousing)
dt <- as.data.table(BostonHousing)
fun_stats <- function(x) {
  # Return the statistics as a named list; each name becomes a column
  list(
    min = min(x, na.rm = TRUE),
    max = max(x, na.rm = TRUE),
    mean = mean(x, na.rm = TRUE)
  )
}
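# .SD holds only the columns picked by .SDcols (here: the numeric ones);
# rbindlist() stacks the per-column lists and stores their names in "var"
dt[, rbindlist(lapply(.SD, fun_stats), idcol = "var"),
   .SDcols = is.numeric]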
#> var min max mean
#> <char> <num> <num> <num>
#> 1: crim 0.00632 88.9762 3.6135236
#> 2: zn 0.00000 100.0000 11.3636364
#> 3: indus 0.46000 27.7400 11.1367787
#> 4: nox 0.38500 0.8710 0.5546951
#> 5: rm 3.56100 8.7800 6.2846344
#> 6: age 2.90000 100.0000 68.5749012
#> 7: dis 1.12960 12.1265 3.7950427
#> 8: rad 1.00000 24.0000 9.5494071
#> 9: tax 187.00000 711.0000 408.2371542
#> 10: ptratio 12.60000 22.0000 18.4555336
#> 11: b 0.32000 396.9000 356.6740316
#> 12: lstat 1.73000 37.9700 12.6530632
#> 13: medv 5.00000 50.0000 22.5328063
Created on 2022-06-24 by the reprex package (v2.0.1)
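To see why this works, it can help to split the one-liner into its two steps on a small table. The sketch below uses a made-up three-row table (the names id, x and y are hypothetical, not from the question) and assumes a data.table version that accepts a function such as is.numeric in .SDcols, as the answer's code does.

```r
library(data.table)

# Toy table (hypothetical data): one character column, two numeric columns
dt <- data.table(
  id = c("a", "b", "c"),
  x  = c(1, 2, NA),
  y  = c(10, 20, 30)
)

# Same helper as in the answer, returning the list explicitly
fun_stats <- function(x) {
  list(
    min  = min(x, na.rm = TRUE),
    max  = max(x, na.rm = TRUE),
    mean = mean(x, na.rm = TRUE)
  )
}

# Step 1: apply the helper to every numeric column.
# The result is a named list with one list of statistics per column.
per_col <- lapply(dt[, .SD, .SDcols = is.numeric], fun_stats)

# Step 2: stack the per-column lists into one table.
# idcol = "var" stores the list names (the column names) in a "var" column.
stats <- rbindlist(per_col, idcol = "var")
print(stats)
```

The character column id is dropped by .SDcols, so stats should contain one row each for x and y, with the NA in x ignored because of na.rm = TRUE.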