Is it possible to have gtsummary::tbl_summary compute the mean excluding outliers? For example, the following code generates sample data of some z-scores. Is it possible to specify, or add a clause controlling, how gtsummary::tbl_summary handles each column?
library(gtsummary)
set.seed(42)
n <- 1000
dat <- data.frame(id=1:n,
treat = factor(sample(c('Treat','Control'), n, rep=TRUE, prob=c(.5, .5))),
outcome1=runif(n, min=-3.6, max=2.3),
outcome2=runif(n, min=-1.9, max=3.3),
outcome3=runif(n, min=-2.5, max=2.8),
outcome4=runif(n, min=-3.1, max=2.2))
dat %>% select(-c(id)) %>% tbl_summary(by=treat, statistic = list(all_continuous() ~ "{mean} ({min} to {max})"))
For example, suppose I want the table to report the mean of outcome1 only for cases where outcome1 >= -2.9, the mean of outcome2 only for cases where outcome2 < 3.0, and so on.
Many thanks in advance for any guidance offered.
Answer:
You can define a new mean function that excludes outlying values, defining an outlier in any way you'd like, and then pass that function to tbl_summary(). Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.2'
set.seed(42)
n <- 1000
dat <- data.frame(id=1:n,
treat = factor(sample(c('Treat','Control'), n, rep=TRUE, prob=c(.5, .5))),
outcome1=runif(n, min=-3.6, max=2.3),
outcome2=runif(n, min=-1.9, max=3.3),
outcome3=runif(n, min=-2.5, max=2.8),
outcome4=runif(n, min=-3.1, max=2.2))
mean_no_extreme <- function(x) {
  x <- na.omit(x)
  x_sd <- sd(x)
  x_mean <- mean(x)
  # calculate mean excluding values more than 3 SD from the mean
  mean(x[x >= x_mean - x_sd * 3 & x <= x_mean + x_sd * 3])
}
dat %>%
  select(-c(id)) %>%
  tbl_summary(
    by = treat,
    statistic = all_continuous() ~ "{mean_no_extreme} ({min} to {max})"
  ) %>%
  as_kable()
| Characteristic | Control, N = 527 | Treat, N = 473 |
|---|---|---|
| outcome1 | -0.64 (-3.59 to 2.30) | -0.70 (-3.60 to 2.30) |
| outcome2 | 0.68 (-1.89 to 3.30) | 0.78 (-1.87 to 3.28) |
| outcome3 | 0.20 (-2.47 to 2.80) | 0.23 (-2.48 to 2.80) |
| outcome4 | -0.36 (-3.09 to 2.19) | -0.41 (-3.10 to 2.20) |
Created on 2022-03-22 by the reprex package (v2.0.1)
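The question also asks for a different cutoff per column (outcome1 >= -2.9, outcome2 < 3.0). One way to sketch that with the same approach is to define one small function per variable and name each in a variable-specific formula. The helper names mean_o1 and mean_o2 are hypothetical, chosen only for this illustration, and the data frame is a cut-down version of the one above. With uniform data like this, a 3 SD rule is unlikely to exclude anything, so fixed cutoffs are the more visible test.

```r
library(gtsummary)

set.seed(42)
n <- 1000
dat <- data.frame(
  treat = factor(sample(c("Treat", "Control"), n, replace = TRUE)),
  outcome1 = runif(n, min = -3.6, max = 2.3),
  outcome2 = runif(n, min = -1.9, max = 3.3)
)

# Hypothetical helpers (not part of gtsummary): one per variable,
# each applying the cutoff described in the question
mean_o1 <- function(x) mean(x[x >= -2.9], na.rm = TRUE)
mean_o2 <- function(x) mean(x[x < 3.0], na.rm = TRUE)

dat %>%
  tbl_summary(
    by = treat,
    # each variable gets its own statistic string naming its own function
    statistic = list(
      outcome1 ~ "{mean_o1} ({min} to {max})",
      outcome2 ~ "{mean_o2} ({min} to {max})"
    )
  ) %>%
  as_kable()
```

Note that {min} and {max} here are still computed on all values, so the reported range can extend beyond the cutoff used for the mean.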