R/gtsummary: excluding outliers in tbl_summary

Time:03-23

Is it possible to have gtsummary::tbl_summary compute the mean excluding outliers? The code below creates sample data containing some z-scores. Can I specify, or add a condition to, how gtsummary::tbl_summary handles each column?

library(gtsummary)  # also provides %>% and select()

set.seed(42)
n <- 1000
dat <- data.frame(id=1:n,
                  treat = factor(sample(c('Treat','Control'), n, rep=TRUE, prob=c(.5, .5))),
                  outcome1=runif(n, min=-3.6, max=2.3),
                  outcome2=runif(n, min=-1.9, max=3.3),
                  outcome3=runif(n, min=-2.5, max=2.8),
                  outcome4=runif(n, min=-3.1, max=2.2))
dat %>%
  select(-c(id)) %>%
  tbl_summary(
    by = treat,
    statistic = list(all_continuous() ~ "{mean} ({min} to {max})")
  )

For example, suppose I want the table to report the mean of outcome1 only for cases where outcome1 >= -2.9, the mean of outcome2 only for cases where outcome2 < 3.0, and so on.

Many thanks in advance for any guidance offered.

Answer:

You can define your own mean function that drops outlying values, using whatever definition of an outlier you like, and then reference that function by name in the statistic argument of tbl_summary(). Example below!

library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.2'

set.seed(42)
n <- 1000
dat <- data.frame(id=1:n,
                  treat = factor(sample(c('Treat','Control'), n, rep=TRUE, prob=c(.5, .5))),
                  outcome1=runif(n, min=-3.6, max=2.3),
                  outcome2=runif(n, min=-1.9, max=3.3),
                  outcome3=runif(n, min=-2.5, max=2.8),
                  outcome4=runif(n, min=-3.1, max=2.2))

mean_no_extreme <- function(x) {
  x <- na.omit(x)
  x_sd <- sd(x)
  x_mean <- mean(x)

  # calculate the mean, excluding values more than 3 SDs from the overall mean
  mean(x[x >= x_mean - x_sd * 3 & x <= x_mean + x_sd * 3])
}


dat %>% 
  select(-c(id)) %>% 
  tbl_summary(
    by=treat, 
    statistic = all_continuous() ~ "{mean_no_extreme} ({min} to {max})"
  ) %>%
  as_kable()

| Characteristic | Control, N = 527      | Treat, N = 473        |
|----------------|-----------------------|-----------------------|
| outcome1       | -0.64 (-3.59 to 2.30) | -0.70 (-3.60 to 2.30) |
| outcome2       | 0.68 (-1.89 to 3.30)  | 0.78 (-1.87 to 3.28)  |
| outcome3       | 0.20 (-2.47 to 2.80)  | 0.23 (-2.48 to 2.80)  |
| outcome4       | -0.36 (-3.09 to 2.19) | -0.41 (-3.10 to 2.20) |

Created on 2022-03-22 by the reprex package (v2.0.1)
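
The question asks for a different cutoff per column, while the example above applies one 3-SD rule to every continuous variable. A minimal sketch of the per-column variant follows, assuming the statistic argument is given a list with one formula per variable. The helper mean_within() and the names mean_o1 and mean_o2 are hypothetical, made up for this illustration and not part of gtsummary. The cutoffs are the ones given in the question, and as_kable() assumes knitr is installed.

```r
library(gtsummary)

# Hypothetical helper (not part of gtsummary): returns a mean function that
# only uses values with lower <= x < upper.
mean_within <- function(lower = -Inf, upper = Inf) {
  force(lower)
  force(upper)
  function(x) mean(x[x >= lower & x < upper], na.rm = TRUE)
}

# Cutoffs taken from the question
mean_o1 <- mean_within(lower = -2.9)  # outcome1 >= -2.9
mean_o2 <- mean_within(upper = 3.0)   # outcome2 < 3.0

set.seed(42)
n <- 1000
dat <- data.frame(
  treat = factor(sample(c("Treat", "Control"), n, replace = TRUE)),
  outcome1 = runif(n, min = -3.6, max = 2.3),
  outcome2 = runif(n, min = -1.9, max = 3.3)
)

# One formula per variable, each naming its own mean function
tbl <- tbl_summary(
  dat,
  by = treat,
  statistic = list(
    outcome1 ~ "{mean_o1} ({min} to {max})",
    outcome2 ~ "{mean_o2} ({min} to {max})"
  )
)
as_kable(tbl)
```

Note that {min} and {max} are still computed on all values here. If the range should respect the same cutoffs, the same approach would need matching min and max functions.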
