Sorting overlapping categorical variables in {gtsummary}

Time:12-11

require(gtsummary)

test <- structure(list(`1` = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0), `2` = c(1,0, 0, 0, 0, 1, 0, 1, 0, 0), `3` = c(0, 0, 0, 0, 0, 0, 0, 0, 0,0), `4` = c(1, 1, 0, 0, 1, 0, 0, 0, 0, 0), `5` = c(1, 0, 1, 1,0, 1, 1, 0, 0, 0), `6` = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0), `7` = c(0,0, 0, 0, 0, 0, 0, 0, 0, 0), `8` = c(0, 0, 0, 0, 0, 0, 0, 0, 0,0), `9` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `10` = c(0, 0, 0,0, 0, 0, 0, 0, 0, 1)), row.names = c(NA, -10L), class = c("tbl_df","tbl", "data.frame")) 

In this example data, I have 10 categorical variables.

     `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`   `9`  `10`
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     0     1     0     1     1     0     0     0     0     0
 2     0     0     0     1     0     0     0     0     0     0
 3     0     0     0     0     1     0     0     0     0     0
 4     0     0     0     0     1     1     0     0     0     0
 5     0     0     0     1     0     0     0     0     0     0
 6     0     1     0     0     1     0     0     0     0     0
 7     0     0     0     0     1     1     0     0     0     0
 8     0     1     0     0     0     0     0     0     0     0
 9     1     0     0     0     0     0     0     0     0     0
10     0     0     0     0     0     0     0     0     0     1

Since the categories can overlap, I have put each one in its own column, coded 1 for "yes" (the row has that category) and 0 for "no".

Running test %>% tbl_summary() creates a summary table with the variables listed in their original column order.

I would like to sort this by frequency, but

test %>% tbl_summary(sort = list(everything() ~ "frequency"))

does not work.

Is there any way to do this? Thank you in advance.

Answer:

The tbl_summary(sort=) argument sorts the levels within a variable, not the order in which the variables appear in the table. Variables appear in the table in the same order as they appear in the data frame.

We can update the order in the data frame using the code below.

library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.0'

test <- structure(list(`1` = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0), `2` = c(1,0, 0, 0, 0, 1, 0, 1, 0, 0), `3` = c(0, 0, 0, 0, 0, 0, 0, 0, 0,0), `4` = c(1, 1, 0, 0, 1, 0, 0, 0, 0, 0), `5` = c(1, 0, 1, 1,0, 1, 1, 0, 0, 0), `6` = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0), `7` = c(0,0, 0, 0, 0, 0, 0, 0, 0, 0), `8` = c(0, 0, 0, 0, 0, 0, 0, 0, 0,0), `9` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `10` = c(0, 0, 0,0, 0, 0, 0, 0, 0, 1)), row.names = c(NA, -10L), class = c("tbl_df","tbl", "data.frame")) 

# order variables by prevalence (proportion of 1s), most frequent first
prev <- purrr::map_dbl(test, mean) %>% sort(decreasing = TRUE)

test %>%
  select(all_of(names(prev))) %>%
  tbl_summary() %>%
  as_kable() # convert to kable for SO

|Characteristic |N = 10  |
|:--------------|:-------|
|5              |5 (50%) |
|2              |3 (30%) |
|4              |3 (30%) |
|6              |2 (20%) |
|1              |1 (10%) |
|10             |1 (10%) |
|3              |0 (0%)  |
|7              |0 (0%)  |
|8              |0 (0%)  |
|9              |0 (0%)  |

Created on 2021-12-10 by the reprex package (v2.0.1)
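
For readers who would rather not depend on purrr, the same column reordering can be sketched in base R. This is only an illustration of the idea above, not part of the original answer: the data frame toy and the names ordered_names and toy_sorted are made up for the example, and it assumes every column is coded 0/1 so that the column mean equals the prevalence.

```r
# Hypothetical toy data with the same shape as `test`: one 0/1 column per category.
toy <- data.frame(
  a = c(0, 0, 1, 0),
  b = c(1, 1, 1, 0),
  c = c(0, 1, 1, 0)
)

# Proportion of 1s in each column (valid because the columns are coded 0/1).
prev <- colMeans(toy)

# Column names from most to least frequent.
ordered_names <- names(prev)[order(prev, decreasing = TRUE)]

# Reorder the columns; `toy_sorted` is what would be passed to tbl_summary().
toy_sorted <- toy[, ordered_names, drop = FALSE]

print(ordered_names)
```

With the real data, the reordered data frame would be piped into tbl_summary() exactly as in the answer, since the table follows the column order of its input.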
