Sorting overlapping categorical variables in {gtsummary}-CodePudding

require(gtsummary)

test <- structure(list(`1` = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0), `2` = c(1,0, 0, 0, 0, 1, 0, 1, 0, 0), `3` = c(0, 0, 0, 0, 0, 0, 0, 0, 0,0), `4` = c(1, 1, 0, 0, 1, 0, 0, 0, 0, 0), `5` = c(1, 0, 1, 1,0, 1, 1, 0, 0, 0), `6` = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0), `7` = c(0,0, 0, 0, 0, 0, 0, 0, 0, 0), `8` = c(0, 0, 0, 0, 0, 0, 0, 0, 0,0), `9` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `10` = c(0, 0, 0,0, 0, 0, 0, 0, 0, 1)), row.names = c(NA, -10L), class = c("tbl_df","tbl", "data.frame"))

In this example data, I have 10 categorical variables.

     `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`   `9`  `10`
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     0     1     0     1     1     0     0     0     0     0
 2     0     0     0     1     0     0     0     0     0     0
 3     0     0     0     0     1     0     0     0     0     0
 4     0     0     0     0     1     1     0     0     0     0
 5     0     0     0     1     0     0     0     0     0     0
 6     0     1     0     0     1     0     0     0     0     0
 7     0     0     0     0     1     1     0     0     0     0
 8     0     1     0     0     0     0     0     0     0     0
 9     1     0     0     0     0     0     0     0     0     0
10     0     0     0     0     0     0     0     0     0     1

Since they can overlap each other, I have put them in different columns, using 0 and 1, indicatting "yes" or "no" to having (or not having) the categorical variable.

When test %>% tbl_summary(), it creates:

I would like to sort this by frequency, but

test %>% tbl_summary(sort = list(everything() ~ "frequency"))

does not work.

Is there anyway to do this? Thank you in advance.

CodePudding user response：

The tbl_summary(sort=) argument sorts levels within a variable, not the order the variables appear in the table. Variables are appear in the table in the same order they appear in the data frame.

We can update the order in the data frame using the code below.

library(gtsummary)
#> #Uighur
packageVersion("gtsummary")
#> [1] '1.5.0'

test <- structure(list(`1` = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0), `2` = c(1,0, 0, 0, 0, 1, 0, 1, 0, 0), `3` = c(0, 0, 0, 0, 0, 0, 0, 0, 0,0), `4` = c(1, 1, 0, 0, 1, 0, 0, 0, 0, 0), `5` = c(1, 0, 1, 1,0, 1, 1, 0, 0, 0), `6` = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0), `7` = c(0,0, 0, 0, 0, 0, 0, 0, 0, 0), `8` = c(0, 0, 0, 0, 0, 0, 0, 0, 0,0), `9` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `10` = c(0, 0, 0,0, 0, 0, 0, 0, 0, 1)), row.names = c(NA, -10L), class = c("tbl_df","tbl", "data.frame")) 

# order variables by prevelence 
prev <- purrr::map_dbl(test, mean) %>% sort(decreasing = TRUE)

test %>%
  select(all_of(names(prev))) %>%
  tbl_summary() %>%
  as_kable() # convert to kable for SO

Characteristic	N = 10
5	5 (50%)
2	3 (30%)
4	3 (30%)
6	2 (20%)
1	1 (10%)
10	1 (10%)
3	0 (0%)
7	0 (0%)
8	0 (0%)
9	0 (0%)

^{Created on 2021-12-10 by the reprex package (v2.0.1)}