tbl_summary() with long, tidy data where single (e.g.,) patient ID has multiple values in factor col

Time:11-13

    library(tidyverse)
    library(gtsummary)

    tibble(patient_id = c("A","A", "A", "B", "B"), 
           disease = c("cancer", "heart disease", "fat fingers", "heart disease", "fat fingers")) %>% 
      select(-patient_id) %>% 
      tbl_summary()

[Image: output table from the code above]

The above is example code and output from the tbl_summary function from the gtsummary library.

I've got two patients in this case (A and B) and would like to see the output showing what percentage of the patients have "fat fingers" (should be 100%), what percentage have "heart disease" (should be 100%) and what percentage have cancer (should be 50%), and I'd also like the "N" to be equal to 2.
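For reference, the target numbers can be computed by hand with dplyr. This is only a sketch of the arithmetic described above, not a gtsummary solution; the object names `long`, `n_patients` and `expected` are made up for illustration.

```r
library(dplyr)
library(tibble)

long <- tibble(
  patient_id = c("A", "A", "A", "B", "B"),
  disease = c("cancer", "heart disease", "fat fingers",
              "heart disease", "fat fingers")
)

# Denominator: number of distinct patients, not number of rows
n_patients <- n_distinct(long$patient_id)

expected <- long %>%
  distinct(patient_id, disease) %>%      # guard against duplicated rows
  count(disease, name = "n") %>%         # patients per disease
  mutate(pct = 100 * n / n_patients)     # share of all patients

expected
```

With the example data this gives 1 of 2 patients (50%) for cancer and 2 of 2 (100%) for the other two diseases, which is what the summary table should show.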

Within tbl_summary() I can't see an option to summarise by an id column, so the only other option I can think of is to pivot_wider(id_cols = patient_id, values_from....etc.), but then the tbl_summary() output wouldn't have the nice headings and sub-headings. Failing that, I guess I could build a custom gt table from scratch, but I like the ease of tbl_summary(); it just doesn't appear to work well when there's more than one factor value per id.

Answer:

A typical gtsummary table does expect the data in wide format. But you can get the table you need. You'll start by adding rows for patients without the listed disease, and then you can summarize with tbl_summary(). You'll need to make a few aesthetic modifications as well. Example below!

    library(tidyverse)
    library(gtsummary)

    tbl <- 
      tibble(patient_id = c("A","A", "A", "B", "B"), 
             disease = c("cancer", "heart disease", "fat fingers", 
                         "heart disease", "fat fingers"), 
             present = TRUE) %>%
      # adding observations for patients without a disease
      complete(patient_id, disease, fill = list(present = FALSE)) %>%
      select(-patient_id) %>% 
      # summarizing data: row percentages give the share of patients per disease
      tbl_summary(by = present, percent = "row") %>%
      # stat_2 is the present = TRUE column; relabel it
      modify_header(stat_2 ~ "**Overall**") %>%
      # stat_1 is the present = FALSE column; hide it
      modify_column_hide(stat_1)

[Image: resulting table]

Created on 2021-11-12 by the reprex package (v2.0.1)
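To see why the row percentages come out as per-patient percentages, it can help to inspect the data after the complete() step on its own. The sketch below uses only dplyr and tidyr; the names `completed` and `per_disease` are illustrative and not part of the answer's code.

```r
library(dplyr)
library(tidyr)
library(tibble)

completed <- tibble(
  patient_id = c("A", "A", "A", "B", "B"),
  disease = c("cancer", "heart disease", "fat fingers",
              "heart disease", "fat fingers"),
  present = TRUE
) %>%
  # one row per patient-disease pair; pairs missing from the data get FALSE
  complete(patient_id, disease, fill = list(present = FALSE))

completed  # 2 patients x 3 diseases = 6 rows

# Each disease now has exactly one row per patient, so the share of
# TRUE rows within a disease is the share of patients with that disease
per_disease <- completed %>%
  group_by(disease) %>%
  summarise(n_present = sum(present),
            n_patients = n(),
            pct = 100 * mean(present))

per_disease
```

Because every disease ends up with the same number of rows (one per patient), `percent = "row"` in tbl_summary() divides by the patient count rather than by the number of rows in the original long data.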
