I have a dataframe of patients who underwent one or more surgical procedures and am interested in grouping them by procedure type for analysis of outcomes. The procedures are represented by numbers (1-5). To avoid having to create a new column in the dataframe for each procedure type to identify whether the patient had that unique procedure performed, I'm basically looking for a way to do aggregate grouping and summarizing for each unique value in a list.
A representative df would look like this...
id <- c(1,2,3,4,5,6,7,8,9,10)
procedures <- list(2, 3, c(1,5), 1, c(3,4), c(1,3), 5, 2, c(1,2,5), 4)
df <- as.data.frame(cbind(id, procedures))
Say I wanted to count the number of patients who had each type of procedure. The following would obviously count each unique list as a separate object.
df %>%
group_by(procedures) %>%
summarise(n = n())
What I'm trying to accomplish would be a count of times each unique procedure appears in the list of lists. The below is oversimplified but an example of this.
df %>%
group_by(unique(procedures)) %>%
summarise(n = n())
CodePudding user response:
We may unnest
the list
column and use that in group_by
library(dplyr)
library(tidyr)
df %>%
unnest(everything()) %>%
group_by(procedures) %>%
summarise(n = n())
CodePudding user response:
We could use separate_rows
with count
:
library(dplyr)
library(tidyr)
df %>%
separate_rows("procedures", sep = " ,") %>%
count(procedures)
procedures n
<dbl> <int>
1 1 4
2 2 3
3 3 3
4 4 2
5 5 3