Home > Enterprise >  Taking multiple rows and combining into one column with missing data
Taking multiple rows and combining into one column with missing data

Time:08-14

I have a dataframe in R with patients presenting with certain complications with each complication being a different row. Sample dataframe below:

id <- c(1,1,2,2,2,3)
complication <- c("heart failure","myocardial infarction","pneumonia","UTI","cellulitis","pneumonia")
data <- data.frame(id, complication)

I want to reframe it so that it shows the patient id and then each complication they present with. However, not all patients present with the same number of complications, so the other values should be counted as NA. This is what I want it to look like:

id2 <- c(1,2,3)
complication1 <- c("heart failure","pneumonia","pneumonia")
complication2 <- c("myocardial infarction","UTI", NA)
complication3 <- c("cellulitis",NA,NA)
data2 <- data.frame(id2, complication1, complication2, complication3 )

I am also not sure how to rename the columns, since I will not be able to do this manually with the size of the dataset. Would appreciate any help - thanks!

CodePudding user response:

One option is to create the column name prior to putting it into the wide format with pivot_wider.

library(tidyverse)

data %>% 
  group_by(id) %>% 
  mutate(col_name = paste0("complication", row_number())) %>% 
  pivot_wider(names_from = "col_name", values_from = "complication")

Output

     id complication1 complication2         complication3
  <dbl> <chr>         <chr>                 <chr>        
1     1 heart failure myocardial infarction NA           
2     2 pneumonia     UTI                   cellulitis   
3     3 pneumonia     NA                    NA           
  • Related