Home > Software engineering >  Consolidate a list of members enrolled each month to a range of start and stop dates per member in R
Consolidate a list of members enrolled each month to a range of start and stop dates per member in R

Time:03-26

Let's say I have a very tall data frame of all the months each doctor is enrolled:

doctor month_enrolled
Dr. Brown March, 2021
Dr. Brown April, 2021
Dr. Brown July, 2021
Dr. Brown August, 2021
Dr. Green April, 2021
Dr. Green May, 2021
Dr. Black Jan, 2021
Dr. Black May, 2021
Dr. Black June, 2021
Dr. Black July, 2021
Dr. Pink September, 2021
... ...

And I want to consolidate this into a smaller data frame with ranges of enrollment_start and enrollment_end per doctor like this:

doctor enrollment_start enrollment_end
Dr. Brown March, 2021 April, 2021
Dr. Brown July, 2021 August, 2021
Dr. Green April, 2021 May, 2021
Dr. Black Jan, 2021 Jan, 2021
DR. Black May, 2021 July, 2021
Dr. Pink September, 2021 September, 2021
... ... ...

Whereby non-continuous enrollment periods are reflected as separate rows.

How might I accomplish this in R, preferably using dplyr / tidyverse syntax?

CodePudding user response:

library(dplyr)
library(lubridate)

data.frame(
  stringsAsFactors = FALSE,
            doctor = c("Dr. Brown","Dr. Brown",
                       "Dr. Brown","Dr. Brown","Dr. Green","Dr. Green",
                       "Dr. Black","Dr. Black","Dr. Black","Dr. Black","Dr. Pink"),
    month_enrolled = c("March, 2021","April, 2021",
                       "July, 2021","August, 2021","April, 2021","May, 2021",
                       "Jan, 2021","May, 2021","June, 2021","July, 2021",
                       "September, 2021")
) %>%
  
transmute(doctor, month = dmy(paste(1,month_enrolled))) %>%
group_by(doctor) %>%
mutate(stint = 1   cumsum(c(0, diff(month) > 31))) %>% # EDIT - shorter
group_by(doctor, stint) %>%
summarize(enrollment_start = min(month),
            enrollment_end = max(month), .groups = "drop")

Result

# A tibble: 6 × 4
  doctor    stint enrollment_start enrollment_end
  <chr>     <dbl> <date>           <date>        
1 Dr. Black     1 2021-01-01       2021-01-01    
2 Dr. Black     2 2021-05-01       2021-07-01    
3 Dr. Brown     1 2021-03-01       2021-04-01    
4 Dr. Brown     2 2021-07-01       2021-08-01    
5 Dr. Green     1 2021-04-01       2021-05-01    
6 Dr. Pink      1 2021-09-01       2021-09-01 
  • Related