How can I find the dates that events are happening concurrently in R by group?-CodePudding

In R, I need to find which treatments are occurring concurrently and work out what the dose for that day would be. I need to do this by patient, so presumably using a group_by statement in dplyr.

user_id	treatment	dosage	treatment_start	treatment_end
1	1	3	01/28/2019	07/30/2019
1	1	2	05/26/2019	11/25/2019
1	2	1	08/13/2019	02/12/2020
1	1	2	12/06/2019	04/07/2020
1	2	1	12/09/2019	06/10/2020

Ideally the final form of it will be the user id, the treatments they're on, the sum of the dosage of all treatments, and the dates that they're on all of those treatments. I've made an example results table with a few rows below.

user_id	treatments	total_dosage	treatment_start	treatment_end
1	1	3	01/28/2019	05/25/2019
1	1	5	05/26/2019	07/30/2019
1	1	2	07/31/2019	08/12/2019
1	1,2	3	08/13/2019	11/25/2019

I worked out how to find if an event overlaps with other events but it doesn't get the resulting dates, and doesn't sum the dosages so I don't know if it's usable. In this case, course is a combination of the treatment and dosage column.

DF %>% group_by(user_id ) %>%
   mutate(overlap = purrr::map2_chr(treatment_start, treatment_end, 
              ~toString(course[.x >= treatment_start & .x < treatment_end| .y > treatment_start & .y < treatment_end]))) %>%
  ungroup()

CodePudding user response：

This is an interesting question. One way is to expand the dataframe to have one row for each day, and then summarise the data by date:

library(tidyverse)
library(lubridate)

dat %>% 
  # Convert dates to date format
  mutate(across(treatment_start:treatment_end, ~ mdy(.x))) %>% 
  # Expand the dataframe
  group_by(user_id, treatment_start, treatment_end) %>% 
  mutate(date = list(seq(treatment_start, treatment_end, by = "day"))) %>% 
  unnest(date) %>% 
  # Summarise by day
  group_by(user_id, date) %>% 
  summarise(dosage = sum(dosage),
            treatment = toString(unique(treatment))) %>% 
  # Summarise by different dosage (and create periods)
  group_by(user_id, treatment, dosage) %>% 
  summarise(treatment_start = min(date),
            treatment_ends = max(date)) %>% 
  arrange(treatment_start)

output:

  user_id treatment dosage treatment_start treatment_ends
    <int> <chr>      <int> <date>          <date>        
1       1 1              3 2019-01-28      2019-05-25    
2       1 1              5 2019-05-26      2019-07-30    
3       1 1              2 2019-07-31      2019-08-12    
4       1 1, 2           3 2019-08-13      2020-04-07    
5       1 2              1 2019-11-26      2020-06-10    
6       1 2, 1           3 2019-12-06      2019-12-08    
7       1 2, 1           4 2019-12-09      2020-02-12