Home > Blockchain >  How can I find the dates that events are happening concurrently in R by group?
How can I find the dates that events are happening concurrently in R by group?

Time:02-24

In R, I need to find which treatments are occurring concurrently and work out what the dose for that day would be. I need to do this by patient, so presumably using a group_by statement in dplyr.

user_id treatment dosage treatment_start treatment_end
1 1 3 01/28/2019 07/30/2019
1 1 2 05/26/2019 11/25/2019
1 2 1 08/13/2019 02/12/2020
1 1 2 12/06/2019 04/07/2020
1 2 1 12/09/2019 06/10/2020

Ideally the final form of it will be the user id, the treatments they're on, the sum of the dosage of all treatments, and the dates that they're on all of those treatments. I've made an example results table with a few rows below.

user_id treatments total_dosage treatment_start treatment_end
1 1 3 01/28/2019 05/25/2019
1 1 5 05/26/2019 07/30/2019
1 1 2 07/31/2019 08/12/2019
1 1,2 3 08/13/2019 11/25/2019

I worked out how to find if an event overlaps with other events but it doesn't get the resulting dates, and doesn't sum the dosages so I don't know if it's usable. In this case, course is a combination of the treatment and dosage column.

DF %>% group_by(user_id ) %>%
   mutate(overlap = purrr::map2_chr(treatment_start, treatment_end, 
              ~toString(course[.x >= treatment_start & .x < treatment_end| .y > treatment_start & .y < treatment_end]))) %>%
  ungroup()

CodePudding user response:

This is an interesting question. One way is to expand the dataframe to have one row for each day, and then summarise the data by date:

library(tidyverse)
library(lubridate)

dat %>% 
  # Convert dates to date format
  mutate(across(treatment_start:treatment_end, ~ mdy(.x))) %>% 
  # Expand the dataframe
  group_by(user_id, treatment_start, treatment_end) %>% 
  mutate(date = list(seq(treatment_start, treatment_end, by = "day"))) %>% 
  unnest(date) %>% 
  # Summarise by day
  group_by(user_id, date) %>% 
  summarise(dosage = sum(dosage),
            treatment = toString(unique(treatment))) %>% 
  # Summarise by different dosage (and create periods)
  group_by(user_id, treatment, dosage) %>% 
  summarise(treatment_start = min(date),
            treatment_ends = max(date)) %>% 
  arrange(treatment_start)

output:

  user_id treatment dosage treatment_start treatment_ends
    <int> <chr>      <int> <date>          <date>        
1       1 1              3 2019-01-28      2019-05-25    
2       1 1              5 2019-05-26      2019-07-30    
3       1 1              2 2019-07-31      2019-08-12    
4       1 1, 2           3 2019-08-13      2020-04-07    
5       1 2              1 2019-11-26      2020-06-10    
6       1 2, 1           3 2019-12-06      2019-12-08    
7       1 2, 1           4 2019-12-09      2020-02-12    
  • Related