In R, I need to find which treatments are occurring concurrently and work out what the dose for that day would be. I need to do this by patient, so presumably using a group_by statement in dplyr
.
user_id | treatment | dosage | treatment_start | treatment_end |
---|---|---|---|---|
1 | 1 | 3 | 01/28/2019 | 07/30/2019 |
1 | 1 | 2 | 05/26/2019 | 11/25/2019 |
1 | 2 | 1 | 08/13/2019 | 02/12/2020 |
1 | 1 | 2 | 12/06/2019 | 04/07/2020 |
1 | 2 | 1 | 12/09/2019 | 06/10/2020 |
Ideally the final form of it will be the user id, the treatments they're on, the sum of the dosage of all treatments, and the dates that they're on all of those treatments. I've made an example results table with a few rows below.
user_id | treatments | total_dosage | treatment_start | treatment_end |
---|---|---|---|---|
1 | 1 | 3 | 01/28/2019 | 05/25/2019 |
1 | 1 | 5 | 05/26/2019 | 07/30/2019 |
1 | 1 | 2 | 07/31/2019 | 08/12/2019 |
1 | 1,2 | 3 | 08/13/2019 | 11/25/2019 |
I worked out how to find if an event overlaps with other events but it doesn't get the resulting dates, and doesn't sum the dosages so I don't know if it's usable. In this case, course is a combination of the treatment and dosage column.
DF %>% group_by(user_id ) %>%
mutate(overlap = purrr::map2_chr(treatment_start, treatment_end,
~toString(course[.x >= treatment_start & .x < treatment_end| .y > treatment_start & .y < treatment_end]))) %>%
ungroup()
CodePudding user response:
This is an interesting question. One way is to expand the dataframe to have one row for each day, and then summarise the data by date:
library(tidyverse)
library(lubridate)
dat %>%
# Convert dates to date format
mutate(across(treatment_start:treatment_end, ~ mdy(.x))) %>%
# Expand the dataframe
group_by(user_id, treatment_start, treatment_end) %>%
mutate(date = list(seq(treatment_start, treatment_end, by = "day"))) %>%
unnest(date) %>%
# Summarise by day
group_by(user_id, date) %>%
summarise(dosage = sum(dosage),
treatment = toString(unique(treatment))) %>%
# Summarise by different dosage (and create periods)
group_by(user_id, treatment, dosage) %>%
summarise(treatment_start = min(date),
treatment_ends = max(date)) %>%
arrange(treatment_start)
output:
user_id treatment dosage treatment_start treatment_ends
<int> <chr> <int> <date> <date>
1 1 1 3 2019-01-28 2019-05-25
2 1 1 5 2019-05-26 2019-07-30
3 1 1 2 2019-07-31 2019-08-12
4 1 1, 2 3 2019-08-13 2020-04-07
5 1 2 1 2019-11-26 2020-06-10
6 1 2, 1 3 2019-12-06 2019-12-08
7 1 2, 1 4 2019-12-09 2020-02-12