I have a dataset that consists of patient ID’s, an ‘episode’ factor variable that is either “new” or “same”, and a duration variable. The data is grouped by ID and sorted by a date variable (date not in the example dataset), such that the ‘episode’ variable represents either a ‘new’ episode within that particular patient, or it represents the ‘same’ episode as the row above it. I would like to create two different kinds of other ID’s and also a rolling sum. ‘ID_2’ would be an id for the episode within the patient (ie within each ‘ID’). ‘ID_3’ would be a unique id for every episode within the entire dataset. The “sum” variable would be a rolling sum for the total 'duration' for each row within each unique episode.
This code creates the example dataset that I have:
ID <- c(1,2,3,3,3,3,3,3,3,4,4,5,6)
episode <- c("new","new","new","same","new","same","same","new","same","new","same","new","new")
duration <- c(10,13,11,2,45,66,2,4,45,33,23,12,16)
data1 <- cbind(ID,episode,duration)
data1 <- data.frame(data1)
data1$ID <- factor(data1$ID)
data1$episode <- factor(data1$episode)
This code creates what I would like the output to look like:
ID <- c(1,2,3,3,3,3,3,3,3,4,4,5,6)
episode <- c("new","new","new","same","new","same","same","new","same","new","same","new","new")
duration <- c(10,13,11,2,45,66,2,4,45,33,23,12,16)
ID_2 <- c(1,1,1,1,2,2,2,3,3,1,1,1,1)
ID_3 <- c(1,2,3,3,4,4,4,5,5,6,6,7,8)
sum_roll <- c(10,13,11,13,45,111,113,4,49,33,56,12,16)
data2 <- cbind(ID,episode,duration,ID_2,ID_3,sum_roll)
data2 <- data.frame(data2)
data2$ID <- factor(data2$ID)
data2$episode <- factor(data2$episode)
data2$ID_2 <- factor(data2$ID_2)
data2$ID_3 <- factor(data2$ID_3)
I have looked at this link that uses slider for the sum, but I could not figure out how to implement this with the indicator variable episode in combination with the ID. I also looked at using rleid() but this doesn’t seem like a solution since I am trying to create the new ID’s based on information in ID and episode.
I would really appreciate some help! This seems like a very hard problem for me.
CodePudding user response:
Using cumsum
is probably your best option here:
library(tidyverse)
data1 %>%
group_by(ID) %>%
mutate(`ID_2` = cumsum(episode == 'new')) %>%
ungroup() %>%
mutate(`ID_3` = cumsum(episode == 'new')) %>%
group_by(`ID_3`) %>%
mutate(sum_roll = cumsum(duration)) %>%
ungroup()