Home > Software engineering >  Sequentially update rows by group using data.table
Sequentially update rows by group using data.table

Time:09-17

I am fairly new to R. I have a hypothetical dataset containing prescriptions from various different patients and drug types. What I would like to do is to create episodes of drug use, i.e., I would like to see for how long a patient used the drug. The loop mentioned in post sequentially update rows in data.table works for me, but I am not sure how I can make sure that the loop starts over when encountering a new patient identifier or drug type.

These are some rows from the dataset "AllDrugs":

DrugType    ID  Duration StartPrescr    EndPrescr   n
1           1   90       5-3-2020       3-6-2020    1
1           2   30       7-1-2020       6-2-2020    1
1           2   30       14-1-2020      12-6-2020   2
1           2   30       21-01-2020     19-6-2020   3

Note: n is a number indicating the prescription by ID and DrugType

This is the current loop:

 for (i in 2:nrow(AllDrugs)) {
   if (AllDrugs[i,StartPrescr] >= AllDrugs[i-1,EndPrescr]) {
     AllDrugs[i, EndPrescr:= StartPrescr  Duration]
   } else {
     AllDrugs[i, EndPrescr:= AllDrugs[i-1,EndPrescr]   Duration]
   }
 }

This is what I get:

DrugType    ID  Duration    StartPrescr EndPrescr   n
1           1   90          5-3-2020    3-6-2020    1
1           2   30          7-1-2020    3-7-2020    1
1           2   30          14-1-2020   2-8-2020    2
1           2   30          21-01-2020  1-9-2020    3

This is what I want:

DrugType    ID  Duration    StartPrescr EndPrescr   n
1           1   90          5-3-2020    3-6-2020    1
1           2   30          7-1-2020    6-2-2020    1
1           2   30          14-1-2020   7-3-2020    2
1           2   30          21-01-2020  6-4-2020    3

How can I shift the prescriptions based on the duration of the prescription by ID and DrugType? Note: this is an example of one drug type, but DrugType could also be 2, or 3 etc.

CodePudding user response:

Does this work for you?

shift_end <- function(en,dur) {
  if(length(en)>1) for(i in 2:length(en)) en[i] = en[i-1]   dur[i]
  return(en)
}

df[order(ID, DrugType,StartPrescr), EndPrescr:=shift_end(EndPrescr,Duration), by=.(ID,DrugType)]

Result:

   DrugType ID Duration StartPrescr  EndPrescr n
1:        1  1       90  2020-03-05 2020-06-03 1
2:        1  2       30  2020-01-07 2020-02-06 1
3:        1  2       30  2020-01-14 2020-03-07 2
4:        1  2       30  2020-01-21 2020-04-06 3

Data Source:

df <- structure(list(
  DrugType = c(1, 1, 1, 1),
  ID = c(1, 2, 2, 2),
  Duration = c(90, 30, 30, 30),
  StartPrescr = structure(c(18326,18268, 18275, 18282), class = "Date"),
  EndPrescr = structure(c(18416, 18298, 18425, 18432), class = "Date"),
  n = c(1, 1, 2, 3)), row.names = c(NA,-4L),
  class = c("data.table", "data.frame")
)
  • Related