I am fairly new to R. I have a hypothetical dataset containing prescriptions from various different patients and drug types. What I would like to do is to create episodes of drug use, i.e., I would like to see for how long a patient used the drug. The loop mentioned in post sequentially update rows in data.table works for me, but I am not sure how I can make sure that the loop starts over when encountering a new patient identifier or drug type.
These are some rows from the dataset "AllDrugs":
DrugType ID Duration StartPrescr EndPrescr n
1 1 90 5-3-2020 3-6-2020 1
1 2 30 7-1-2020 6-2-2020 1
1 2 30 14-1-2020 12-6-2020 2
1 2 30 21-01-2020 19-6-2020 3
Note: n is a number indicating the prescription by ID and DrugType
This is the current loop:
for (i in 2:nrow(AllDrugs)) {
if (AllDrugs[i,StartPrescr] >= AllDrugs[i-1,EndPrescr]) {
AllDrugs[i, EndPrescr:= StartPrescr Duration]
} else {
AllDrugs[i, EndPrescr:= AllDrugs[i-1,EndPrescr] Duration]
}
}
This is what I get:
DrugType ID Duration StartPrescr EndPrescr n
1 1 90 5-3-2020 3-6-2020 1
1 2 30 7-1-2020 3-7-2020 1
1 2 30 14-1-2020 2-8-2020 2
1 2 30 21-01-2020 1-9-2020 3
This is what I want:
DrugType ID Duration StartPrescr EndPrescr n
1 1 90 5-3-2020 3-6-2020 1
1 2 30 7-1-2020 6-2-2020 1
1 2 30 14-1-2020 7-3-2020 2
1 2 30 21-01-2020 6-4-2020 3
How can I shift the prescriptions based on the duration of the prescription by ID and DrugType? Note: this is an example of one drug type, but DrugType could also be 2, or 3 etc.
CodePudding user response:
Does this work for you?
shift_end <- function(en,dur) {
if(length(en)>1) for(i in 2:length(en)) en[i] = en[i-1] dur[i]
return(en)
}
df[order(ID, DrugType,StartPrescr), EndPrescr:=shift_end(EndPrescr,Duration), by=.(ID,DrugType)]
Result:
DrugType ID Duration StartPrescr EndPrescr n
1: 1 1 90 2020-03-05 2020-06-03 1
2: 1 2 30 2020-01-07 2020-02-06 1
3: 1 2 30 2020-01-14 2020-03-07 2
4: 1 2 30 2020-01-21 2020-04-06 3
Data Source:
df <- structure(list(
DrugType = c(1, 1, 1, 1),
ID = c(1, 2, 2, 2),
Duration = c(90, 30, 30, 30),
StartPrescr = structure(c(18326,18268, 18275, 18282), class = "Date"),
EndPrescr = structure(c(18416, 18298, 18425, 18432), class = "Date"),
n = c(1, 1, 2, 3)), row.names = c(NA,-4L),
class = c("data.table", "data.frame")
)