I have a dataset with patient data. I want to calculate how many days have passed between a patients first visit and their last to the hospital. I also need a dummy variable (0,1) that takes the value 1 if the observation was the patients first in the dataset (according to the visit date). The dataset looks like:
Patient ID Visit date
1 2014-04-21
1 2015-01-29
1 2021-04-14
2 2020-01-03
2 2021-07-04
. .
. .
What I want:
Patient ID Visit date First visit Difference between first visit and last (in days)
1 2014-04-21 1 0
1 2015-01-29 0 283
1 2021-04-14 0 2550
2 2020-01-03 1 0
2 2021-07-04 0 548
. . . .
. . . .
If I sort the dataset by Patient ID and Visit date and the run the code: if first.PatientID then do; First_visit = 1; end;
Im able to create my dummy variable. I have trouble with calculating the difference in days between first and last visit. Would greatly appreciate any help. Thanks!
CodePudding user response:
You will need to RETAIN a new variable to get the cumulative days. If you use a SUM statement then variable is automatically retained. If you use the DIF() function to get the difference in days between the current and previous date then you can just retain the DAYS variable itself rather than needing to retain the actual first date value.
data have;
input ID DATE :yymmdd.;
format date yymmdd10.;
cards;
1 2014-04-21
1 2015-01-29
1 2021-04-14
2 2020-01-03
2 2021-07-04
;
data want;
set have;
by id date;
first_visit=first.id;
days dif(date);
if first.id then days=0;
run;
A couple of notes.
The FIRST. variable is already coded 1/0 so just use an assignment statement to save its value into a permanent variable.
Do not run the DIF() function conditionally. That will skip adding the current date into the stack to be retrieved the next time. That is why you should add the DIF() to DAYS before forcing DAYS to zero for the first observations for an ID.
Results:
first_
Obs ID DATE visit days
1 1 2014-04-21 1 0
2 1 2015-01-29 0 283
3 1 2021-04-14 0 2550
4 2 2020-01-03 1 0
5 2 2021-07-04 0 548
CodePudding user response:
data WANT;
set HAVE;
by ID;
first_visit = first.ID;
retain firstDate;
if first.ID then firstDate = Date;
days = Date - firstDate;
drop firstDate;
run;