data test;
Input ID $1. Diagnosed_date :ddmmyy10. medication202001 medication202002
medication202003 medication202004;
Format diagnosed_date ddmmyy10.;
Datalines;
A 28/02/2020 0 1 1 1
B 28/02/2020 1 1 1 1
C 30/04/2020 0 0 0 1
:
Run;
Hi, I’m relatively new to SAS. The above is an example of a clinical trial. I am trying to create an additional variable that shows the medication status 1 month prior to their diagnosed_date. So for example, for ID = A, I want to pick the medication status as medication202001, whereas for ID = C, the medication status as medication202003.
Note the variable’s naming convention has the YYYYMM as the suffix.
A friend of mine has suggested this can be done using SAS array, but I have no idea how to get this created. I’ve tried but didn’t get far at all.
Would really appreciate any feedback here.
CodePudding user response:
We can loop through an array and read the month part of each variable name using the vname()
function. We'll remove all alphabetic characters from the variable's name, leaving only the month part. We can convert this to a SAS date.
Since we know when the previous month is, we simply need to compare the two together and pull that variable's value for each row. When SAS reads 202001
, it will automatically align it to the first of the month (e.g. 01JAN2020). We'll do this to our previous diagnosed_date
month.
diagnosed_date last_month month
28FEB2020 JAN2020 JAN2020 <---- last_month = month. Stop and save the value.
30APR2020 MAR2020 JAN2020
30APR2020 MAR2020 FEB2020
30APR2020 MAR2020 MAR2020 <---- last_month = month. Stop and save the value.
Code:
data want;
set test;
array medication_status[*] medication:;
/* Get the previous month of diagnosed_date and align it to
the first day of the month
*/
last_month = intnx('month', diagnosed_date, -1, 'B');
do i = 1 to dim(medication_status);
/* Get month from var name by removing alphabetic characters
and converting to a SAS date
*/
month = input(compress(vname(medication_status[i]),,'A'), yymmn6.);
/* Compare the var's month to the previous month */
if(month = last_month) then do;
medication_status_last_month = medication_status[i];
/* Don't keep checking now that we've found the right date */
leave;
end;
end;
drop i month last_month;
run;
CodePudding user response:
For this, the easiest way to do it if the months are consecutive is to just directly access the column based on the value of the diagnosed_date
. This only works, though, if everything is aligned right or named in a way where you can use the variable names in a list. If they're not in order, this will fail since your variable names cannot be easily listed in the single dash format.
data want;
set test;
array medications medication:;
diag_month = intck('Month','01JAN2020'd,diagnosed_date); *actually ONE LESS, but that is helpful for us so we leave it;
if 0 lt diag_month le dim(medications) then
medication_status = medications[diag_month];
run;
The first value in intck
is the first day of the month [or any day, but we'll go with first] in the first medication column. The diag_month is one offset (by design!) from its true value, since you want that one offset - if it bothers you that it's technically one off, name it differently or make it not one off (add one) then subtract later.
CodePudding user response:
Here's a different answer: the way I'd do it in real life.
The issue here is your data structure. You usually don't want to store data in arrays like this where the name of the variable means something. Not that it's a horrible thing to do or anything, but it just makes it a bit more work. Most likely the data doesn't truly originate that way - even if it comes to you that way, someone probably transformed it somewhere - but even if it did, it's very easy to transpose it so that it's in the right structure to easily do work with.
Here, we make a "medication date" level table. Then, it's trivial to join that table (either using SQL as I do, or using a merge
, or hash table, or format, or...) to get what you want - and it's very easy to adjust this.
First, we transform it to a vertical medication date level table:
data test_vert;
set test;
array medications medication:;
do _i = 1 to dim(medications);
medicationDate = input(substr(vname(medications[_i]),11,6)||'01',YYMMDD8.);
medicationValue= medications[_i];
output;
end;
format medicationDate ddmmyy10.;
keep ID Diagnosed_Date medicationDate medicationValue;
run;
Then we do a simple join to get the value we want.
proc sql;
create table want as
select test.id, test.diagnosed_date, test_vert.medicationDate, test_vert.medicationValue
from test
left join test_vert
on test.id=test_vert.id
and intnx('Month',test.diagnosed_date,-1,'b') = test_vert.medicationDate
;
quit;