I couldn't briefly explain the problem so I'll try to explain it this way. Let's say I have a table similar to the one below.
How do I get the total number of days in October per student that that student has at least 1 book checked out?
Please note that a single student can check out more than 1 book at a time which cause the overlapping dates.
Student | Book | Date_Borrowed | Date_Returned |
---|---|---|---|
David | A Thousand Splendid Suns | 01 Oct 2021 | 05 Oct 2021 |
David | Jane Eyre | 09 Oct 2021 | 13 Oct 2021 |
David | Please Look After Mom | 21 Oct 2021 | 29 Oct 2021 |
Fiona | Sense and Sensibility | 05 Oct 2021 | 14 Oct 2021 |
Fiona | The Girl Who Saved the King of Sweden | 05 Oct 2021 | 14 Oct 2021 |
Fiona | A Fort of Nine Towers | 02 Oct 2021 | 17 Oct 2021 |
Fiona | One Hundred Years of Solitude | 20 Oct 2021 | 30 Oct 2021 |
Fiona | The Unbearable Lightness of Being | 20 Oct 2021 | 30 Oct 2021 |
Greg | Fahrenheit 451 | 06 Oct 2021 | 11 Oct 2021 |
Greg | One Hundred Years of Solitude | 10 Oct 2021 | 17 Oct 2021 |
Greg | Please Look After Mom | 15 Oct 2021 | 21 Oct 2021 |
Greg | 4 3 2 1 | 20 Oct 2021 | 27 Oct 2021 |
Greg | The Girl Who Saved the King of Sweden | 27 Oct 2021 | 03 Nov 2021 |
Marcus | Fahrenheit 451 | 01 Oct 2021 | 04 Oct 2021 |
Marcus | Nectar in a Sieve | 15 Oct 2021 | 15 Oct 2021 |
Marcus | Please Look After Mom | 30 Oct 2021 | 31 Oct 2021 |
Priya | Like Water for Chocolate | 02 Oct 2021 | 21 Oct 2021 |
Priya | Fahrenheit 451 | 21 Oct 2021 | 22 Oct 2021 |
Sasha | Baudolino | 03 Oct 2021 | 29 Oct 2021 |
Sasha | A Thousand Splendid Suns | 07 Oct 2021 | 16 Oct 2021 |
Sasha | A Fort of Nine Towers | 26 Oct 2021 | 01 Nov 2021 |
Thanks in advance!
CodePudding user response:
Using the data step, you can expand each date into a long format. From there, you can use SQL to do a simple count by student after removing overlapping dates.
data foo;
set have;
do date = date_borrowed to date_returned;
output;
end;
keep student date;
format date date9.;
run;
This gets us a long table of all the dates with at least one book checked out for each student.
student date
David 01OCT2021
David 02OCT2021
David 03OCT2021
David 04OCT2021
David 05OCT2021
David 09OCT2021
...
Now we need to remove the overlapping dates.
proc sort data=foo nodupkey;
by student date;
run;
From here, we can do a simple SQL count per student.
proc sql noprint;
create table want as
select student
, intnx('month', date, 0, 'B') as month format=monyy7.
, count(*) as days_checked_out
from foo
where calculated month = '01OCT2021'd
group by student, calculated month
;
quit;
Output:
student month days_checked_out
David OCT2021 19
Fiona OCT2021 27
Greg OCT2021 26
Marcus OCT2021 7
Priya OCT2021 21
Sasha OCT2021 29
CodePudding user response:
An easy way is to make a temporary array with one variable for each day in the time period you want to count. Then just use a do loop to set the variables representing those days to 1. When you have reached the last record for a student then take the sum to find the number of days covered.
First let's convert your posted table into a dataset.
data have;
infile cards dsd dlm='|' truncover;
input Student :$20. Book :$100. (Date_Borrowed Date_Returned) (:date.);
format Date_Borrowed Date_Returned date11.;
cards;
David|A Thousand Splendid Suns|01 Oct 2021|05 Oct 2021
David|Jane Eyre|09 Oct 2021|13 Oct 2021
David|Please Look After Mom|21 Oct 2021|29 Oct 2021
Fiona|Sense and Sensibility|05 Oct 2021|14 Oct 2021
Fiona|The Girl Who Saved the King of Sweden|05 Oct 2021|14 Oct 2021
Fiona|A Fort of Nine Towers|02 Oct 2021|17 Oct 2021
Fiona|One Hundred Years of Solitude|20 Oct 2021|30 Oct 2021
Fiona|The Unbearable Lightness of Being|20 Oct 2021|30 Oct 2021
Greg|Fahrenheit 451|06 Oct 2021|11 Oct 2021
Greg|One Hundred Years of Solitude|10 Oct 2021|17 Oct 2021
Greg|Please Look After Mom|15 Oct 2021|21 Oct 2021
Greg|4 3 2 1|20 Oct 2021|27 Oct 2021
Greg|The Girl Who Saved the King of Sweden|27 Oct 2021|03 Nov 2021
Marcus|Fahrenheit 451|01 Oct 2021|04 Oct 2021
Marcus|Nectar in a Sieve|15 Oct 2021|15 Oct 2021
Marcus|Please Look After Mom|30 Oct 2021|31 Oct 2021
Priya|Like Water for Chocolate|02 Oct 2021|21 Oct 2021
Priya|Fahrenheit 451|21 Oct 2021|22 Oct 2021
Sasha|Baudolino|03 Oct 2021|29 Oct 2021
Sasha|A Thousand Splendid Suns|07 Oct 2021|16 Oct 2021
Sasha|A Fort of Nine Towers|26 Oct 2021|01 Nov 2021
;
Now we can use BY group processing in a data step to aggregate per student. We can set the upper and lower index for the array to be the values SAS uses to represent those days. Temporary arrays are automatically retained across observations, we just need to clear it out when we start a new student.
The SAS compiler does not expect to see a date literal as the index boundaries for an array so we can use %SYSEVALF() to convert the date literal to the integer it represents.
data want;
set have;
by student ;
array october [%sysevalf('01oct2021'd):%sysevalf('31oct2021'd)] _temporary_ ;
if first.student then call missing(of october[*]);
do date=max(date_borrowed,'01oct2021'd) to min(date_returned,'31oct2021'd);
october[date]=1;
end;
if last.student;
days = sum(0, of october[*]);
keep student days;
run;
Results:
Obs Student days
1 David 19
2 Fiona 27
3 Greg 26
4 Marcus 7
5 Priya 21
6 Sasha 29
You could also modify it slightly to not only count the number of "covered" (or unique) days, but also the total number of "book" days.
data want;
set have;
by student ;
array october [%sysevalf('01oct2021'd):%sysevalf('31oct2021'd)] _temporary_ ;
if first.student then call missing(of october[*]);
do date=max(date_borrowed,'01oct2021'd) to min(date_returned,'31oct2021'd);
october[date]=sum(october[date],1);
end;
if last.student;
unique_days = n(of october[*]);
book_days = sum(0,of october[*]);
keep student unique_days book_days;
run;
Results:
unique_ book_
Obs Student days days
1 David 19 19
2 Fiona 27 58
3 Greg 26 34
4 Marcus 7 7
5 Priya 21 22
6 Sasha 29 43