Home > Enterprise >  Get the number of unique days with overlapping dates (in SAS)
Get the number of unique days with overlapping dates (in SAS)

Time:11-03

I couldn't briefly explain the problem so I'll try to explain it this way. Let's say I have a table similar to the one below.

How do I get the total number of days in October per student that that student has at least 1 book checked out?

Please note that a single student can check out more than 1 book at a time which cause the overlapping dates.

Student Book Date_Borrowed Date_Returned
David A Thousand Splendid Suns 01 Oct 2021 05 Oct 2021
David Jane Eyre 09 Oct 2021 13 Oct 2021
David Please Look After Mom 21 Oct 2021 29 Oct 2021
Fiona Sense and Sensibility 05 Oct 2021 14 Oct 2021
Fiona The Girl Who Saved the King of Sweden 05 Oct 2021 14 Oct 2021
Fiona A Fort of Nine Towers 02 Oct 2021 17 Oct 2021
Fiona One Hundred Years of Solitude 20 Oct 2021 30 Oct 2021
Fiona The Unbearable Lightness of Being 20 Oct 2021 30 Oct 2021
Greg Fahrenheit 451 06 Oct 2021 11 Oct 2021
Greg One Hundred Years of Solitude 10 Oct 2021 17 Oct 2021
Greg Please Look After Mom 15 Oct 2021 21 Oct 2021
Greg 4 3 2 1 20 Oct 2021 27 Oct 2021
Greg The Girl Who Saved the King of Sweden 27 Oct 2021 03 Nov 2021
Marcus Fahrenheit 451 01 Oct 2021 04 Oct 2021
Marcus Nectar in a Sieve 15 Oct 2021 15 Oct 2021
Marcus Please Look After Mom 30 Oct 2021 31 Oct 2021
Priya Like Water for Chocolate 02 Oct 2021 21 Oct 2021
Priya Fahrenheit 451 21 Oct 2021 22 Oct 2021
Sasha Baudolino 03 Oct 2021 29 Oct 2021
Sasha A Thousand Splendid Suns 07 Oct 2021 16 Oct 2021
Sasha A Fort of Nine Towers 26 Oct 2021 01 Nov 2021

Thanks in advance!

CodePudding user response:

Using the data step, you can expand each date into a long format. From there, you can use SQL to do a simple count by student after removing overlapping dates.

data foo;
    set have;

    do date = date_borrowed to date_returned;
        output;
    end;

    keep student date;

    format date date9.;
run;

This gets us a long table of all the dates with at least one book checked out for each student.

student date
David   01OCT2021
David   02OCT2021
David   03OCT2021
David   04OCT2021
David   05OCT2021
David   09OCT2021
...

Now we need to remove the overlapping dates.

proc sort data=foo nodupkey;
    by student date;
run;

From here, we can do a simple SQL count per student.

proc sql noprint;
    create table want as
        select student
             , intnx('month', date, 0, 'B') as month format=monyy7.
             , count(*) as days_checked_out
        from foo
        where calculated month = '01OCT2021'd
        group by student, calculated month
    ;
quit;

Output:

student month     days_checked_out
David   OCT2021   19
Fiona   OCT2021   27
Greg    OCT2021   26
Marcus  OCT2021   7
Priya   OCT2021   21
Sasha   OCT2021   29

CodePudding user response:

An easy way is to make a temporary array with one variable for each day in the time period you want to count. Then just use a do loop to set the variables representing those days to 1. When you have reached the last record for a student then take the sum to find the number of days covered.

First let's convert your posted table into a dataset.

data have;
  infile cards dsd dlm='|' truncover;
  input Student :$20. Book :$100. (Date_Borrowed Date_Returned) (:date.);
  format Date_Borrowed Date_Returned date11.;
cards;
David|A Thousand Splendid Suns|01 Oct 2021|05 Oct 2021
David|Jane Eyre|09 Oct 2021|13 Oct 2021
David|Please Look After Mom|21 Oct 2021|29 Oct 2021
Fiona|Sense and Sensibility|05 Oct 2021|14 Oct 2021
Fiona|The Girl Who Saved the King of Sweden|05 Oct 2021|14 Oct 2021
Fiona|A Fort of Nine Towers|02 Oct 2021|17 Oct 2021
Fiona|One Hundred Years of Solitude|20 Oct 2021|30 Oct 2021
Fiona|The Unbearable Lightness of Being|20 Oct 2021|30 Oct 2021
Greg|Fahrenheit 451|06 Oct 2021|11 Oct 2021
Greg|One Hundred Years of Solitude|10 Oct 2021|17 Oct 2021
Greg|Please Look After Mom|15 Oct 2021|21 Oct 2021
Greg|4 3 2 1|20 Oct 2021|27 Oct 2021
Greg|The Girl Who Saved the King of Sweden|27 Oct 2021|03 Nov 2021
Marcus|Fahrenheit 451|01 Oct 2021|04 Oct 2021
Marcus|Nectar in a Sieve|15 Oct 2021|15 Oct 2021
Marcus|Please Look After Mom|30 Oct 2021|31 Oct 2021
Priya|Like Water for Chocolate|02 Oct 2021|21 Oct 2021
Priya|Fahrenheit 451|21 Oct 2021|22 Oct 2021
Sasha|Baudolino|03 Oct 2021|29 Oct 2021
Sasha|A Thousand Splendid Suns|07 Oct 2021|16 Oct 2021
Sasha|A Fort of Nine Towers|26 Oct 2021|01 Nov 2021
;

Now we can use BY group processing in a data step to aggregate per student. We can set the upper and lower index for the array to be the values SAS uses to represent those days. Temporary arrays are automatically retained across observations, we just need to clear it out when we start a new student.

The SAS compiler does not expect to see a date literal as the index boundaries for an array so we can use %SYSEVALF() to convert the date literal to the integer it represents.

data want;
  set have;
  by student ;
  array october [%sysevalf('01oct2021'd):%sysevalf('31oct2021'd)] _temporary_ ;
  if first.student then call missing(of october[*]);
  do date=max(date_borrowed,'01oct2021'd) to min(date_returned,'31oct2021'd);
    october[date]=1;
  end;
  if last.student;
  days = sum(0, of october[*]);
  keep student days;
run;

Results:

Obs    Student    days

 1     David       19
 2     Fiona       27
 3     Greg        26
 4     Marcus       7
 5     Priya       21
 6     Sasha       29

You could also modify it slightly to not only count the number of "covered" (or unique) days, but also the total number of "book" days.

data want;
  set have;
  by student ;
  array october [%sysevalf('01oct2021'd):%sysevalf('31oct2021'd)] _temporary_ ;
  if first.student then call missing(of october[*]);
  do date=max(date_borrowed,'01oct2021'd) to min(date_returned,'31oct2021'd);
    october[date]=sum(october[date],1);
  end;
  if last.student;
  unique_days = n(of october[*]);
  book_days = sum(0,of october[*]);
  keep student unique_days book_days;
run;

Results:

                  unique_    book_
Obs    Student      days      days

 1     David         19        19
 2     Fiona         27        58
 3     Greg          26        34
 4     Marcus         7         7
 5     Priya         21        22
 6     Sasha         29        43
  • Related