Home > OS >  SQL Aggregate Over Range of Values
SQL Aggregate Over Range of Values

Time:09-30

I have T1 which is formatted as

STUDENT SCORE   DATE
1   6   2022-02-01
1   0   2022-03-12
1   5   2022-04-30
1   1   2022-04-30
1   1   2022-05-14
1   1   2022-05-19
1   8   2022-05-26
2   9   2022-01-02
2   10  2022-04-11
2   2   2022-04-12
2   0   2022-04-17
2   7   2022-05-08
2   4   2022-05-12
3   10  2022-01-09
3   2   2022-02-11
3   6   2022-03-16
3   3   2022-03-18
3   2   2022-04-02
3   9   2022-04-27
4   4   2022-02-24
4   0   2022-02-26
4   9   2022-02-28
4   2   2022-03-27
4   8   2022-04-02
4   4   2022-04-14
5   3   2022-01-28
5   5   2022-02-12
5   6   2022-02-18
5   0   2022-02-21
5   4   2022-04-05
XX  0.711094564 2022-02-28
XX  0.60584994  2022-03-31
XX  0.087965016 2022-04-30
YY  0.497937992 2022-02-28
YY  0.727796963 2022-03-31
YY  0.974471085 2022-04-30
YY  0.780187398 2022-05-30

First I wish to extract the UNIQUE values of DATE when STUDENT equals to XX and YY. There are some duplicates. These are the trimester ending dates.

Then I wish to SUM up the SCORE values for each STUDENT across the date ranges from Step 1:

enter image description here

And together I wish to generate this table T2

STUDENT SCORE   DATE.START  DATE.END
1   6   2022-01-01  2022-02-28
1   0   2022-03-01  2022-03-31
1   6   2022-04-01  2022-04-30
1   10  2022-05-01  2022-05-30
2   9   2022-01-01  2022-02-28
2   NA  2022-03-01  2022-03-31
2   12  2022-04-01  2022-04-30
2   11  2022-05-01  2022-05-30
3   12  2022-01-01  2022-02-28
3   9   2022-03-01  2022-03-31
3   11  2022-04-01  2022-04-30
3   NA  2022-05-01  2022-05-30
4   13  2022-01-01  2022-02-28
4   2   2022-03-01  2022-03-31
4   12  2022-04-01  2022-04-30
4   NA  2022-05-01  2022-05-30
5   14  2022-01-01  2022-02-28
5   NA  2022-03-01  2022-03-31
5   4   2022-04-01  2022-04-30
5   NA  2022-05-01  2022-05-30

And I try this,

WITH T2 AS(
SELECT DISTINCT(DATE),
COALESCE(LEAD(DATE) OVER(ORDER BY DATE) AS DATE.START,
ROW_NUMBER() OVER (ORDER BY DATE) AS DATE.END
FROM T1 WHERE (STUDENT = 'XX' OR STUDENT = 'YY')
)
SELECT STUDENT, SUM(SCORE), DATE.START, DATE.END
FROM T1 JOIN T2 ON T1.DATE.START BETWEEN DATE.START AND DATE.END

CodePudding user response:

Your date ranges are just month boundaries, so you can use ordinary GROUP BY

SELECT student, DATE_FORMAT(date, '%Y-%m') AS year_month, SUM(score) AS score
FROM T1
WHERE student IN ('XX', 'YY')
GROUP BY student, year_month

CodePudding user response:

Here we create a calendar divided to trimesters and join it with our table matching each date to the correct trimester in the calendar.

DECLARE @Year INT = '2019';
DECLARE @YearCnt INT = 4 ;
DECLARE @StartDate DATE = DATEFROMPARTS(@Year, 01, 01)
DECLARE @EndDate DATE = DATEADD(DAY, -1, DATEADD(YEAR, @YearCnt, @StartDate));

;WITH Cal(n) AS
(
SELECT 0 UNION ALL SELECT n   4 FROM Cal
WHERE n < DATEDIFF(month, @StartDate, @EndDate)
),
FnlDt(d) AS
(
SELECT DATEADD(month, n, @StartDate) FROM Cal
),
FinalCte AS
(
SELECT
[Date] = CONVERT(DATE,d)

FROM FnlDt
)
select     STUDENT
          ,sum(score) as total_score
          ,semester_start
          ,semester_end
          
from       (
           select   date                                             as semester_start
                   ,lead(dateadd(day, -1, date)) over(order by date) as semester_end
           from     finalCte 
           ) t2 join t on t.date between semester_start and semester_end   
group by   STUDENT, semester_start, semester_end

           
STUDENT total_score semester_start semester_end
1 12 2022-01-01 2022-04-30
1 10 2022-05-01 2022-08-31
2 21 2022-01-01 2022-04-30
2 11 2022-05-01 2022-08-31
3 32 2022-01-01 2022-04-30
4 27 2022-01-01 2022-04-30
5 18 2022-01-01 2022-04-30

Fiddle

CodePudding user response:

In this one we're looking at score by month so we don't really need the bottom half of the information. In fact, it makes it better not to have it cause then we won't be able to save student and score as int.

select    student
         ,sum(score)                              as total_score
         ,max(dateadd(day, 1, eomonth(date, -1))) as "DATE.START"
         ,max(eomonth(date))                      as "DATE.END"
from      t
group  by student, year(date), month(date)
student total_score DATE.START DATE.END
1 6 2022-02-01 2022-02-28
1 0 2022-03-01 2022-03-31
1 6 2022-04-01 2022-04-30
1 10 2022-05-01 2022-05-31
2 9 2022-01-01 2022-01-31
2 12 2022-04-01 2022-04-30
2 11 2022-05-01 2022-05-31
3 10 2022-01-01 2022-01-31
3 2 2022-02-01 2022-02-28
3 9 2022-03-01 2022-03-31
3 11 2022-04-01 2022-04-30
4 13 2022-02-01 2022-02-28
4 2 2022-03-01 2022-03-31
4 12 2022-04-01 2022-04-30
5 3 2022-01-01 2022-01-31
5 11 2022-02-01 2022-02-28
5 4 2022-04-01 2022-04-30

Fiddle

CodePudding user response:

Unfortunately I do not have at this moment access to a MySQL, but only to a MSSQL server. So, I´ll present a solution for that platform. As I mentioned in the comments, the question is ill-posed, because it is not clear what is the definition of a "trimester", nor are all the datatypes very well defined (i.e., NA should be null or NA?). Nonetheless, I solved the query to present a solution identical to what is presented in the example. And it can be easily modified to accommodate any variations on the theme. Here it goes... (1st part and 2nd part):

Select Distinct Date From T1 Where Student = 'XX' or Student = 'YY' Order By date

Select t2.Student as Student,
   Coalesce(Cast(Sum(Case When (T1.date is null) Then null
                          When (T1.date >= t2.Di and t1.date <= t2.Df) Then t1.score 
                          Else 0
                     End) as varchar), 'NA') as Score, 
   t2.Di as DateStart, 
   t2.Df as DateEnd
   From (Select Student, Di, Df 
                From (Select Distinct Student From t1 Where Student <> 'XX' and Student <> 'YY') as ta,
                     (Select Distinct Di=Cast(Case When DatePart(Month, date) = 2 Then DateAdd(MONTH,DateDiff(Month,0,date),-31)
                                                   Else DateAdd(Month,DateDiff(Month,0,date),0) 
                                              End as Date), 
                                      Df=Date 
                                      from T1 where student = 'XX' or student = 'YY') as tb
         ) as t2
   Left Join T1
         On t2.student = T1.student and T1.date >= t2.di and T1.date <= t2.df 
   Group By t2.student, t2.di, t2.df
   Order By t2.student Asc, t2.di Asc

Well, anyone who runs the above code over a T1 table as shown in the question will receive the exact solution as presented in the question. Enjoy it.

  • Related