Home > Software engineering >  The way sum over function work in SQL Server
The way sum over function work in SQL Server

Time:06-26

Given this table

CREATE TABLE Table1
(
    [Classroom] int,
    [CourseName] varchar(8),
    [Lesson] varchar(9),
    [StartTime] char(4),
    [EndTime] char(4)
);
    
INSERT INTO Table1
    ([Classroom], [CourseName], [Lesson], [StartTime], [EndTime])
VALUES
    (1001, 'Course 1', 'Lesson 1', '0800', '0900'),
    (1001, 'Course 1', 'Lesson 2', '0900', '1000'),
    (1001, 'Course 1', 'Lesson 3', '1000', '1100'),
    (1001, 'Course 1', 'Lesson 6', '1100', '1200'),
    (1001, 'Course 2', 'Lesson 10', '1100', '1200'),
    (1001, 'Course 2', 'Lesson 11', '1200', '1300'),
    (1001, 'Course 1', 'Lesson 4', '1300', '1400'),
    (1001, 'Course 1', 'Lesson 5', '1400', '1500');

And my query

WITH A AS 
(
    SELECT 
        ClassRoom,
        CourseName,
        StartTime,
        EndTime,
        PrevCourse = LAG(CourseName, 1, CourseName) OVER (ORDER BY StartTime)
    FROM   
        Table1
), B AS 
(
    SELECT 
        ClassRoom,
        CourseName,
        StartTime, EndTime,
        Ranker = SUM(CASE WHEN CourseName = PrevCourse THEN 0 ELSE 1 END) 
                    OVER (ORDER BY StartTime, CourseName)
    FROM   
        A
)
SELECT B.* 
FROM B;

I get this result:

ClassRoom CourseName StartTime EndTime Ranker
---------------------------------------------
1001      Course 1   0800   0900    0
1001      Course 1   0900   1000    0
1001      Course 1   1000   1100    0
1001      Course 1   1100   1200    0
1001      Course 2   1100   1200    1
1001      Course 2   1200   1300    1
1001      Course 1   1300   1400    2
1001      Course 1   1400   1500    2

Please focus on column ranker. If I do not misunderstand, at the every first row where current course is different to previous course, then sum(1); the next rows, where current course == previous course, then sum(0), so my expectation of the ranker should be: (0,0,0,0), (1,1), (1,1) but it give me (0,0,0,0), (1,1), (2,2).

Why at the end I get (2, 2) ? Am I missing something?

CodePudding user response:

The expression:

CASE WHEN CourseName = PrevCourse THEN 0 ELSE 1 END

returns 1 when CourseName is not the same as PrevCourse.

You could see it better if you added one more column inside B:

B AS (
  SELECT ClassRoom
       , CourseName
       , StartTime
       , EndTime
       , CASE WHEN CourseName = PrevCourse THEN 0 ELSE 1 END flag -- check this
       , Ranker = SUM(CASE WHEN CourseName = PrevCourse THEN 0 ELSE 1 END)
                OVER (ORDER BY StartTime, CourseName)
  FROM   A
)

If you want to get the results as (0,0,0,0), (1,1), (1,1) you should add a PARTITION BY clause inside OVER for the column Ranker:

Ranker = SUM(CASE WHEN CourseName = PrevCourse THEN 0 ELSE 1 END)
         OVER (PARTITION BY CourseName ORDER BY StartTime)

See the demo.

CodePudding user response:

It seems that, because you have an underlying logic of comparing different coursenames, you think the sum will do that too. It will not. For the Starttime=1300 row, your case expression is evaluated on all 7 rows from 0900 to 1300. Both 1100 and 1300 have the case=1, so the sum returns 2.

  • Related