SQL row_number with a condition-CodePudding

I want to configure row_number with a case condition. To look on "time_diffs" column and check - if there 1's go one by one, than it's a one group. If there 0's, than each 0 is the one group by itself. And after each itteration between 1's and 0's the row result will grow on 1.

select session_id, 
    player_id, 
        country, 
        start_time, 
        end_time,       
        case when timestampdiff(minute, 
                                lag(end_time, 1) over(partition by player_id order by end_time)
                               , start_time) < 5 then 1
             when timestampdiff(minute, end_time
                   , lead(start_time, 1) over(partition by player_id order by start_time)) < 5 then 1
        else 0
        end as time_diffs
        /* , here is some new code with an expected result */
        
 from game_sessions
 where 1=1
 and player_id = 1
 order by player_id, start_time

The result of the current query:

session_id	player_id	country	start_time	end_time	time_diffs
1	1	UK	01.01.2021 00:01	01.01.2021 00:10	1
2	1	UK	01.01.2021 00:12	01.01.2021 01:24	1
13	1	UK	01.01.2021 01:27	01.01.2021 01:50	1
3	1	UK	01.01.2021 10:01	01.01.2021 15:10	0
16	1	UK	01.01.2021 17:10	01.01.2021 17:20	1
17	1	UK	01.01.2021 17:22	01.01.2021 17:55	1
54	1	UK	01.01.2021 18:15	01.01.2021 18:35	0
32	1	UK	01.01.2021 18:55	01.01.2021 19:35	0

What I expect to see with a new column added to the current query:

session_id	player_id	country	start_time	end_time	time_diffs	expected_result
1	1	UK	01.01.2021 00:01	01.01.2021 00:10	1	1
2	1	UK	01.01.2021 00:12	01.01.2021 01:24	1	1
13	1	UK	01.01.2021 01:27	01.01.2021 01:50	1	1
3	1	UK	01.01.2021 10:01	01.01.2021 15:10	0	2
16	1	UK	01.01.2021 17:10	01.01.2021 17:20	1	3
17	1	UK	01.01.2021 17:22	01.01.2021 17:55	1	3
54	1	UK	01.01.2021 18:15	01.01.2021 18:35	0	4
32	1	UK	01.01.2021 18:55	01.01.2021 19:35	0	5

CodePudding user response：

This is a type of [Gaps and islands problem], and will require a few Windowed functions (and subqueries) to get your desired result, the first step is to work out your gaps and your islands, which you can do with the use of two row_numbers, one having an additional partition by:

SELECT  *,
        ROW_NUMBER() OVER (PARTITION BY player_id ORDER BY start_time)
            - ROW_NUMBER() OVER (PARTITION BY player_id, time_diffs ORDER BY start_time) AS GroupingSet
FROM    game_sessions;

N.B. For this query and all other queries I have taken the step of simplifying your entire query to include the field time_diffs in the dataset to shorten the actual query

This gives:

session_id	player_id	country	start_time	end_time	time_diffs	GroupingSet
1	1	UK	2021-01-01 00:01:00	2021-01-01 00:10:00	1	0
2	1	UK	2021-01-01 00:12:00	2021-01-01 01:24:00	1	0
13	1	UK	2021-01-01 01:27:00	2021-01-01 01:50:00	1	0
3	1	UK	2021-01-01 10:01:00	2021-01-01 15:10:00	0	3
16	1	UK	2021-01-01 17:10:00	2021-01-01 17:20:00	1	1
17	1	UK	2021-01-01 17:22:00	2021-01-01 17:55:00	1	1
54	1	UK	2021-01-01 18:15:00	2021-01-01 18:35:00	0	5
32	1	UK	2021-01-01 18:55:00	2021-01-01 19:35:00	0	5

What you can see here is that the "GroupingSet" column changes each time your time_diff changes, this is the basis for identifying your islands (consecutive groups of the same value).

For your output you then need a couple of additional windowed functions, firstly you need to get the minimum start time per group, since you want to consider every row a unique group for time_diffs = 0, you need the following expression:

IF(time_diffs=1,MIN(start_time) OVER (PARTITION BY player_id, p.GroupingSet),start_time)

Adding this column then gives:

session_id	player_id	country	start_time	end_time	time_diffs	GroupingSet	GroupStart
1	1	UK	2021-01-01 00:01:00	2021-01-01 00:10:00	1	0	2021-01-01 00:01:00
2	1	UK	2021-01-01 00:12:00	2021-01-01 01:24:00	1	0	2021-01-01 00:01:00
13	1	UK	2021-01-01 01:27:00	2021-01-01 01:50:00	1	0	2021-01-01 00:01:00
3	1	UK	2021-01-01 10:01:00	2021-01-01 15:10:00	0	3	2021-01-01 10:01:00
16	1	UK	2021-01-01 17:10:00	2021-01-01 17:20:00	1	1	2021-01-01 17:10:00
17	1	UK	2021-01-01 17:22:00	2021-01-01 17:55:00	1	1	2021-01-01 17:10:00
54	1	UK	2021-01-01 18:15:00	2021-01-01 18:35:00	0	5	2021-01-01 18:15:00
32	1	UK	2021-01-01 18:55:00	2021-01-01 19:35:00	0	5	2021-01-01 18:55:00

Finally, you can use this MinStart column as the basis for DENSE_RANK(), giving a final query of

SELECT  p.session_id,
        p.player_id,
        p.country,
        p.start_time,
        p.end_time,
        p.time_diffs,
        DENSE_RANK() OVER(PARTITION BY player_id ORDER BY p.GroupStart) AS ExpectedOutput
FROM
        (
            SELECT  *, IF(time_diffs = 0,start_time,MIN(start_time) OVER (PARTITION BY player_id, p.GroupingSet)) AS GroupStart
            FROM
                    (
                        SELECT  *,
                                ROW_NUMBER() OVER (PARTITION BY player_id ORDER BY start_time)
                                    - ROW_NUMBER() OVER (PARTITION BY player_id, time_diffs ORDER BY start_time) AS GroupingSet
                        FROM    game_sessions
                    ) AS p
        ) AS p
ORDER BY
        player_id, start_time;

A potentially simpler alternative is to identify rows where you don't want to increment the count, and return 0 otherwise 1 i.e

IF(time_diffs=1 AND LAG(time_diffs,1,0) OVER(PARTITION BY player_id ORDER BY start_time)=1,0,1)

Then sum this column:

SELECT  p.session_id,
        p.player_id,
        p.country,
        p.start_time,
        p.end_time,
        p.time_diffs,
        SUM(TDChanges) OVER(PARTITION BY player_id ORDER BY p.time_start) AS ExpectedOutput
FROM
        (
            SELECT  *,
                    IIF(time_diffs=1 AND LAG(time_diffs,1,0) OVER(PARTITION BY player_id ORDER BY time_start)=1,0,1) AS TDChanges
            FROM    game_sessions
        ) AS p
ORDER BY
        player_id, start_time;

Both queries give your expected output - Examples on db<>fidle