Home > Software design >  Query count rows sequentially until a leading row value changes
Query count rows sequentially until a leading row value changes

Time:10-07

Using sqlite3 I'm trying to get the sequential number of groups of records until a new sequence begins, which should start at 1. The issue is the value I want to group by repeats, but I want the count to start over after a break in that value.

Data:

DROP TABLE IF EXISTS so_demo;
CREATE TABLE so_demo(
    game_pk INT, game_ds DATE, away_team VARCHAR, home_team VARCHAR
);
INSERT INTO so_demo VALUES
(529410,'2018-03-29','ARI','COL'),
(529462,'2018-04-02','SD','COL'),
(529508,'2018-04-06','COL','ATL'),
(529556,'2018-04-09','COL','SD'),
(529590,'2018-04-12','WSH','COL'),
(530343,'2018-06-08','ARI','COL')
ALTER TABLE so_demo ADD home_away VARCHAR;
UPDATE so_demo SET home_away = 'home' WHERE home_team='COL';
UPDATE so_demo SET home_away = 'away' WHERE home_team <> 'COL';

This leaves me with :

game_pk game_ds away_team home_team home_away
529410 2018-03-29 ARI COL home
529462 2018-04-02 SD COL home
529508 2018-04-06 COL ATL away
529556 2018-04-09 COL SD away
529590 2018-04-12 WSH COL home
530343 2018-06-08 ARI COL home

I want a column for how long each "stand" (series of sequential home or away games) has been going. In the example above, I'm looking for: 1,2, 1,2, 1,2,.

I've tried:

row_number()

WITH intermed AS (
    SELECT *, (row_number() OVER (partition by home_away)) AS seq
    FROM so_demo
    ORDER BY game_pk
)
SELECT * FROM intermed
>>
529410,2018-03-29,ARI,COL,home,1
529462,2018-04-02,SD,COL,home,2
529508,2018-04-06,COL,ATL,away,1
529556,2018-04-09,COL,SD,away,2
529590,2018-04-12,WSH,COL,home,3
530343,2018-06-08,ARI,COL,home,4

Because I'm partitioning over home_away the count does not restart anew after breaks and I get 1,2, 1,2, 3,4. It's just a sequential count of those values.

lead/lag

I can use to demarcate where the home-stands or road-stands break:

SELECT *,
      away_team != lag(away_team, 1) OVER (order by game_pk ASC)
          AND away_team = 'COL'
          AS start_away,
             home_team != lag(home_team, 1) OVER (ORDER BY game_pk ASC)
          AND home_team = 'COL'
          AS start_home
FROM so_demo
>>
game_pk game_ds away_team   home_team   home_away   start_away  start_home
529410  2018-03-29  ARI COL home    0   
529462  2018-04-02  SD  COL home    0   0
529508  2018-04-06  COL ATL away    1   0
529556  2018-04-09  COL SD  away    0   0
529590  2018-04-12  WSH COL home    0   1
530343  2018-06-08  ARI COL home    0   0

But I can't figure out how to combine them.

CodePudding user response:

We mark every change, use count() over() to make groups out of each consecutive run and then use row_number() to number them by date.

select  game_pk
       ,game_ds
       ,away_team 
       ,home_team 
       ,home_away
       ,row_number() over(partition by grp order by game_ds) as h_a_cnt
from    (
        select  *
                ,count(chng) over(order by game_ds) as grp
        from    (
                select  *
                        ,case when home_away != lag(home_away) over(order by game_ds) then 1 end as chng
                from    t
                ) t
        ) t
game_pk game_ds away_team home_team home_away h_a_cnt
529410 2018-03-29 ARI COL home 1
529462 2018-04-02 SD COL home 2
529508 2018-04-06 COL ATL away 1
529556 2018-04-09 COL SD away 2
529590 2018-04-12 WSH COL home 1
530343 2018-06-08 ARI COL home 2

Fiddle

  • Related