Using sqlite3 I'm trying to get the sequential number of groups of records until a new sequence begins, which should start at 1. The issue is the value I want to group by repeats, but I want the count to start over after a break in that value.
Data:
DROP TABLE IF EXISTS so_demo;
CREATE TABLE so_demo(
game_pk INT, game_ds DATE, away_team VARCHAR, home_team VARCHAR
);
INSERT INTO so_demo VALUES
(529410,'2018-03-29','ARI','COL'),
(529462,'2018-04-02','SD','COL'),
(529508,'2018-04-06','COL','ATL'),
(529556,'2018-04-09','COL','SD'),
(529590,'2018-04-12','WSH','COL'),
(530343,'2018-06-08','ARI','COL')
ALTER TABLE so_demo ADD home_away VARCHAR;
UPDATE so_demo SET home_away = 'home' WHERE home_team='COL';
UPDATE so_demo SET home_away = 'away' WHERE home_team <> 'COL';
This leaves me with :
game_pk | game_ds | away_team | home_team | home_away |
---|---|---|---|---|
529410 | 2018-03-29 | ARI | COL | home |
529462 | 2018-04-02 | SD | COL | home |
529508 | 2018-04-06 | COL | ATL | away |
529556 | 2018-04-09 | COL | SD | away |
529590 | 2018-04-12 | WSH | COL | home |
530343 | 2018-06-08 | ARI | COL | home |
I want a column for how long each "stand" (series of sequential home or away games) has been going. In the example above, I'm looking for: 1,2, 1,2, 1,2,
.
I've tried:
row_number()
WITH intermed AS (
SELECT *, (row_number() OVER (partition by home_away)) AS seq
FROM so_demo
ORDER BY game_pk
)
SELECT * FROM intermed
>>
529410,2018-03-29,ARI,COL,home,1
529462,2018-04-02,SD,COL,home,2
529508,2018-04-06,COL,ATL,away,1
529556,2018-04-09,COL,SD,away,2
529590,2018-04-12,WSH,COL,home,3
530343,2018-06-08,ARI,COL,home,4
Because I'm partitioning over home_away
the count does not restart anew after breaks and I get 1,2, 1,2, 3,4
. It's just a sequential count of those values.
lead
/lag
I can use to demarcate where the home-stands or road-stands break:
SELECT *,
away_team != lag(away_team, 1) OVER (order by game_pk ASC)
AND away_team = 'COL'
AS start_away,
home_team != lag(home_team, 1) OVER (ORDER BY game_pk ASC)
AND home_team = 'COL'
AS start_home
FROM so_demo
>>
game_pk game_ds away_team home_team home_away start_away start_home
529410 2018-03-29 ARI COL home 0
529462 2018-04-02 SD COL home 0 0
529508 2018-04-06 COL ATL away 1 0
529556 2018-04-09 COL SD away 0 0
529590 2018-04-12 WSH COL home 0 1
530343 2018-06-08 ARI COL home 0 0
But I can't figure out how to combine them.
CodePudding user response:
We mark every change, use count() over()
to make groups out of each consecutive run and then use row_number()
to number them by date.
select game_pk
,game_ds
,away_team
,home_team
,home_away
,row_number() over(partition by grp order by game_ds) as h_a_cnt
from (
select *
,count(chng) over(order by game_ds) as grp
from (
select *
,case when home_away != lag(home_away) over(order by game_ds) then 1 end as chng
from t
) t
) t
game_pk | game_ds | away_team | home_team | home_away | h_a_cnt |
---|---|---|---|---|---|
529410 | 2018-03-29 | ARI | COL | home | 1 |
529462 | 2018-04-02 | SD | COL | home | 2 |
529508 | 2018-04-06 | COL | ATL | away | 1 |
529556 | 2018-04-09 | COL | SD | away | 2 |
529590 | 2018-04-12 | WSH | COL | home | 1 |
530343 | 2018-06-08 | ARI | COL | home | 2 |