Home > Net >  How to recursively calculate rows based on previous rows in Snowflake?
How to recursively calculate rows based on previous rows in Snowflake?

Time:04-09

I'm trying to build a performance score on each player in my table and use this to run some analysis. I've tried to use the Elo ranking using the player's score in the game vs themselves in the prior game. The game is a time-trial race so the faster they complete the better their performance. I simulate a win (1) when the current race is faster than their previous race, a loss (0) when it is slower and draw (.5) when the timing difference is 0.

There are two components to Elo's algorithm...the Outcome Estimation for all players (two in my case - Ea, Eb) and the ranking update (Ra, Rb).

I have the current setup to simulate one calculation

select
    player_id,
    race_ind,
    k,
    sa,
    sb,
    pre_ra,
    pre_rb,
    div0(1, 1   pow(10,(pre_rb-pre_ra)/400)) ea,
    div0(1, 1   pow(10,(pre_ra-pre_rb)/400)) eb,
    round(pre_ra   k * (sa-ea)) post_ra,
    round(pre_rb   k * (sb-eb)) post_rb
from
    t

Which returns the below.

player_id race_ind k sa sb pre_ra pre_rb ea eb post_ra post_rb
xxxxx 1 24 1 0 1000 1000 0.5 0.5 1012 988
xxxxx 2 24 1 0 null null null null null null
xxxxx 3 24 0 1 null null null null null null
xxxxx 4 24 1 0 null null null null null null

But I need to write it in a way that on the subsequent row pre_rb and pre_ra is updated with the previous row's post_ra and post_rb after calculation so the result should return this table instead

player_id race_ind k sa sb pre_ra pre_rb ea eb post_ra post_rb
xxxxx 1 24 1 0 1000 1000 0.5 0.5 1012 988
xxxxx 2 24 1 0 1012 988 0.5344839447 0.4655160553 1023 977
xxxxx 3 24 0 1 1023 977 0.5658152031 0.4341847969 1009 991
xxxxx 4 24 1 0 1009 991 0.5258809309 0.4741190691 1020 980

CodePudding user response:

I may not understand which fields are being populated from what previous fields in the previous row, but I wanted to point out that Snowflake breaks convention by allowing you to reference the output of calculated fields within the same select statement, assuming those fields come before the field with a calculation referencing them in the select list. For Example:

select
    1 as a,
    2 as b,
    a   b as c,
    c   10 as d;

Produces:

A B C D
1 2 3 13

Given this, assuming pre_ra & pre_rb come from the previous row, LAG should work for them.

As long as post_ra & post_rb are above ea and eb in the select list, ea and eb should calulate correctly.

So something structured like this may work:

select
    player_id,
    race_ind,
    k,
    sa,
    sb,
    lag(<expr> [ , <offset> , <default> ]) ignore nulls over (partition by [something] order by [something])  as pre_ra,
    lag(<expr> [ , <offset> , <default> ]) ignore nulls over (partition by [something] order by [something])  as pre_rb,
    div0(1, 1   pow(10,(pre_rb - pre_ra) / 400)) ea,
    div0(1, 1   pow(10,(pre_ra - pre_rb) / 400)) eb,
    round(pre_ra   k * (sa - ea)) post_ra,
    round(pre_rb   k * (sb - eb)) post_rb,

from
    t;

CodePudding user response:

So reaping the point that values calculated by this process are meaningless. Here is how you can do it. I stripped the formula down to avoid math errors that occur in the CTE due to ambiguous values selection occurring, which can be avoided but each values needs to only refer to the r or d values and not the prior calculated intermediate values of the "same row"

So with this data:


create table data (player_id number, race_ind number, k float, sa float);

insert into data VALUES 
    (100, 1, 24, 1),
    (100, 2, 24, 1),
    (100, 3, 24, 0),
    (100, 4, 24, 1);

this recursive CTE

with RECURSIVE rec_cte as (
    select 
        d.player_id, 
        d.race_ind, 
        d.sa, 
        round(1000::float   k * (d.sa - div0(1, 1   pow(10,(0::float )/400)))) post_ra,
        round(1000::float   k * (1-d.sa - div0(1, 1   pow(10,(0::float )/400)))) post_rb
    from data as d
    where race_ind = 1

    UNION ALL

    select 
        d.player_id, 
        d.race_ind, 
        d.sa, 
        round(r.post_ra   k::float * (d.sa-div0(1, 1   pow(10,(r.post_rb-r.post_ra)/400)))) as post_ra,
        round(r.post_rb   k::float * (1-d.sa-div0(1, 1   pow(10,(r.post_ra-r.post_rb)/400)))) as post_rb
    from rec_cte as r
    join data as d 
    where r.player_id = d.player_id and r.race_ind   1  = d.race_ind
)
select *
from rec_cte

gives:

PLAYER_ID RACE_IND SA POST_RA POST_RB
100 1 1 1,012 988
100 2 1 1,023 977
100 3 0 1,009 991
100 4 1 1,020 980
  • Related