I'm trying to build a performance score on each player in my table and use this to run some analysis. I've tried to use the Elo ranking using the player's score in the game vs themselves in the prior game. The game is a time-trial race so the faster they complete the better their performance. I simulate a win (1) when the current race is faster than their previous race, a loss (0) when it is slower and draw (.5) when the timing difference is 0.
There are two components to Elo's algorithm...the Outcome Estimation for all players (two in my case - Ea, Eb) and the ranking update (Ra, Rb).
I have the current setup to simulate one calculation
select
player_id,
race_ind,
k,
sa,
sb,
pre_ra,
pre_rb,
div0(1, 1 pow(10,(pre_rb-pre_ra)/400)) ea,
div0(1, 1 pow(10,(pre_ra-pre_rb)/400)) eb,
round(pre_ra k * (sa-ea)) post_ra,
round(pre_rb k * (sb-eb)) post_rb
from
t
Which returns the below.
player_id | race_ind | k | sa | sb | pre_ra | pre_rb | ea | eb | post_ra | post_rb |
---|---|---|---|---|---|---|---|---|---|---|
xxxxx | 1 | 24 | 1 | 0 | 1000 | 1000 | 0.5 | 0.5 | 1012 | 988 |
xxxxx | 2 | 24 | 1 | 0 | null | null | null | null | null | null |
xxxxx | 3 | 24 | 0 | 1 | null | null | null | null | null | null |
xxxxx | 4 | 24 | 1 | 0 | null | null | null | null | null | null |
But I need to write it in a way that on the subsequent row pre_rb and pre_ra is updated with the previous row's post_ra and post_rb after calculation so the result should return this table instead
player_id | race_ind | k | sa | sb | pre_ra | pre_rb | ea | eb | post_ra | post_rb |
---|---|---|---|---|---|---|---|---|---|---|
xxxxx | 1 | 24 | 1 | 0 | 1000 | 1000 | 0.5 | 0.5 | 1012 | 988 |
xxxxx | 2 | 24 | 1 | 0 | 1012 | 988 | 0.5344839447 | 0.4655160553 | 1023 | 977 |
xxxxx | 3 | 24 | 0 | 1 | 1023 | 977 | 0.5658152031 | 0.4341847969 | 1009 | 991 |
xxxxx | 4 | 24 | 1 | 0 | 1009 | 991 | 0.5258809309 | 0.4741190691 | 1020 | 980 |
CodePudding user response:
I may not understand which fields are being populated from what previous fields in the previous row, but I wanted to point out that Snowflake breaks convention by allowing you to reference the output of calculated fields within the same select statement, assuming those fields come before the field with a calculation referencing them in the select list. For Example:
select
1 as a,
2 as b,
a b as c,
c 10 as d;
Produces:
A | B | C | D |
---|---|---|---|
1 | 2 | 3 | 13 |
Given this, assuming pre_ra
& pre_rb
come from the previous row, LAG
should work for them.
As long as post_ra
& post_rb
are above ea
and eb
in the select list, ea
and eb
should calulate correctly.
So something structured like this may work:
select
player_id,
race_ind,
k,
sa,
sb,
lag(<expr> [ , <offset> , <default> ]) ignore nulls over (partition by [something] order by [something]) as pre_ra,
lag(<expr> [ , <offset> , <default> ]) ignore nulls over (partition by [something] order by [something]) as pre_rb,
div0(1, 1 pow(10,(pre_rb - pre_ra) / 400)) ea,
div0(1, 1 pow(10,(pre_ra - pre_rb) / 400)) eb,
round(pre_ra k * (sa - ea)) post_ra,
round(pre_rb k * (sb - eb)) post_rb,
from
t;
CodePudding user response:
So reaping the point that values calculated by this process are meaningless. Here is how you can do it. I stripped the formula down to avoid math errors that occur in the CTE due to ambiguous values selection occurring, which can be avoided but each values needs to only refer to the r
or d
values and not the prior calculated intermediate values of the "same row"
So with this data:
create table data (player_id number, race_ind number, k float, sa float);
insert into data VALUES
(100, 1, 24, 1),
(100, 2, 24, 1),
(100, 3, 24, 0),
(100, 4, 24, 1);
this recursive CTE
with RECURSIVE rec_cte as (
select
d.player_id,
d.race_ind,
d.sa,
round(1000::float k * (d.sa - div0(1, 1 pow(10,(0::float )/400)))) post_ra,
round(1000::float k * (1-d.sa - div0(1, 1 pow(10,(0::float )/400)))) post_rb
from data as d
where race_ind = 1
UNION ALL
select
d.player_id,
d.race_ind,
d.sa,
round(r.post_ra k::float * (d.sa-div0(1, 1 pow(10,(r.post_rb-r.post_ra)/400)))) as post_ra,
round(r.post_rb k::float * (1-d.sa-div0(1, 1 pow(10,(r.post_ra-r.post_rb)/400)))) as post_rb
from rec_cte as r
join data as d
where r.player_id = d.player_id and r.race_ind 1 = d.race_ind
)
select *
from rec_cte
gives:
PLAYER_ID | RACE_IND | SA | POST_RA | POST_RB |
---|---|---|---|---|
100 | 1 | 1 | 1,012 | 988 |
100 | 2 | 1 | 1,023 | 977 |
100 | 3 | 0 | 1,009 | 991 |
100 | 4 | 1 | 1,020 | 980 |