Grouping by multiple columns in a range in postgres-CodePudding

I have a couple tables (pega.race exists too we just don't need any of the data for this)


CREATE TABLE pega.pega_race (
    id bigserial NOT NULL,
    race_id int8 NOT NULL,
    pega_id int8 NOT NULL,
    "result" int4 NOT NULL,
    ts timestamp NULL DEFAULT now_utc(),
    CONSTRAINT pega_race_pk PRIMARY KEY (id)
)

CREATE TABLE pega.pega (
    id int8 NOT NULL,
    speed numeric NULL,
    strength numeric NULL,
    lightning numeric NULL,
    wind numeric NULL,
    water numeric NULL,
    fire numeric NULL,
    CONSTRAINT pega_pk PRIMARY KEY (id)
);

What I want is to get an average result for each combination (permutation?) of speed/strength/wind/lightning/water/fire in a range. So from 0-2.25, 2.25-4.5, 4.5-6.75, and 6.75-9. They have to be within 0-9. No values exist outside that.

So I want an avg result for 6.75-9 speed, 2.25-4.5 strength, 4.5-6.75 wind, 2.25-4.5 lightning, 0-2.25 water, 0-2.25 fire and every other combination there is. I don't have enough data to do it simply by rounding like so

SELECT round(speed) speed, round(pega.strength) strength, round(pega.lightning) lightning, round(pega.wind) wind, round(pega.water) water, round(pega.fire) fire, 
    avg(result), 
    count(*) 
FROM pega.pega
JOIN pega.pega_race pr ON pega.id=pr.pega_id
GROUP BY 1, 2, 3, 4, 5, 6
HAVING count(*) > 20
ORDER BY avg(result)

So I want to extend the ranges out on these to bring the number of possibilities down by a few factors.

Looking at other posts similar to this I found solutions to group by ranges on a single column and I'm struggling to extend that to numerous columns.

Here's an attempt I made

with series as (
    SELECT generate_series(0, 9-2.25, 2.25) as r_from
), range as (
    SELECT r_from, r_from   2.25 as r_to FROM series
), pega_data as (
    SELECT speed, strength, lightning, wind, water, pega.fire, result
    FROM pega.pega
    JOIN pega.pega_race pr ON pega.id=pr.pega_id
    JOIN pega.race ON pr.race_id=race.id
)
SELECT r_from, r_to, 
    (SELECT count(*) speed_count FROM pega_data WHERE speed between r_from and r_to),
    (SELECT count(*) strength_count FROM pega_data WHERE strength between r_from and r_to),
    (SELECT count(*) lightning_count FROM pega_data WHERE lightning between r_from and r_to),
    (SELECT count(*) wind_count FROM pega_data WHERE wind between r_from and r_to),
    (SELECT count(*) water_count FROM pega_data WHERE water between r_from and r_to),
    (SELECT count(*) fire_count FROM pega_data WHERE fire between r_from and r_to),
    (SELECT AVG(result) FROM pega_data  WHERE speed between r_from and r_to AND strength between r_from and r_to AND lightning between r_from and r_to AND wind between r_from and r_to AND water between r_from and r_to AND fire between r_from and r_to)
FROM range

I think I'm almost there with this? But not sure how to break everything out into every combo.

Thanks.

Also, bonus points if you can lead me down the path to get percent 1st place. (result == 1). Extra extra bonus points if you can lead me down a more statistical route that pulls out correlations between numerous columns and a lower result/higher 1st %. This one I have no clue how to even begin.. Last (and only) stats class was 5 years ago now..

CodePudding user response：

You needn't to generate the series explicitly and check ranges due to regular data structure. Try using arithm opers to get the statistics by the range index 0..3. For example

select floor(speed/2.25)::int r_speed, floor(strength/2.25)::int r_strength, floor(lightning/2.25)::int r_lightning, floor(wind/2.25)::int r_wind, count(*) n
from pega_data
group by r_speed, r_strength, r_lightning, r_wind
order by r_speed, r_strength, r_lightning, r_wind