I would like a Postgres query returning the number of rows per percentile.
Input:
id | name | price |
---|---|---|
1 | apple | 12 |
2 | banana | 6 |
3 | orange | 18 |
4 | pineapple | 26 |
4 | lemon | 30 |
Desired output:
percentile_3_1 | percentile_3_2 | percentile_3_3 |
---|---|---|
1 | 2 | 2 |
percentile_3_1
= number of fruits in the 1st 3-precentile
(i.e. with a price < 10)
CodePudding user response:
Postgres has the window function ntile()
and a number of very useful ordered-set aggregate functions for percentiles. But you seem to have the wrong term.
number of fruits in the 1st 3-precentile (i.e. with a price < 10)
That's not a "percentile". That's the count of rows with a price below a third of the maximum.
Assuming price
is defined numeric NOT NULL CHECK (price > 0)
, here is a generalized query to get row counts for any given number of partitions:
WITH bounds AS (
SELECT *
FROM (
SELECT bound AS lo, lead(bound) OVER (ORDER BY bound) AS hi
FROM (
SELECT generate_series(0, x, x/3) AS bound -- number of partitions here!
FROM (SELECT max(price) AS x FROM tbl) x
) sub1
) sub2
WHERE hi IS NOT NULL
)
SELECT b.hi, count(t.price)
FROM bounds b
LEFT JOIN tbl t ON t.price > b.lo AND t.price <= b.hi
GROUP BY 1
ORDER BY 1;
Result:
hi | count
-------------------- ------
10.0000000000000000 | 1
20.0000000000000000 | 2
30.0000000000000000 | 2
Notably, each partition includes the upper bound, as this makes more sense while deriving partitions from the maximum value. So your quote would read:
i.e. with a price <= 10
db<>fiddle here