Number of rows per "percentile"-CodePudding

I would like a Postgres query returning the number of rows per percentile.

Input:

id	name	price
1	apple	12
2	banana	6
3	orange	18
4	pineapple	26
4	lemon	30

Desired output:

percentile_3_1	percentile_3_2	percentile_3_3
1	2	2

percentile_3_1 = number of fruits in the 1st 3-precentile (i.e. with a price < 10)

CodePudding user response：

Postgres has the window function ntile() and a number of very useful ordered-set aggregate functions for percentiles. But you seem to have the wrong term.

number of fruits in the 1st 3-precentile (i.e. with a price < 10)

That's not a "percentile". That's the count of rows with a price below a third of the maximum.

Assuming price is defined numeric NOT NULL CHECK (price > 0), here is a generalized query to get row counts for any given number of partitions:

WITH bounds AS (
   SELECT *
   FROM  (
      SELECT bound AS lo, lead(bound) OVER (ORDER BY bound) AS hi
      FROM  (
         SELECT generate_series(0, x, x/3) AS bound  -- number of partitions here!
         FROM  (SELECT max(price) AS x FROM tbl) x
         ) sub1
      ) sub2
   WHERE  hi IS NOT NULL
   )
SELECT b.hi, count(t.price)
FROM   bounds   b
LEFT   JOIN tbl t ON t.price > b.lo AND t.price <= b.hi
GROUP  BY 1
ORDER  BY 1;

Result:

                  hi | count
 -------------------- ------
 10.0000000000000000 |     1
 20.0000000000000000 |     2
 30.0000000000000000 |     2

Notably, each partition includes the upper bound, as this makes more sense while deriving partitions from the maximum value. So your quote would read:

i.e. with a price <= 10

db<>fiddle here