Home > Software design >  Quartiles calculation in Postgresql Query
Quartiles calculation in Postgresql Query

Time:09-29

I am having a hard time trying to get this done. I have the following table:

cod_prod   seller   price    date
   A       Andres     10    anydate
   A       Paul       5     anydate
   A       Mike      2.5    anydate
   A       Josh      1.75   anydate
   A       Karen     7.5    anydate
 ....      .....     ...    .......

I am trying to calculate quartiles of the price for each product and classify each seller's price into 4 quartiles.

The output I am expecting is:

Cod_Prod   Seller  Price  Quartile  1stQ  2ndQ   3rdQ  4thQ
   A      Andres    10       4       2.5   5     7.5    10
   A      Karen    7.5       3       2.5   5     7.5    10
   A      Paul      5        2       2.5   5     7.5    10
   A      Mike     2.5       1       2.5   5     7.5    10
   A      Josh     1.75      1       2.5   5     7.5    10
  ..     .....      ....   ....     ....  ..     ...    ...

This table has thousands of distinct cod_prod and thousands of sellers.

I am trying this query:

with cte as ( 
            
            select seller, cod_prod, sum(price) as sum_price
            from tablename
            group by 2,1

)

select seller, 

      cod_prod, 
      sum_price,
      ntile(4) over (partition by seller order by sum_price asc) quartile
      from cte
    

But this not doing what I expect and still mising the 1stQ to 4thQ indicators bins

I tried many different things but this is the closest I got from what I want. Can someone help me to solve it?

CodePudding user response:

I am not sure if this query is exactly what you want, but I think can help you.

I calculated quartiles grouping by cod_prod.

WITH cte AS (SELECT seller, cod_prod, sum(price) as sum_price
             FROM t
             GROUP BY seller, cod_prod),
             
     quartiles AS (SELECT 
                         cod_prod,
                         percentile_cont(0.25) within group (order by sum_price asc) as "1stQ",
                         percentile_cont(0.50) within group (order by sum_price asc) as "2ndQ",
                         percentile_cont(0.75) within group (order by sum_price asc) as "3rdQ",
                         percentile_cont(1) within group (order by sum_price asc) as "4thQ"
                   FROM cte
                   GROUP BY cod_prod)

SELECT cte.*,
       ntile(4) over (PARTITION BY cte.cod_prod ORDER BY sum_price ASC) quartile,
       quartiles.*
FROM cte
INNER JOIN quartiles ON cte.cod_prod = quartiles.cod_prod;
  • Related