Home > front end >  Redshift SQL - get top x% of values
Redshift SQL - get top x% of values

Time:11-21

I'm trying to extrace data on the top 80% of a certain value. Say my dataset is

Restaurant Pizza sold
R1 P1 5
R1 P2 2
R1 P3 4
R1 P4 7
R2 P5 6
R2 P6 9
R2 P7 3
R2 P8 9

I have

select
restaurant
   ,pizza
   ,sum(sold) over (partition by restaurant order by sold desc rows unbounded preceding)::FLOAT / sum(sold) over (partition by restaurant) as running_total_sold_perc

from table

which gives me the share of pizzas sold per restaurant sorted descending. Now though about just adding

where running_total_sold_perc <= 0.8

However, I have the problem that some of my restaurant may sell only a small number of different pizzas, so maybe two pizzas together make up 70% of all sales (38% 32%) and one pizza makes up 30%. With <= 0.8 I would therefore only get two pizzas, covering 70% of sales, not 80%. In my mind, instead of <=80% I need <="the smallest number that is larger than 0.8".

Can any one give me an idea how to implement this? Do I need to create another temp-table for the lowest number above 0.8 per restaurant or is there a simpler way?

CodePudding user response:

You may use the previous value of the running_total_sold_perc in your condition instead of the current running_total_sold_perc value. The previous value of the running_total_sold_perc can be calculated as the following:

pre_running_total_sold_perc = (sold running sum - sold) / total sold sum.

So, your query will be:

select restaurant, pizza, running_total_sold_perc    
from
(
 select restaurant
       ,pizza
       ,sum(sold) over (partition by restaurant order by sold desc)*1.0 /
        sum(sold) over (partition by restaurant) as running_total_sold_perc
       ,(sum(sold) over (partition by restaurant order by sold desc) - sold)*1.0 /
        sum(sold) over (partition by restaurant) as pre_running_total_sold_perc
 from table1
) T
where pre_running_total_sold_perc<0.8

Note that the condition should be pre_running_total_sold_perc<0.8 not pre_running_total_sold_perc<=0.8, this will guarantee the selection of all rows that their running_total_sold_perc<0.8 including the first row with value >=0.8.

  • Related