I'm trying to extrace data on the top 80% of a certain value. Say my dataset is
Restaurant | Pizza | sold |
---|---|---|
R1 | P1 | 5 |
R1 | P2 | 2 |
R1 | P3 | 4 |
R1 | P4 | 7 |
R2 | P5 | 6 |
R2 | P6 | 9 |
R2 | P7 | 3 |
R2 | P8 | 9 |
I have
select
restaurant
,pizza
,sum(sold) over (partition by restaurant order by sold desc rows unbounded preceding)::FLOAT / sum(sold) over (partition by restaurant) as running_total_sold_perc
from table
which gives me the share of pizzas sold per restaurant sorted descending. Now though about just adding
where running_total_sold_perc <= 0.8
However, I have the problem that some of my restaurant may sell only a small number of different pizzas, so maybe two pizzas together make up 70% of all sales (38% 32%) and one pizza makes up 30%. With <= 0.8 I would therefore only get two pizzas, covering 70% of sales, not 80%. In my mind, instead of <=80% I need <="the smallest number that is larger than 0.8".
Can any one give me an idea how to implement this? Do I need to create another temp-table for the lowest number above 0.8 per restaurant or is there a simpler way?
CodePudding user response:
You may use the previous value of the running_total_sold_perc in your condition instead of the current running_total_sold_perc value. The previous value of the running_total_sold_perc can be calculated as the following:
pre_running_total_sold_perc = (sold running sum - sold) / total sold sum.
So, your query will be:
select restaurant, pizza, running_total_sold_perc
from
(
select restaurant
,pizza
,sum(sold) over (partition by restaurant order by sold desc)*1.0 /
sum(sold) over (partition by restaurant) as running_total_sold_perc
,(sum(sold) over (partition by restaurant order by sold desc) - sold)*1.0 /
sum(sold) over (partition by restaurant) as pre_running_total_sold_perc
from table1
) T
where pre_running_total_sold_perc<0.8
Note that the condition should be pre_running_total_sold_perc<0.8
not pre_running_total_sold_perc<=0.8
, this will guarantee the selection of all rows that their running_total_sold_perc<0.8 including the first row with value >=0.8.