Home > Software engineering >  Calculating an average by range in jumps of a certain quantity with conditions in SQL
Calculating an average by range in jumps of a certain quantity with conditions in SQL

Time:02-28

I have a table with the serial number of each product, the number of years it has been marketed in the store and the amount of revenue from the product since it has been in the store. I would like to write a query that will calculate the average revenue over a 5 year period based on the amount of his years in the store. In addition I want the calculation to be done only when there are over 3 products in the range of those years.

The sample table:

Product_Number Years_in_the_store Revenue_from_the_product
1 0.16 8690
2 0.36 57661
3 0.85 29
4 1 12280
5 1.4 3318
6 1.45 20686
7 1.79 8905

When in practice the table contains 40 rows. The only solution I could think of is to create manual ranges (i.e. between 5 years and 10 and between 10 years and 15 years) but this can be a problem when I have a larger amount of years. Example of my solution:

SELECT AVG(CASE
               WHEN Years_in_the_store BETWEEN 0 AND 5 THEN Revenue_from_the_product
           END) AS Between_0_to_5,
       AVG(CASE
               WHEN Years_in_the_store BETWEEN 5 AND 10 THEN Revenue_from_the_product
           END) AS Between_5_to_10,
       AVG(CASE
               WHEN Years_in_the_store BETWEEN 10 AND 15 THEN Revenue_from_the_product
           END) AS Between_10_to_15,
       AVG(CASE
               WHEN Years_in_the_store BETWEEN 15 AND 20 THEN Revenue_from_the_product
           END) AS Between_15_to_20,
       AVG(CASE
               WHEN Years_in_the_store BETWEEN 20 AND 25 THEN Revenue_from_the_product
           END) AS Between_20_to_25,
       AVG(CASE
               WHEN Years_in_the_store BETWEEN 25 AND 25 THEN Revenue_from_the_product
           END) AS Between_25_to_30
FROM EXDATA

I would be happy for professional help :)

CodePudding user response:

Below query will return every five yearly average revenue. year_range 1 indicates 0 to 5 years, 2 indicates 10 to 15 years, 3 indicates 16 to 20 years and so on. It's dynamic. You need not to worry about how many rows are there or about the year ranges.

having count(*)>2 will ensure that there are at least three rows in a particular year_range for the average revenue to be calculated. Otherwise that year_range will be ignored.

Schema and insert statements:

 create table EXDATA(Product_Number int, Years_in_the_store float, Revenue_from_the_product int);
 insert into EXDATA values(1    ,11.16, 8690);
 insert into EXDATA values(2    ,9.36,  57661);
 insert into EXDATA values(3    ,4.85,  29);
 insert into EXDATA values(4    ,1, 12280);
 insert into EXDATA values(5    ,16.8,  3318);
 insert into EXDATA values(6    ,7.45,  20686);
 insert into EXDATA values(7    ,1.79,  8905);

Query:

 select floor((CEILING(Years_in_the_store)-1)/5 1)year_range,avg(Revenue_from_the_product) five_yearly_average
 from EXDATA
 group by floor((CEILING(Years_in_the_store)-1)/5 1)
 having count(*)>2
 order by year_range

Output:

year_range five_yearly_average
1 7071.3333
2 39173.5000
3 8690.0000
4 3318.0000

db<>fiddle here

CodePudding user response:

In order to see in which range a poroduct falls you can integer-divide the years by 5. Group by the thus found ranges and add up their revenue and (fractional) years. Then divide revenue by years and multiply by five years to get an average over the full five year time range.

Use HAVING to only consioder ranges with at least three products.

select
  years_in_the_store div 5 * 5 as five_year_range_start,
  sum(revenue_from_the_product) / sum(years_in_the_store) * 5 as avg_revenue
from mytable
group by years_in_the_store div 5 * 5
having   count(*) >= 3;
order by five_year_range_start;

Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=6db31bce5408126b177000255a3a1b2d

  • Related