Home > Enterprise >  multiple top n aggregates query defined as a view (or function)?
multiple top n aggregates query defined as a view (or function)?

Time:08-03

I couldn't find a past question exactly like this problem. I have an orders table, containing a customer id, order date, and several numeric columns (how many of a particular item were ordered on that date). Removing some of the numberics, it looks like this:

customer_id  date      a    b   c   d 
0001         07/01/22   0   3   3   5
0001         07/12/22  12   0  50   0
0002         06/30/22   5  65   0  30
0002         07/20/22   1   0  19   2
0003         08/01/22   0   0  99   0

I need to sum each numeric column by customer_id, then return the top n customers for each column. Obviously that means a single customer may appear multiple times, once for each column. Assuming top 2, the desired output would look something like this:

column_ranked customer_id  sum rank
'a'           001           12    1
'a'           002            6    2
'b'           002           65    1
'b            001            3    2
'c'           003           99    1
'c'           001           53    2
'd'           002           30    1
'd'           001            5    2

(this assumes no date range filter)

My first thought was a CTE to collapse the table into its per-customer sums, then a union from the CTE, with a limit n clause, once for each summed column. That works if the date range is hard-coded into the CTE .... but I want to define this as a view, so it can be called by users something like this:

SELECT * from top_customers_view WHERE date_range BETWEEN ( date1 and date2 )

How can I pass the date restriction down to the CTE? Or am I taking the wrong approach entirely? If a view isn't possible, can it be done as a function? (without using a costly cursor, that is.)

CodePudding user response:

Since the date ranges clearly produce a massive number of combinations you cannot generate a view with them. You can write a query, however, as shown below:

with 
p as (select cast ('2022-01-01' as date) as ds,  cast ('2022-12-31' as date) as de),
a as (
select top 10 customer_id, 'a' as col, sum(a) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
),
b as (
select top 10 customer_id, 'b' as col, sum(b) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
),
c as (
select top 10 customer_id, 'c' as col, sum(b) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
),
d as (
select top 10 customer_id, 'd' as col, sum(b) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
)
select * from a
union all select * from b
union all select * from c
union all select * from d
order by customer_id, col, s desc

The date range is in the second line.

See db<>fiddle.

Alternatively, you could create a data warehousing solution, but it would require much more effort to make it work.

  • Related