How can I aggregate columns, then find out the biggest changes over time and display the most outsta-CodePudding

I'm a beginner in SQL and want to track the biggest changes in sales of tickets over time in absolute manner (so positive and negative) for a set of states (with a number of venders who can sell in every state they want) where the tickets are sold.

So I have a state_table like this:

state_key	state_name
1	Alabama
2	Connecticut
3	Maine

I have a ticket_sales table like this:

vender_key	state_key	sell_date	no_of_sales
1	1	2021-01-01	27
1	2	2021-01-01	48
1	3	2021-01-01	69
2	1	2021-01-01	31
2	2	2021-01-01	41
3	2	2021-01-01	10
3	3	2021-01-01	15
1	1	2021-02-01	11
1	2	2021-02-01	21
1	3	2021-02-01	20
2	1	2021-02-01	67
2	2	2021-02-01	39
3	2	2021-02-01	13
3	3	2021-02-01	19
1	1	2021-03-01	77
1	2	2021-03-01	68
1	3	2021-03-01	59
2	1	2021-03-01	10
2	2	2021-03-01	11
3	2	2021-03-01	49
3	3	2021-03-01	54

Of course, the tables are much longer in real.

Now I want to know which states have the largest absolute difference in total ticket sales within the whole time frame. I want to display

the name of the state
the maximum date
the number of sales at the maximum date in the respective state
the minimum date
the number of sales at the minimum date in the respective state and
the difference between the maximum sales and the minimum sales for the top 3 states;

sorted descending by the difference between max and min sales per state.

My goal is to see which are the states with large changes in ticket sales over time, regardless if there is a positive or a negative development of sales.

So therefore, I need to sum up the venders' sales per state for each date first, then to pick the highest number of sales for each state and the corresponding date for each state. Then I have to do the same for the lowest number of sales and the corresponding date for each state. And at last, I need to calculate the absolute difference between these two numbers.

For me, this is a bit too much aggregation to handle in SQL, unfortunately. I have tried some code with SUM and GROUP BY states, but I don't know how to tell SQL the aggregation and then calculation correctly by choosing highest numbers per state (so the combination of MAX respectively MIN and GROUP BY), I'm confused by that.

My expected result should be:

state_name	date_max	sales_max	date_min	sales_min	difference
Maine	2021-03-01	113	2021-02-01	39	74
Connecticut	2021-03-01	128	2021-02-01	73	55
Alabama	2021-03-01	87	2021-01-01	58	29

Thank you in advance!

CodePudding user response：

You can do it with two aggregations. First find dates with max/min sum of no_of_sales for every state_key next aggregate it by state_key conditionally.

select state_key 
  , max(case nmin when 1 then sell_date end) date_min
  , max(case nmin when 1 then n end) sales_min
  , max(case nmax when 1 then sell_date end) date_max
  , max(case nmax when 1 then n end) sales_max
  , max(case nmax when 1 then n end) - max(case nmin when 1 then n end ) delta
from ( 
    select state_key, sell_date, sum(no_of_sales) n
       , row_number() over(partition by state_key order by sum(no_of_sales)) nmin
       , row_number() over(partition by state_key order by sum(no_of_sales) desc) nmax
    from ticket_sales
    group by state_key, sell_date
) t
where nmin = 1 or nmax = 1
group by state_key