I'm a beginner in SQL and want to track the biggest changes in sales of tickets over time in absolute manner (so positive and negative) for a set of states (with a number of venders who can sell in every state they want) where the tickets are sold.
So I have a state_table
like this:
state_key | state_name |
---|---|
1 | Alabama |
2 | Connecticut |
3 | Maine |
I have a ticket_sales
table like this:
vender_key | state_key | sell_date | no_of_sales |
---|---|---|---|
1 | 1 | 2021-01-01 | 27 |
1 | 2 | 2021-01-01 | 48 |
1 | 3 | 2021-01-01 | 69 |
2 | 1 | 2021-01-01 | 31 |
2 | 2 | 2021-01-01 | 41 |
3 | 2 | 2021-01-01 | 10 |
3 | 3 | 2021-01-01 | 15 |
1 | 1 | 2021-02-01 | 11 |
1 | 2 | 2021-02-01 | 21 |
1 | 3 | 2021-02-01 | 20 |
2 | 1 | 2021-02-01 | 67 |
2 | 2 | 2021-02-01 | 39 |
3 | 2 | 2021-02-01 | 13 |
3 | 3 | 2021-02-01 | 19 |
1 | 1 | 2021-03-01 | 77 |
1 | 2 | 2021-03-01 | 68 |
1 | 3 | 2021-03-01 | 59 |
2 | 1 | 2021-03-01 | 10 |
2 | 2 | 2021-03-01 | 11 |
3 | 2 | 2021-03-01 | 49 |
3 | 3 | 2021-03-01 | 54 |
Of course, the tables are much longer in real.
Now I want to know which states have the largest absolute difference in total ticket sales within the whole time frame. I want to display
- the name of the state
- the maximum date
- the number of sales at the maximum date in the respective state
- the minimum date
- the number of sales at the minimum date in the respective state and
- the difference between the maximum sales and the minimum sales for the top 3 states;
sorted descending by the difference between max and min sales per state.
My goal is to see which are the states with large changes in ticket sales over time, regardless if there is a positive or a negative development of sales.
So therefore, I need to sum up the venders' sales per state for each date first, then to pick the highest number of sales for each state and the corresponding date for each state. Then I have to do the same for the lowest number of sales and the corresponding date for each state. And at last, I need to calculate the absolute difference between these two numbers.
For me, this is a bit too much aggregation to handle in SQL, unfortunately. I have tried some code with SUM
and GROUP BY
states, but I don't know how to tell SQL the aggregation and then calculation correctly by choosing highest numbers per state (so the combination of MAX
respectively MIN
and GROUP BY
), I'm confused by that.
My expected result should be:
state_name | date_max | sales_max | date_min | sales_min | difference |
---|---|---|---|---|---|
Maine | 2021-03-01 | 113 | 2021-02-01 | 39 | 74 |
Connecticut | 2021-03-01 | 128 | 2021-02-01 | 73 | 55 |
Alabama | 2021-03-01 | 87 | 2021-01-01 | 58 | 29 |
Thank you in advance!
CodePudding user response:
You can do it with two aggregations. First find dates with max/min sum of no_of_sales
for every state_key
next aggregate it by state_key
conditionally.
select state_key
, max(case nmin when 1 then sell_date end) date_min
, max(case nmin when 1 then n end) sales_min
, max(case nmax when 1 then sell_date end) date_max
, max(case nmax when 1 then n end) sales_max
, max(case nmax when 1 then n end) - max(case nmin when 1 then n end ) delta
from (
select state_key, sell_date, sum(no_of_sales) n
, row_number() over(partition by state_key order by sum(no_of_sales)) nmin
, row_number() over(partition by state_key order by sum(no_of_sales) desc) nmax
from ticket_sales
group by state_key, sell_date
) t
where nmin = 1 or nmax = 1
group by state_key