I'm having a lot of difficulty writing an SQL query. I need to compute the difference between consecutive rows for 4 different regions across different years, then subsequently display only the top 3 regions with the highest count of negative differences. An example as shown below:
Region | Year | Difference | ||
---|---|---|---|---|
central | 01 | null | ||
central | 02 | 10 | ||
central | 03 | -9 | ||
east | 01 | -29 | ||
east | 02 | 10 | ||
east | 03 | -9 | central | -29 |
west | 02 | 10 | ||
west | 03 | -9 | central | -29 |
west | 02 | 10 | ||
north | 03 | -9 | central | -29 |
north | 02 | 10 | ||
north | 03 | -9 |
What I currently did to find the difference was to utilise the lag function in sql.
The current problems I'm facing are:
- The first year for each region should all have null differences, since there is no year before the first year. However, only my first row in the select statement has a null value, the rest of the first years for the other regions have a non-null as there is a value before it (the latest year in the previous region comes before it).
- After obtaining all the differences, how do I only show the top 3 regions that have the most instances of negative differences?
Any help would be appreciated, I'm racking my brain at this query. Thank you!
CodePudding user response:
You have to convert null values in Differnce into something, for ex. 0 if 0 means a neutral value because there is no information.
SELECT Region
, SUM(CASE WHEN IFNULL(Difference,0) < 0 THEN 1 ELSE 0 END) CountDifference
FROM yourTable
GROUP BY Region
ORDER BY CountDifference DESC
LIMIT 3
Also here: https://dbfiddle.uk/83qE9RX2
CodePudding user response:
Managed to solve it after some thinking with a nested select
select region, count() from ("nested statement here") f where difference < 0 group by region order by count() desc limit 3
CodePudding user response:
First we fix the difference column by making every first year value null
. Then we group by region
and count
all the case where difference < 0
. After that we use dense_rank()
to find all the regions with the top count results (including ties).
select Region
,negative_differece_count
from (
select *
,dense_rank() over(order by negative_differece_count desc) as dns_rnk
from (
select Region
,count(case when (case when Year = 1 then null else Difference end) < 0 then 1 end) as negative_differece_count
from t
group by Region
) t
) t
where dns_rnk <= 3
Region | negative_differece_count |
---|---|
north | 2 |
central | 1 |
east | 1 |
west | 1 |