Calculating difference between consecutive rows, then finding top 3 IDs with highest count of negati-CodePudding

I'm having a lot of difficulty writing an SQL query. I need to compute the difference between consecutive rows for 4 different regions across different years, then subsequently display only the top 3 regions with the highest count of negative differences. An example as shown below:

Region	Year	Difference
central	01	null
central	02	10
central	03	-9
east	01	-29
east	02	10
east	03	-9	central	-29
west	02	10
west	03	-9	central	-29
west	02	10
north	03	-9	central	-29
north	02	10
north	03	-9

What I currently did to find the difference was to utilise the lag function in sql.

The current problems I'm facing are:

The first year for each region should all have null differences, since there is no year before the first year. However, only my first row in the select statement has a null value, the rest of the first years for the other regions have a non-null as there is a value before it (the latest year in the previous region comes before it).
After obtaining all the differences, how do I only show the top 3 regions that have the most instances of negative differences?

Any help would be appreciated, I'm racking my brain at this query. Thank you!

CodePudding user response：

You have to convert null values in Differnce into something, for ex. 0 if 0 means a neutral value because there is no information.

SELECT Region
, SUM(CASE WHEN IFNULL(Difference,0) < 0 THEN 1 ELSE 0 END) CountDifference
FROM yourTable
GROUP BY Region
ORDER BY CountDifference DESC
LIMIT 3

Also here: https://dbfiddle.uk/83qE9RX2

CodePudding user response：

Managed to solve it after some thinking with a nested select

select region, count() from ("nested statement here") f where difference < 0 group by region order by count() desc limit 3

CodePudding user response：

First we fix the difference column by making every first year value null. Then we group by region and count all the case where difference < 0. After that we use dense_rank() to find all the regions with the top count results (including ties).

select   Region
        ,negative_differece_count
from     (
         select   *
                  ,dense_rank() over(order by negative_differece_count desc) as dns_rnk
         from     (  
                  select   Region
                          ,count(case when (case when Year = 1 then null else Difference end) < 0 then 1 end) as negative_differece_count
                  from     t
                  group by Region
                  ) t
         ) t
where    dns_rnk <= 3

Region	negative_differece_count
north	2
central	1
east	1
west	1

Fiddle