Home > Net >  Calculating difference between consecutive rows, then finding top 3 IDs with highest count of negati
Calculating difference between consecutive rows, then finding top 3 IDs with highest count of negati

Time:10-12

I'm having a lot of difficulty writing an SQL query. I need to compute the difference between consecutive rows for 4 different regions across different years, then subsequently display only the top 3 regions with the highest count of negative differences. An example as shown below:

Region Year Difference
central 01 null
central 02 10
central 03 -9
east 01 -29
east 02 10
east 03 -9 central -29
west 02 10
west 03 -9 central -29
west 02 10
north 03 -9 central -29
north 02 10
north 03 -9

What I currently did to find the difference was to utilise the lag function in sql.

The current problems I'm facing are:

  • The first year for each region should all have null differences, since there is no year before the first year. However, only my first row in the select statement has a null value, the rest of the first years for the other regions have a non-null as there is a value before it (the latest year in the previous region comes before it).
  • After obtaining all the differences, how do I only show the top 3 regions that have the most instances of negative differences?

Any help would be appreciated, I'm racking my brain at this query. Thank you!

CodePudding user response:

You have to convert null values in Differnce into something, for ex. 0 if 0 means a neutral value because there is no information.

SELECT Region
, SUM(CASE WHEN IFNULL(Difference,0) < 0 THEN 1 ELSE 0 END) CountDifference
FROM yourTable
GROUP BY Region
ORDER BY CountDifference DESC
LIMIT 3

Also here: https://dbfiddle.uk/83qE9RX2

CodePudding user response:

Managed to solve it after some thinking with a nested select

select region, count() from ("nested statement here") f where difference < 0 group by region order by count() desc limit 3

CodePudding user response:

First we fix the difference column by making every first year value null. Then we group by region and count all the case where difference < 0. After that we use dense_rank() to find all the regions with the top count results (including ties).

select   Region
        ,negative_differece_count
from     (
         select   *
                  ,dense_rank() over(order by negative_differece_count desc) as dns_rnk
         from     (  
                  select   Region
                          ,count(case when (case when Year = 1 then null else Difference end) < 0 then 1 end) as negative_differece_count
                  from     t
                  group by Region
                  ) t
         ) t
where    dns_rnk <= 3
Region negative_differece_count
north 2
central 1
east 1
west 1

Fiddle

  • Related