Home > Blockchain >  How do I find the Nth highest value monthly for a daily series in a table using SQL?
How do I find the Nth highest value monthly for a daily series in a table using SQL?

Time:09-04

Problem: Find the total number of incidents per borough per month, order by number of incidents descending, and return the Nth ranked boroughs. I.e., the boroughs with the 3rd, 5th, 7th, etc., most incidents each month.

I'm able to get the number of incidents per borough per month without any issue.

SELECT borough_name, FORMAT_DATE("%Y-%m", date_of_call) Month_of_Incident,
COUNT(incident_number) Number_of_Incidents,
FROM `bigquery-public-data.london_fire_brigade.fire_brigade_service_calls` 
GROUP BY Month_of_Incident, borough_name
ORDER BY Month_of_Incident, Number_of_Incidents DESC

which gives:

Row     borough_name    Month_of_Incident   Number_of_Incidents 
1       WESTMINSTER     2017-01             620
    
2       CAMDEN          2017-01             401
    
3       SOUTHWARK       2017-01             389
    
4       LAMBETH         2017-01             377
    
5       TOWER HAMLETS   2017-01             334

I attempted using Row_Number as in this method for selective top-N by date but get:

Window ORDER BY expression references column date_of_call which is neither grouped nor aggregated

when I try to place date_of_call in the window function. Am I not able to do this because I'm using COUNT to get the number of incidents rather than being able to reference a column with an int or float directly that the window function can aggregate?

I want to be able to adjust an outer row_number/rank_number = X, Y, Z or row_number/rank_number >= some value and date selection to be able to pull out the Nth highest or ordered range of entries over a weekly, monthly, yearly range. I.e., WHERE rn = 3, WHERE rn <= 5.

When finished the output for the 3rd highest, for instance, should be:

Row      Borough_Name     Month_of_Incident     Number_of_Incidents
1        SOUTHWARK        2017-01               389
2        TOWER HAMLETS    2017-02               345
3        LAMBETH          2017-03               348
    

Thanks in advance for any assistance.

CodePudding user response:

Try to add another parameter to your query. LIMIT = 5 (where 5 could be replaced by your "top-N" value).

That should limit the number to the top "N" values for the interval you set with your other fields.

CodePudding user response:

Ended up figuring it out. I believe the issue is that I was trying to carry out partitioning by date_of_call in its formatted form in the same query where I created the formatting. Once I separated that from the windowing I was able to get the desired result.

SELECT *
FROM
(
  SELECT DENSE_RANK() OVER(PARTITION BY Month_of_Incident ORDER BY Month_of_Incident, Number_of_Incidents DESC) rank_no,
  borough_name, Month_of_Incident, Number_of_Incidents
  FROM
  (
    SELECT borough_name, FORMAT_DATE("%Y-%m", date_of_call) Month_of_Incident,
    COUNT(incident_number) Number_of_Incidents
    FROM `bigquery-public-data.london_fire_brigade.fire_brigade_service_calls` 
    GROUP BY Month_of_Incident, borough_name
    ORDER BY Month_of_Incident, Number_of_Incidents DESC
  )
  ORDER BY Month_of_Incident, Number_of_Incidents DESC
) A
WHERE A.rank_no = 2

Which gives:

Row     rank_no borough_name    Month_of_Incident   Number_of_Incidents 
1       2       CAMDEN          2017-01             401
2       2       CAMDEN          2017-02             348
3       2       SOUTHWARK       2017-03             372
4       2       CAMDEN          2017-04             412

Just for clarity, I used FORMAT rather than DATEPART so that it's still applicable for data that spans multiple years. DATEPART would aggregate all months together regardless of year.

The WHERE clause can be changed to WHERE A.rank_no <= 3, WHERE A.rank_no BETWEEN 3 AND 5, etc., to get the top N, etc., per month:

Row     rank_no borough_name    Month_of_Incident   Number_of_Incidents 
1       1       WESTMINSTER     2017-01             620
2       2       CAMDEN          2017-01             401
3       3       SOUTHWARK       2017-01             389
4       1       WESTMINSTER     2017-02             587
5       2       CAMDEN          2017-02             348
6       3       TOWER HAMLETS   2017-02             345

Not what I was initially after, but turns out to be a nice bonus of doing it this way.

  • Related