I have a requirement to calculate the second max value in the moving window of the past 120 minutes frame just like I have calculated max value in past 120 minutes moving frame partitioned on device_id.
MAX(Temperature) OVER(PARTITION BY device_type ORDER BY event_ts_seconds RANGE BETWEEN 7200 PRECEDING AND CURRENT ROW) AS Max_Temp_in_120frame
Similarly, I want to calculate the second max temperature.
I have tried the below 2 versions of NTH_VALUE but it is not working as expected.
NTH_VALUE(Temperature,2) OVER(PARTITION BY device_id ORDER BY event_ts_seconds RANGE BETWEEN 7200 PRECEDING AND CURRENT ROW) AS SecondMax_Temp_in_120frame
NTH_VALUE(Temperature,2) OVER(PARTITION BY device_id ORDER BY Temperature DESC RANGE BETWEEN 7200 PRECEDING AND CURRENT ROW) AS SecondMax_Temp_in_120frame
Help on this will be much appreciated
CodePudding user response:
First of all, NTH_VALUE
won't work because it is a generalization of the window function FIRST_VALUE
, not MAX
.
In BigQuery, you can compute the second max value in three steps.
First, use ARRAY_AGG
to compute the moving array of values:
ARRAY_AGG(value) OVER(ORDER BY ts RANGE BETWEEN 3 PRECEDING AND CURRENT ROW) AS moving_array
This will give you something like this:
ts | value | moving_array_as_string |
---|---|---|
1 | 0.6 | [0.6] |
2 | 5.4 | [0.6, 5.4] |
3 | 5.4 | [0.6, 5.4, 5.4] |
4 | 5.9 | [0.6, 5.4, 5.4, 5.9] |
5 | 4.6 | [5.4, 5.4, 5.9, 4.6] |
6 | 6 | [5.4, 5.9, 4.6, 6] |
7 | 8.7 | [5.9, 4.6, 6, 8.7] |
8 | 3.5 | [4.6, 6, 8.7, 3.5] |
9 | 4 | [6, 8.7, 3.5, 4] |
10 | 0.7 | [8.7, 3.5, 4, 0.7] |
Once you have that, you can use ROW_NUMBER
inside the array with an UNNEST
to rank the values, like this:
ARRAY(SELECT AS STRUCT value, row_number() OVER(ORDER BY value DESC) as rank FROM UNNEST(moving_array) as value ORDER BY value DESC) as ranked_moving_array
This will give you something like this (I did not display the rank, but you can see the values have been sorted inside each array):
ts | value | ranked_moving_array_as_string |
---|---|---|
1 | 0.6 | [0.6] |
2 | 5.4 | [5.4, 0.6] |
3 | 5.4 | [5.4, 5.4, 0.6] |
4 | 5.9 | [5.9, 5.4, 5.4, 0.6] |
5 | 4.6 | [5.9, 5.4, 5.4, 4.6] |
6 | 6 | [6, 5.9, 5.4, 4.6] |
7 | 8.7 | [8.7, 6, 5.9, 4.6] |
8 | 3.5 | [8.7, 6, 4.6, 3.5] |
9 | 4 | [8.7, 6, 4, 3.5] |
10 | 0.7 | [8.7, 4, 3.5, 0.7] |
Finally, you can reuse this ranked_array with another UNNEST
to pick the second max value (or any n-th value), like this:
(SELECT value FROM UNNEST(ranked_moving_array) WHERE rank=2) second_max_value_in_3_frame
The final query looks like this:
WITH sample AS (
SELECT
ts,
ROUND(10*RAND(), 1) as value
FROM UNNEST (GENERATE_ARRAY(1, 10)) ts
)
, sample_with_moving_array AS (
SELECT
ts,
value,
ARRAY_AGG(value) OVER(ORDER BY ts RANGE BETWEEN 3 PRECEDING AND CURRENT ROW) AS moving_array,
FROM sample
)
, sample_with_ranked_moving_array AS (
SELECT
ts,
value,
moving_array,
ARRAY(SELECT AS STRUCT value, row_number() OVER(ORDER BY value DESC) as rank FROM UNNEST(moving_array) as value ORDER BY value DESC) as ranked_moving_array
FROM sample_with_moving_array
)
SELECT
ts,
value,
MAX(value) OVER(ORDER BY ts RANGE BETWEEN 3 PRECEDING AND CURRENT ROW) AS max_value_in_3_frame,
(SELECT value FROM UNNEST(ranked_moving_array) WHERE rank=1) max_value_in_3_frame_alternate_version,
(SELECT value FROM UNNEST(ranked_moving_array) WHERE rank=2) second_max_value_in_3_frame,
CONCAT("[", ARRAY_TO_STRING(ARRAY(SELECT CAST(value as STRING) FROM UNNEST(moving_array) value), ", "), "]") as moving_array_as_string,
CONCAT("[", ARRAY_TO_STRING(ARRAY(SELECT CAST(elem.value as STRING) FROM UNNEST(ranked_moving_array) elem), ", "), "]") as ranked_moving_array_as_string
FROM sample_with_ranked_moving_array
Which returns:
ts | value | max_value_in_3_frame | max_value_in_3_frame_alternate_version | second_max_value_in_3_frame | moving_array_as_string | ranked_moving_array_as_string |
---|---|---|---|---|---|---|
1 | 0.6 | 0.6 | 0.6 | null | [0.6] | [0.6] |
2 | 5.4 | 5.4 | 5.4 | 0.6 | [0.6, 5.4] | [5.4, 0.6] |
3 | 5.4 | 5.4 | 5.4 | 5.4 | [0.6, 5.4, 5.4] | [5.4, 5.4, 0.6] |
4 | 5.9 | 5.9 | 5.9 | 5.4 | [0.6, 5.4, 5.4, 5.9] | [5.9, 5.4, 5.4, 0.6] |
5 | 4.6 | 5.9 | 5.9 | 5.4 | [5.4, 5.4, 5.9, 4.6] | [5.9, 5.4, 5.4, 4.6] |
6 | 6 | 6 | 6 | 5.9 | [5.4, 5.9, 4.6, 6] | [6, 5.9, 5.4, 4.6] |
7 | 8.7 | 8.7 | 8.7 | 6 | [5.9, 4.6, 6, 8.7] | [8.7, 6, 5.9, 4.6] |
8 | 3.5 | 8.7 | 8.7 | 6 | [4.6, 6, 8.7, 3.5] | [8.7, 6, 4.6, 3.5] |
9 | 4 | 8.7 | 8.7 | 6 | [6, 8.7, 3.5, 4] | [8.7, 6, 4, 3.5] |
10 | 0.7 | 8.7 | 8.7 | 4 | [8.7, 3.5, 4, 0.7] | [8.7, 4, 3.5, 0.7] |
As a control measure, I also computed the max value with the old and new methods to check I had the same results with both.
To conclude, you are lucky to be on BigQuery, as I don't think other SQL engines support all the capabilities used in this query, at least not with such an elegant syntax.
CodePudding user response:
Consider below approach
SELECT * EXCEPT(Array_Temp_in_120frame, Two_Max_Temp_in_120frame),
Two_Max_Temp_in_120frame[SAFE_OFFSET(0)] Max_Temp_in_120frame,
Two_Max_Temp_in_120frame[SAFE_OFFSET(1)] SecondMax_Temp_in_120frame
FROM (
SELECT *, ARRAY(
SELECT Temperature FROM t.Array_Temp_in_120frame Temperature
ORDER BY Temperature DESC LIMIT 2
) Two_Max_Temp_in_120frame
FROM (
SELECT *, ARRAY_AGG(Temperature) OVER(
PARTITION BY device_type ORDER BY event_ts_seconds
RANGE BETWEEN 7200 PRECEDING AND CURRENT ROW
) AS Array_Temp_in_120frame
FROM your_table
) t
)