Given the following table in Google BigQuery (GBQ):
```
Element, tmed, ingestion_time
Item1,   10.0, 2023-01-01
Item1,   11.0, 2023-01-02
Item2,   null, 2023-01-02
Item2,   20.0, 2023-01-03
Item3,   21.0, 2023-01-03
Item3,   null, 2023-01-04
Item4,   null, 2023-01-04
Item4,   null, 2023-01-05
```
For each Element, I would like to retrieve the latest non-null tmed value (the one with the latest ingestion_time); if an element has only null values, its latest row should be returned. The expected result is:
```
Element, tmed, ingestion_time
Item1,   11.0, 2023-01-02
Item2,   20.0, 2023-01-03
Item3,   21.0, 2023-01-03
Item4,   null, 2023-01-05
```
For this purpose, I was using the aggregate function ANY_VALUE, which, although the documentation does not state it very clearly, skips NULLs and returns a non-null value when one exists (see the discussion here). However, it just returns the first non-null value it encounters, independently of the DATETIME field ingestion_time. I tried different ORDER BY options, but with no success.
Answer:
Try using the ROW_NUMBER function as follows:
```sql
select element, tmed, ingestion_time
from
(
  select *,
    -- rows with a non-null tmed come first, then the latest ingestion_time
    row_number() over (partition by element order by case when tmed is not null then 1 else 2 end, ingestion_time desc) rn
  from table_name
) T
where rn = 1
```
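To check this query against the sample data without a BigQuery project, here is a minimal sketch that runs it in SQLite through Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (needed for window functions) and stores ingestion_time as ISO-8601 text standing in for BigQuery's DATETIME; the final `order by element` is added only to make the output order deterministic.

```python
import sqlite3

# Sample data from the question; ISO-8601 date strings sort chronologically,
# standing in for BigQuery's DATETIME column (an assumption of this sketch).
ROWS = [
    ("Item1", 10.0, "2023-01-01"),
    ("Item1", 11.0, "2023-01-02"),
    ("Item2", None, "2023-01-02"),
    ("Item2", 20.0, "2023-01-03"),
    ("Item3", 21.0, "2023-01-03"),
    ("Item3", None, "2023-01-04"),
    ("Item4", None, "2023-01-04"),
    ("Item4", None, "2023-01-05"),
]

# The answer's query, plus an outer ORDER BY for a stable result order.
QUERY = """
select element, tmed, ingestion_time
from (
    select *,
           row_number() over (
               partition by element
               order by case when tmed is not null then 1 else 2 end,
                        ingestion_time desc
           ) as rn
    from table_name
) as T
where rn = 1
order by element
"""


def latest_non_null(rows):
    """Load rows into an in-memory SQLite table and run the ROW_NUMBER query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table table_name (element text, tmed real, ingestion_time text)"
        )
        conn.executemany("insert into table_name values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_non_null(ROWS):
        print(row)
```

The rows returned match the expected result in the question, including Item4, whose only values are null.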
Answer:
You can use the ROW_NUMBER window function inside a QUALIFY clause as follows:
- partition on Element
- order on tmed IS NULL (pushes the null values down), then on ingestion_time DESC (pulls the latest dates up)

```sql
SELECT *
FROM tab
WHERE TRUE  -- some BigQuery versions reject QUALIFY without a WHERE, GROUP BY, or HAVING clause
QUALIFY ROW_NUMBER() OVER (PARTITION BY Element ORDER BY tmed IS NULL, ingestion_time DESC) = 1
```
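SQLite has no QUALIFY, so instead of running this query verbatim, the sketch below emulates its ordering in plain Python to show why ORDER BY tmed IS NULL, ingestion_time DESC puts the right row first. It assumes, as in BigQuery, that FALSE sorts before TRUE; the function name row_number_is_one is purely illustrative.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, Optional[float], str]

# Sample data from the question.
ROWS: List[Row] = [
    ("Item1", 10.0, "2023-01-01"),
    ("Item1", 11.0, "2023-01-02"),
    ("Item2", None, "2023-01-02"),
    ("Item2", 20.0, "2023-01-03"),
    ("Item3", 21.0, "2023-01-03"),
    ("Item3", None, "2023-01-04"),
    ("Item4", None, "2023-01-04"),
    ("Item4", None, "2023-01-05"),
]


def sort_key(row: Row) -> Tuple[bool, int]:
    """Python equivalent of ORDER BY tmed IS NULL, ingestion_time DESC."""
    _, tmed, ingestion_time = row
    # False (non-null) sorts before True (null); negating the day ordinal
    # turns ascending order into "latest date first".
    return (tmed is None, -date.fromisoformat(ingestion_time).toordinal())


def row_number_is_one(rows: List[Row]) -> Dict[str, Row]:
    """Per element, keep the row that ROW_NUMBER() would number 1."""
    best: Dict[str, Row] = {}
    for row in rows:
        element = row[0]
        if element not in best or sort_key(row) < sort_key(best[element]):
            best[element] = row
    return best


if __name__ == "__main__":
    for element, row in sorted(row_number_is_one(ROWS).items()):
        print(element, row)
```

Rows with identical keys are resolved by input order here, whereas ROW_NUMBER picks among ties arbitrarily.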
Answer:
Both solutions above are simple and effective. However, to generalize to more fields than just tmed, I found the following solution:
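```sql
WITH overwritten_original_table AS (
  SELECT * EXCEPT(tmed),
    -- Latest non-null tmed per element; the explicit frame makes every row of the
    -- partition see the whole partition (with the default frame, a newest row whose
    -- tmed is null would only see itself and get null)
    FIRST_VALUE(tmed IGNORE NULLS) OVER (
      PARTITION BY element ORDER BY ingestion_time DESC
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS tmed
    -- Here, you can add more fields with the same FIRST_VALUE logic
  FROM original_table
)
SELECT
  element,
  ANY_VALUE(tmed) AS tmed,
  -- Here, you can add more fields with the ANY_VALUE logic
  MAX(ingestion_time) AS ingestion_time
FROM overwritten_original_table
GROUP BY element
```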
Since this solution is intended for more than one field, I simply took the maximum ingestion_time, but you can modify it to return an ingestion_time for every field.
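What the generalized query computes (per element, the latest non-null value of each field, plus the maximum ingestion_time) can be sketched in plain Python as follows. This only illustrates the semantics of FIRST_VALUE(... IGNORE NULLS) over the whole partition followed by GROUP BY; it is not a BigQuery API, and the function name latest_non_null_per_field is hypothetical.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]

# Sample data from the question, as dicts so that extra fields could be added.
ROWS: List[Row] = [
    dict(zip(("element", "tmed", "ingestion_time"), values))
    for values in [
        ("Item1", 10.0, "2023-01-01"),
        ("Item1", 11.0, "2023-01-02"),
        ("Item2", None, "2023-01-02"),
        ("Item2", 20.0, "2023-01-03"),
        ("Item3", 21.0, "2023-01-03"),
        ("Item3", None, "2023-01-04"),
        ("Item4", None, "2023-01-04"),
        ("Item4", None, "2023-01-05"),
    ]
]


def latest_non_null_per_field(rows: Sequence[Row], fields: Sequence[str]) -> Dict[str, Row]:
    """Per element: latest non-null value of each field, plus MAX(ingestion_time)."""
    # PARTITION BY element
    partitions: Dict[str, List[Row]] = {}
    for row in rows:
        partitions.setdefault(row["element"], []).append(row)

    result: Dict[str, Row] = {}
    for element, group in partitions.items():
        # ORDER BY ingestion_time DESC (ISO date strings sort chronologically)
        newest_first = sorted(group, key=lambda r: r["ingestion_time"], reverse=True)
        out: Row = {"element": element}
        for field in fields:
            # FIRST_VALUE(field IGNORE NULLS) over the whole partition:
            # the first non-null value in newest-first order, else None.
            out[field] = next(
                (r.get(field) for r in newest_first if r.get(field) is not None), None
            )
        # MAX(ingestion_time), as in the final SELECT
        out["ingestion_time"] = newest_first[0]["ingestion_time"]
        result[element] = out
    return result


if __name__ == "__main__":
    for element, out in sorted(latest_non_null_per_field(ROWS, ["tmed"]).items()):
        print(out)
```

On the question's data this reports Item3 with ingestion_time 2023-01-04 (its newest row overall) rather than 2023-01-03 (the row holding 21.0), which is the trade-off of taking MAX(ingestion_time) mentioned above.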