First non-null value (ordered) aggregate function

Time:01-13

Given the following table in Google BigQuery (GBQ):

Element, tmed, ingestion_time
Item1, 10.0, 2023-01-01
Item1, 11.0, 2023-01-02
Item2, null, 2023-01-02
Item2, 20.0, 2023-01-03
Item3, 21.0, 2023-01-03
Item3, null, 2023-01-04
Item4, null, 2023-01-04
Item4, null, 2023-01-05

For each Element, I would like to retrieve the latest non-null tmed value (the one with the latest ingestion_time), or the latest row if every tmed is null. That should give the following result:

Element, tmed, ingestion_time
Item1, 11.0, 2023-01-02
Item2, 20.0, 2023-01-03
Item3, 21.0, 2023-01-03
Item4, null, 2023-01-05

For this purpose, I was using the aggregate function ANY_VALUE, which, although the documentation does not make it very clear, ignores null values (check the discussion here). However, it just returns some non-null value, regardless of the DATETIME field ingestion_time. I tried different ORDER BY options, but without success.

Answer:

Try using the ROW_NUMBER function as follows:

select element, tmed, ingestion_time
from
(
  select *,
    -- non-null tmed first, then the most recent ingestion_time
    row_number() over (partition by element order by case when tmed is not null then 1 else 2 end, ingestion_time desc) rn
  from table_name
) T
where rn = 1
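To check this query against the sample data without a BigQuery project, here is a minimal sketch that runs the same ROW_NUMBER pattern in an in-memory SQLite database through Python's standard sqlite3 module. It assumes SQLite 3.25 or later (the first release with window functions); the table name sample is a hypothetical stand-in for table_name, and dates are stored as ISO-8601 text so that they sort chronologically.

```python
import sqlite3

# Sample data from the question; None becomes SQL NULL.
ROWS = [
    ("Item1", 10.0, "2023-01-01"),
    ("Item1", 11.0, "2023-01-02"),
    ("Item2", None, "2023-01-02"),
    ("Item2", 20.0, "2023-01-03"),
    ("Item3", 21.0, "2023-01-03"),
    ("Item3", None, "2023-01-04"),
    ("Item4", None, "2023-01-04"),
    ("Item4", None, "2023-01-05"),
]

# Same shape as the answer's query; "sample" is a hypothetical table name.
QUERY = """
SELECT element, tmed, ingestion_time
FROM (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY element
      -- non-null tmed sorts first, then the most recent ingestion_time
      ORDER BY CASE WHEN tmed IS NOT NULL THEN 1 ELSE 2 END, ingestion_time DESC
    ) AS rn
  FROM sample
) AS t
WHERE rn = 1
ORDER BY element
"""


def latest_non_null(rows):
    """Load rows into an in-memory SQLite table and run the ROW_NUMBER query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sample (element TEXT, tmed REAL, ingestion_time TEXT)"
        )
        conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_non_null(ROWS):
        print(row)
```

The outer ORDER BY element is only there to make the printed order stable; Item4, whose rows are all null, falls back to its latest ingestion_time because the CASE expression ties and the date ordering decides.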

Answer:

You can use the ROW_NUMBER window function inside a QUALIFY clause, as follows:

  • partition by Element
  • order by tmed IS NULL (pushes null values down), then ingestion_time DESC (pulls the latest dates up)
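
SELECT *
FROM tab
-- WHERE TRUE is harmless; some BigQuery versions reject QUALIFY without a WHERE, GROUP BY, or HAVING clause
WHERE TRUE
QUALIFY ROW_NUMBER() OVER(PARTITION BY Element ORDER BY tmed IS NULL, ingestion_time DESC) = 1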
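The ordering trick in this answer can be mirrored in plain Python to see why it works. This is an illustration, not something BigQuery runs: the sort key puts the boolean "tmed is null" first (False sorts before True, just as FALSE sorts before TRUE in ascending SQL order), then ingestion_time newest-first, and the first row per Element is kept, as ROW_NUMBER() ... = 1 does. The names window_key and qualify_first are hypothetical helpers, and dates are assumed to be ISO-8601 strings.

```python
from datetime import date
from itertools import groupby
from typing import List, Optional, Tuple

# (element, tmed, ingestion_time); sample data from the question.
Row = Tuple[str, Optional[float], str]

ROWS: List[Row] = [
    ("Item1", 10.0, "2023-01-01"),
    ("Item1", 11.0, "2023-01-02"),
    ("Item2", None, "2023-01-02"),
    ("Item2", 20.0, "2023-01-03"),
    ("Item3", 21.0, "2023-01-03"),
    ("Item3", None, "2023-01-04"),
    ("Item4", None, "2023-01-04"),
    ("Item4", None, "2023-01-05"),
]


def window_key(row: Row) -> Tuple[bool, int]:
    """Sort key mirroring ORDER BY tmed IS NULL, ingestion_time DESC."""
    _, tmed, ingestion_time = row
    # False < True, so rows with a non-null tmed sort first.
    # Negating the day ordinal turns ascending order into newest-first.
    return (tmed is None, -date.fromisoformat(ingestion_time).toordinal())


def qualify_first(rows: List[Row]) -> List[Row]:
    """Keep the row that ROW_NUMBER() ... = 1 keeps in each Element partition."""
    by_element = sorted(rows, key=lambda r: r[0])  # groupby needs sorted input
    return [
        min(partition, key=window_key)  # first row in window order
        for _, partition in groupby(by_element, key=lambda r: r[0])
    ]


if __name__ == "__main__":
    for row in qualify_first(ROWS):
        print(row)
```

Because the null flag is the leading sort component, an older non-null row always beats a newer null one; the date only breaks ties within the same null status.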

Answer:

Both solutions are simple and effective. However, to generalize this to more fields than just tmed, I found the following solution:

WITH overwritten_original_table AS (
  SELECT
    * EXCEPT(tmed),
    -- Latest non-null tmed per element; the explicit frame gives every row of the partition the same value
    FIRST_VALUE(tmed IGNORE NULLS) OVER (
      PARTITION BY element
      ORDER BY ingestion_time DESC
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS tmed
    -- Here, you can add more fields with the same FIRST_VALUE logic
  FROM original_table
)

SELECT
  element,
  ANY_VALUE(tmed) AS tmed,
  -- Here, you can add more fields with the ANY_VALUE logic
  MAX(ingestion_time) AS ingestion_time
FROM overwritten_original_table
GROUP BY element

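Since this solution is meant for more than one field, it simply takes the maximum ingestion_time per element, so for Item3 it returns 2023-01-04 (the date of its later null row) instead of 2023-01-03. You can modify it to get an ingestion_time for every field.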
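The per-field logic of this answer can be sketched in plain Python: each listed field independently takes its newest non-null value (the role of FIRST_VALUE(... IGNORE NULLS) ordered by ingestion_time DESC), while ingestion_time is just the per-element maximum, as MAX(ingestion_time) gives. This is an illustrative re-implementation, not how BigQuery evaluates the query; the function name collapse is hypothetical, and dates are assumed to be ISO-8601 strings so that they sort chronologically as text.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Sample data from the question.
ROWS: List[Row] = [
    {"element": "Item1", "tmed": 10.0, "ingestion_time": "2023-01-01"},
    {"element": "Item1", "tmed": 11.0, "ingestion_time": "2023-01-02"},
    {"element": "Item2", "tmed": None, "ingestion_time": "2023-01-02"},
    {"element": "Item2", "tmed": 20.0, "ingestion_time": "2023-01-03"},
    {"element": "Item3", "tmed": 21.0, "ingestion_time": "2023-01-03"},
    {"element": "Item3", "tmed": None, "ingestion_time": "2023-01-04"},
    {"element": "Item4", "tmed": None, "ingestion_time": "2023-01-04"},
    {"element": "Item4", "tmed": None, "ingestion_time": "2023-01-05"},
]


def collapse(rows: List[Row], fields: List[str]) -> Dict[str, Row]:
    """Per element: newest non-null value of each field, plus MAX(ingestion_time)."""
    # Newest first, like ORDER BY ingestion_time DESC (ISO dates sort as text).
    newest_first = sorted(rows, key=lambda r: r["ingestion_time"], reverse=True)
    out: Dict[str, Row] = {}
    for row in newest_first:
        if row["element"] not in out:
            # The first row seen per element is its newest, i.e. MAX(ingestion_time).
            out[row["element"]] = {
                "element": row["element"],
                "ingestion_time": row["ingestion_time"],
                **{f: None for f in fields},
            }
        group = out[row["element"]]
        for f in fields:
            # Like FIRST_VALUE(f IGNORE NULLS): keep the newest non-null value.
            if group[f] is None and row[f] is not None:
                group[f] = row[f]
    return out


if __name__ == "__main__":
    for element, collapsed in sorted(collapse(ROWS, ["tmed"]).items()):
        print(element, collapsed)
```

Passing more names in fields extends it the same way extra FIRST_VALUE columns extend the query.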
