Home > Enterprise >  How to use `GREATEST()` in Snowflake with null values?
How to use `GREATEST()` in Snowflake with null values?

Time:11-22

I'm trying to use GREATEST() in Snowflake, but whenever I have null values, I get null instead of the desired result:

select greatest(1,2,null);

-- null

This behavior has confused many, and it begins with the behavior of GREATEST() in Oracle, which Snowflake matches:

  • enter image description here

    Asking here to get the best available solution.

    CodePudding user response:

    One solution could be to create a UDF that picks the greatest() or the first non null:

    create or replace function greatest2(x1 float, x2 float)
    returns float
    as $$
        coalesce(greatest(x1, x2), x1, x2)
    $$;
    
    select greatest2(a, b)
    from some_nulls;
    

    enter image description here

    However things get more complex if you need to compare multiple values. For example, if you want to compare 3 columns, then you have to create a custom UDF with 3 arguments and check each for null:

    create or replace function greatest3(x1 float, x2 float, x3 float)
    returns float
    as $$
        select iff(x='-inf', null, x)
        from (
            select greatest(nvl(x1, '-inf'), nvl(x2, '-inf'), nvl(x3, '-inf')) x
        )
    $$;
    
    select greatest3(a, b, c)
    from some_nulls;
    

    enter image description here

    CodePudding user response:

    Documenting here an approach that doesn't work (to save others time, or an opportunity to fix): Arrays in a SQL UDF.

    create or replace function greatest_a(arr array)
    returns float
    immutable
    as $$
        select max(value::float)
        from table(flatten(arr))
    $$;
    
    select greatest_a([null,2,3.3])
    from some_nulls;
    

    This works until you try to create the array with values from the table.

    select greatest_a([a, b, c])
    from some_nulls;
    
    -- Unsupported subquery type cannot be evaluated
    

    A similar approach with a JS UDF would work, but it will be slower than a pure SQL UDF.

    CodePudding user response:

    Welcome user: Felipe

    The doc's seem to describe the results exactly:

    Returns the largest value from a list of expressions. If any of the argument values is NULL, the result is NULL. GREATEST supports all data types, including VARIANT.

    If you ORDER BY data, NULL is last, GREATEST is returning the LAST value..

    Snowflake is rather consistent

    select 
        a,b,c
        ,greatest(a, b) as g_a_b
        ,greatest(a, c) as g_a_c
        ,greatest(b, c) as g_b_c
    from values
        (1.1, 2.3, null::float), 
        (null, 2, 3.5), 
        (1, null, 3), 
        (null, null, null)
        t(a,b,c)
    

    gives:

    A B C G_A_B G_A_C G_B_C
    1.1 2.3 null 2.3 null null
    null 2 3.5 null null 3.5
    1 null 3 null 3 null
    null null null null null null

    so your min value -inf solution is interesting/gross, I mean it's correct.

    But what GREATEST is doing is handling a fixed number of SQL declared columns, so the NVL soltuion works: But to make a generic solution, building the fix arrays, and then flattening, and the maxing as max handles NULL in the implicit way the problem implies which "them not being selected"

    select a,b,c, max(f.value)
    from (
        select 
            a,b,c
            ,array_construct_compact(a, b, c) as aa
        from values
            (1.1, 2.3, null), 
            (null, 2, 3.5), 
            (1, null, 3), 
            (null, null, null)
            t(a,b,c)
    ), table(flatten(input=>aa)) as f
    group by 1,2,3,f.seq
    

    gives:

    A B C MAX(F.VALUE)
    1 null 3 3
    null 2 3.5 3.5
    1.1 2.3 null 2.3

    CodePudding user response:

    It is possible to handle nullable columns with ARRAYs:

    SELECT a,b,c, GREATEST([a],[b],[c])[0]::INT
    FROM some_nulls;
    

    For sample data:

    CREATE OR REPLACE TABLE some_nulls(a INT, b INT, c INT)
    AS
    SELECT 1, 2, NULL UNION
    SELECT NULL, 2, 3 UNION
    SELECT 1, NULL, 3 UNION
    SELECT NULL, NULL, NULL;
    

    Output:

    enter image description here


    How it works:

    NULL becomes [undefined] which is the lowest element, therefore this approach could be used for GREATEST but NOT for LEAST function.

  • Related