Home > Software design >  Return most recent non-null value prior to date minus 1 year -- Snowflake
Return most recent non-null value prior to date minus 1 year -- Snowflake

Time:11-14

I have a table with ID, Date, Value. For each row, I want to return the most recent non-null value where Date <= Date minus 1 year.

I'm trying variations of window functions with last_value or lag, and partitioning on ID, order by Date, with the IGNORE NULLS clause. However, I'm not sure where to include the condition where Date <= Date minus 1 year

Sample Data:

ID Date Value
1 01-Jan-19 NULL
1 02-Jan-19 NULL
1 03-Jan-19 NULL
1 04-Jan-19 NULL
1 05-Jan-19 NULL
1 01-Jan-20 NULL
1 02-Jan-20 20
1 03-Jan-20 NULL
1 04-Jan-20 NULL
1 05-Jan-20 30
1 01-Jan-21 20
1 02-Jan-21 NULL
1 03-Jan-21 NULL
1 04-Jan-21 NULL
1 05-Jan-21 30
2 01-Jan-19 NULL
2 02-Jan-19 NULL
2 03-Jan-19 NULL
2 04-Jan-19 30
2 05-Jan-19 30
2 01-Jan-20 25
2 02-Jan-20 20
2 03-Jan-20 NULL
2 04-Jan-20 NULL
2 05-Jan-20 30
2 01-Jan-21 20
2 02-Jan-21 15
2 03-Jan-21 15
2 04-Jan-21 15
2 05-Jan-21 30

Desired Results:

ID Date Value Most_Recent_Prior_Year_Date Most_Recent_Prior_Year_Value
1 01-Jan-19 NULL NULL NULL
1 02-Jan-19 NULL NULL NULL
1 03-Jan-19 NULL NULL NULL
1 04-Jan-19 NULL NULL NULL
1 05-Jan-19 NULL NULL NULL
1 01-Jan-20 NULL NULL NULL
1 02-Jan-20 20 NULL NULL
1 03-Jan-20 NULL NULL NULL
1 04-Jan-20 NULL NULL NULL
1 05-Jan-20 30 NULL NULL
1 01-Jan-21 20 NULL NULL
1 02-Jan-21 NULL 02-Jan-20 20
1 03-Jan-21 NULL 02-Jan-20 20
1 04-Jan-21 NULL 02-Jan-20 20
1 05-Jan-21 30 05-Jan-20 30
2 01-Jan-19 NULL NULL NULL
2 02-Jan-19 NULL NULL NULL
2 03-Jan-19 NULL NULL NULL
2 04-Jan-19 30 NULL NULL
2 05-Jan-19 30 NULL NULL
2 01-Jan-20 25 NULL NULL
2 02-Jan-20 20 NULL NULL
2 03-Jan-20 NULL NULL NULL
2 04-Jan-20 NULL 04-Jan-19 30
2 05-Jan-20 30 05-Jan-19 30
2 01-Jan-21 20 01-Jan-20 25
2 02-Jan-21 15 02-Jan-20 20
2 03-Jan-21 15 02-Jan-20 20
2 04-Jan-21 15 02-Jan-20 20
2 05-Jan-21 30 05-Jan-20 30

CodePudding user response:

Consider group by and two joins instead of window functions, e.g.

select Id, Date, Value
from DataTable d1
inner join (
    select Id, Date as Most_Recent_Prior_Year_Date, max(Value) as Most_Recent_Prior_Year_Value 
    -- max(Value) because there could be many values for that day
    from DataTable d3
    inner join (
        select Id, max(Date) as MostRecentDate
        from DataTable
        where Date <= DATEADD(year, -1, CURRENT_DATE())
        group by Id
    ) d4 on d3.Id = d4.Id AND d3.Date = d4.MostRecentDate
    group by Id, Date
) d2 on d1.Id = d2.Id

CodePudding user response:

IF you strickly have only one row per day per id then you can use:

A single LEFT JOIN and WHERE/QUALIFY will work here:

SELECT 
    a.ID,
    a.Date,
    a.Value
    b.Data AS Most_Recent_Prior_Year_Date
    b.Value AS Most_Recent_Prior_Year_Value
FROM table AS a
LEFT JOIN table AS b 
    on a.id = b.id AND DATEADD('year',-1,a.date) >= b.date
QUALIFY ROW_NUMBER() OVER( PARTITION BY a.id, a.date ORDER BY b.date DESC NULLS LAST) = 1

Otherwise you will want to first separate out a single value per day aka MAX, MIN, FIRST_VALE and then run the above over that output.

  • Related