I have a table with ID, Date, Value. For each row, I want to return the most recent non-null value where Date <= Date minus 1 year.
I'm trying variations of window functions with last_value or lag, and partitioning on ID, order by Date, with the IGNORE NULLS clause. However, I'm not sure where to include the condition where Date <= Date minus 1 year
Sample Data:
ID | Date | Value |
---|---|---|
1 | 01-Jan-19 | NULL |
1 | 02-Jan-19 | NULL |
1 | 03-Jan-19 | NULL |
1 | 04-Jan-19 | NULL |
1 | 05-Jan-19 | NULL |
1 | 01-Jan-20 | NULL |
1 | 02-Jan-20 | 20 |
1 | 03-Jan-20 | NULL |
1 | 04-Jan-20 | NULL |
1 | 05-Jan-20 | 30 |
1 | 01-Jan-21 | 20 |
1 | 02-Jan-21 | NULL |
1 | 03-Jan-21 | NULL |
1 | 04-Jan-21 | NULL |
1 | 05-Jan-21 | 30 |
2 | 01-Jan-19 | NULL |
2 | 02-Jan-19 | NULL |
2 | 03-Jan-19 | NULL |
2 | 04-Jan-19 | 30 |
2 | 05-Jan-19 | 30 |
2 | 01-Jan-20 | 25 |
2 | 02-Jan-20 | 20 |
2 | 03-Jan-20 | NULL |
2 | 04-Jan-20 | NULL |
2 | 05-Jan-20 | 30 |
2 | 01-Jan-21 | 20 |
2 | 02-Jan-21 | 15 |
2 | 03-Jan-21 | 15 |
2 | 04-Jan-21 | 15 |
2 | 05-Jan-21 | 30 |
Desired Results:
ID | Date | Value | Most_Recent_Prior_Year_Date | Most_Recent_Prior_Year_Value |
---|---|---|---|---|
1 | 01-Jan-19 | NULL | NULL | NULL |
1 | 02-Jan-19 | NULL | NULL | NULL |
1 | 03-Jan-19 | NULL | NULL | NULL |
1 | 04-Jan-19 | NULL | NULL | NULL |
1 | 05-Jan-19 | NULL | NULL | NULL |
1 | 01-Jan-20 | NULL | NULL | NULL |
1 | 02-Jan-20 | 20 | NULL | NULL |
1 | 03-Jan-20 | NULL | NULL | NULL |
1 | 04-Jan-20 | NULL | NULL | NULL |
1 | 05-Jan-20 | 30 | NULL | NULL |
1 | 01-Jan-21 | 20 | NULL | NULL |
1 | 02-Jan-21 | NULL | 02-Jan-20 | 20 |
1 | 03-Jan-21 | NULL | 02-Jan-20 | 20 |
1 | 04-Jan-21 | NULL | 02-Jan-20 | 20 |
1 | 05-Jan-21 | 30 | 05-Jan-20 | 30 |
2 | 01-Jan-19 | NULL | NULL | NULL |
2 | 02-Jan-19 | NULL | NULL | NULL |
2 | 03-Jan-19 | NULL | NULL | NULL |
2 | 04-Jan-19 | 30 | NULL | NULL |
2 | 05-Jan-19 | 30 | NULL | NULL |
2 | 01-Jan-20 | 25 | NULL | NULL |
2 | 02-Jan-20 | 20 | NULL | NULL |
2 | 03-Jan-20 | NULL | NULL | NULL |
2 | 04-Jan-20 | NULL | 04-Jan-19 | 30 |
2 | 05-Jan-20 | 30 | 05-Jan-19 | 30 |
2 | 01-Jan-21 | 20 | 01-Jan-20 | 25 |
2 | 02-Jan-21 | 15 | 02-Jan-20 | 20 |
2 | 03-Jan-21 | 15 | 02-Jan-20 | 20 |
2 | 04-Jan-21 | 15 | 02-Jan-20 | 20 |
2 | 05-Jan-21 | 30 | 05-Jan-20 | 30 |
CodePudding user response:
Consider group by
and two join
s instead of window functions, e.g.
select Id, Date, Value
from DataTable d1
inner join (
select Id, Date as Most_Recent_Prior_Year_Date, max(Value) as Most_Recent_Prior_Year_Value
-- max(Value) because there could be many values for that day
from DataTable d3
inner join (
select Id, max(Date) as MostRecentDate
from DataTable
where Date <= DATEADD(year, -1, CURRENT_DATE())
group by Id
) d4 on d3.Id = d4.Id AND d3.Date = d4.MostRecentDate
group by Id, Date
) d2 on d1.Id = d2.Id
CodePudding user response:
IF you strickly have only one row per day per id then you can use:
A single LEFT JOIN and WHERE/QUALIFY will work here:
SELECT
a.ID,
a.Date,
a.Value
b.Data AS Most_Recent_Prior_Year_Date
b.Value AS Most_Recent_Prior_Year_Value
FROM table AS a
LEFT JOIN table AS b
on a.id = b.id AND DATEADD('year',-1,a.date) >= b.date
QUALIFY ROW_NUMBER() OVER( PARTITION BY a.id, a.date ORDER BY b.date DESC NULLS LAST) = 1
Otherwise you will want to first separate out a single value per day aka MAX, MIN, FIRST_VALE and then run the above over that output.