return last non-null value in left join-CodePudding

I have a query that goes like this

select
  a.date_field,
  b.date_field,
  b.interested_data
from tbl_dates a
  left join tbl_data b
     on a.date_field = b.date_field

this gives me a resultset like:

a_date_field | b_date_field | b_interested_data
2022-01-01   | 2022-01-01   | data_1
2022-01-02   | 2022-01-02   | data_2
2022-01-03   | null         | null

is it possible to return the last non-null value for b_date_field and b_interested_data in row 3? Ideally the result should be

a_date_field | b_date_field | b_interested_data
2022-01-01   | 2022-01-01   | data_1
2022-01-02   | 2022-01-02   | data_2
2022-01-03   | 2022-01-02   | data_2

I know that b_date_field would seem like an error but I am only interested in b_interested_data.

CodePudding user response：

perhaps you can do it using a APPLY operator. Get the "last" row from table b based on date_field

select
  a.date_field,
  b.date_field,
  b.interested_data
from tbl_dates a
  cross apply
  (
      select TOP (1) b.date_field, b.interested_data
      from  tbl_data b
      where a.date_field >= b.date_field
      order by b.date_field desc
  ) b

CodePudding user response：

Using lag() and coalese() we can do it: assuming a_date_Field is the order we can use to determine the "Prior" value.

WITH CTE AS (SELECT '2022-01-01' a_date_field, '2022-01-01' b_date_field, 'data_1' b_interested_data UNION ALL
SELECT '2022-01-02',   '2022-01-02',   'data_2' UNION ALL
SELECT '2022-01-03',    null,          null)
SELECT a_date_Field, 
       coalesce(B_Date_Field,lag(B_date_Field) over (order by a_date_Field)), 
       coalesce(B_Interested_Data,lag(B_Interested_Data) over (order by a_date_Field)) FROM CTE

Giving us:

 -------------- ------------------ ------------------ 
| a_date_Field | (No column name) | (No column name) |
 -------------- ------------------ ------------------ 
| 2022-01-01   | 2022-01-01       | data_1           |
| 2022-01-02   | 2022-01-02       | data_2           |
| 2022-01-03   | 2022-01-02       | data_2           |
 -------------- ------------------ ------------------

CodePudding user response：

Basically I think there are two approaches. One is to use apply, one is to use last_value. I have prepared a small insert with two selects that return the same table, so pick the one you like most (EDIT: Pick the one that fits your solution and has the best performance in your case).

select
    *
into #dates
from (values (1,cast('20220101' as date)),(2,cast('20220102' as date)),(3,cast('20220103' as date))) a(date_id,[date])
;

select
    *
into #data
from (values ('data_1',cast('20220101' as date)),('data_2',cast('20220102' as date))) b(interested_data,[date])
;

--Solution 1
select distinct
      last_value(a.date) over (partition by a.date order by b.date rows between current row and unbounded following)
    , last_value(b.date) over (partition by a.date order by b.date rows between current row and unbounded following)
    , last_value(b.interested_data) over (partition by a.date order by b.date rows between current row and unbounded following)
from #dates a
left join #data b on
    b.date <= a.date
;

--Solution 2
select
    a.date
    , sub.date as b_date
    , sub.interested_data
from #dates a
outer apply (
    select top 1
          b.date as date
        , b.interested_data as interested_data
    from #data b
    where b.date <= a.date
    order by b.date desc
) as sub