select *
from example;
edate userid status
2022-05-01 abc123 true
2022-05-02 abc123 (null)
2022-05-03 abc123 (null)
2022-05-04 abc123 (null)
2022-05-05 abc123 false
2022-05-06 abc123 (null)
2022-05-07 abc123 (null)
2022-05-08 abc123 (null)
2022-05-09 abc123 true
2022-05-10 abc123 (null)
I want to write a new field, 'status_backfilled' based on the most recent data point for a userId.
In the example data, The users status is true on May 1st, then null untill May 5th. So, I would like the new field to be true between May 1st till May 4th. Then the status switches to false. This value is unchanged till May 9th, so I want false between May 5th till 8th, then true again.
Desired output:
select *
from example_desired;
edate userid status_backfilled
2022-05-01 abc123 true
2022-05-02 abc123 true
2022-05-03 abc123 true
2022-05-04 abc123 true
2022-05-05 abc123 false
2022-05-06 abc123 false
2022-05-07 abc123 false
2022-05-08 abc123 false
2022-05-09 abc123 true
2022-05-10 abc123 true
How can I columnwise coalesce to get the most recent non null status for a user where data are sorted, in this case by date?
CodePudding user response:
actually, even better :
select e1.edate, e1.userId, coalesce(e1.status, t.status) as status
from example e1
cross join lateral (
select status from example e2
where e1.userid = e2.userid
and e1.edate > e2.edate
and e2.status is not null
order by e2.edate desc limit 1
) t
here is another way :
with cte as (
select e.* ,e_s.edate s_edate, e_s.status s_status , row_number() over (partition by e.userid,e.edate order by e_s.edate desc) rn
from example e
left join (
select *
from example
where status is not null
) e_s on e.userid = e_s.userid
and e_s.edate < e.edate
)
select edate, userId, coalesce(status, s_status) as status
from cte where rn = 1
CodePudding user response:
You can achieve your desired result by using Few window functins -
WITH grp AS (SELECT edate, userid, status,
CASE WHEN status IS NULL THEN 0
ELSE ROW_NUMBER() OVER(ORDER BY edate)
END RN
FROM example
),
grp_sum AS (SELECT edate, userid, status, SUM(RN) OVER(ORDER BY edate) grp_sum
FROM grp
)
SELECT edate, userid,
FIRST_VALUE(status) OVER(PARTITION BY grp_sum ORDER BY status NULLS LAST) status_backfilled
FROM grp_sum;