Imagine the data:
id audit_id val updated_at
1 11 43 October 09, 2021, 07:55 AM
1 12 34 October 11, 2021, 11:03 PM
1 13 88 January 23, 2022, 01:03 AM
1 14 34 January 23, 2022, 09:41 AM
2 21 200 June 28, 2021, 08:07 PM
2 22 200 December 23, 2021, 03:20 PM
2 23 205 January 12, 2022, 10:15 AM
2 24 211 May 13, 2022, 04:02 AM
Per id, I would like to return the two latest entries of different dates (not just timestamp, the date part should be different):
id audit_id val updated_at
1 12 34 October 11, 2021, 11:03 PM
1 14 34 January 23, 2022, 09:41 AM
2 23 205 January 12, 2022, 10:15 AM
2 24 211 May 13, 2022, 04:02 AM
I assume I need to use partition and lag on the audit_id but I don't know how to start structuring it.
CodePudding user response:
I would attack this in two parts. The first would make sure only the latest from a single date is kept. The second numbers the rows starting with the latest.
with by_day as (
select *,
updated_at::date !=
lag(updated_at::date) over (partition by id
order by updated_at desc) keep
from imagined_data
), numbered as (
select *, row_number() over (partition by id
order by updated_at desc) as rn
from by_day
where coalesce(keep, true)
)
select id, audit_id, val, updated_at
from numbered
where rn <= 2;
db<>fiddle here
CodePudding user response:
Number rows in each group of same id such that the last two timestamps get number 1 and 2, then select only them.
select x.id, x.audit_id, x.val, x.updated_at
from (
select t.id, t.audit_id, t.val, t.updated_at
, row_number() over (partition by t.id order by t.updated_at desc) as rn
from your_table t
) x
where x.rn <= 2
order by x.id asc, x.updated_at asc
Caution: I composed it in my head, didn't try it. Also if your updated_at
column is actual text, it needs to be converted - I assume it is of timestamp type and the verbose format in your question is just (not very practical) presentation.