PostgreSQL: Select latest entries only when values differ in a column

Imagine the data:

id   audit_id   val     updated_at
1    11         43      October 09, 2021, 07:55 AM
1    12         34      October 11, 2021, 11:03 PM
1    13         88      January 23, 2022, 01:03 AM
1    14         34      January 23, 2022, 09:41 AM
2    21         200     June 28, 2021, 08:07 PM
2    22         200     December 23, 2021, 03:20 PM
2    23         205     January 12, 2022, 10:15 AM
2    24         211     May 13, 2022, 04:02 AM

For each id, I would like to return the two latest entries that fall on different dates (the date part must differ, not just the timestamp):

id   audit_id   val     updated_at
1    12         34      October 11, 2021, 11:03 PM
1    14         34      January 23, 2022, 09:41 AM
2    23         205     January 12, 2022, 10:15 AM
2    24         211     May 13, 2022, 04:02 AM

I assume I need to use partition and lag on the audit_id but I don't know how to start structuring it.

CodePudding user response：

I would attack this in two parts. The first keeps only the latest row for each date within an id. The second numbers the remaining rows per id, starting with the latest, so that the top two can be selected.

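with by_day as (
  -- keep is false when the next-later row for this id falls on the same date,
  -- and null for the latest row of each id (no preceding row in desc order)
  select *,
         updated_at::date !=
           lag(updated_at::date) over (partition by id
                                           order by updated_at desc) as keep
    from imagined_data
), numbered as (
  -- number the surviving rows per id, latest first
  select *, row_number() over (partition by id
                                   order by updated_at desc) as rn
    from by_day
   where coalesce(keep, true)  -- treat the null on the latest row as true
)
select id, audit_id, val, updated_at
  from numbered
 where rn <= 2
 order by id, updated_at;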
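
To try the query locally, the sample rows from the question can be loaded with something like the sketch below. The column types are an assumption (integers and a plain timestamp), since the question does not state them, and the table name imagined_data is simply the one used in the query.

```sql
-- Hypothetical setup: column types are assumed, not taken from the question.
create table imagined_data (
  id         integer,
  audit_id   integer,
  val        integer,
  updated_at timestamp
);

-- The eight rows from the question, with timestamps written in ISO format.
insert into imagined_data (id, audit_id, val, updated_at) values
  (1, 11,  43, '2021-10-09 07:55'),
  (1, 12,  34, '2021-10-11 23:03'),
  (1, 13,  88, '2022-01-23 01:03'),
  (1, 14,  34, '2022-01-23 09:41'),
  (2, 21, 200, '2021-06-28 20:07'),
  (2, 22, 200, '2021-12-23 15:20'),
  (2, 23, 205, '2022-01-12 10:15'),
  (2, 24, 211, '2022-05-13 04:02');
```

With this data, the first step drops audit_id 13 for id 1 because audit_id 14 is a later row on the same date. That leaves 14, 12 and 11, and the two latest of those are returned.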


CodePudding user response：

Number the rows within each id so that the two latest timestamps get numbers 1 and 2, then select only those rows.

select x.id, x.audit_id, x.val, x.updated_at
from (
  select t.id, t.audit_id, t.val, t.updated_at
       , row_number() over (partition by t.id order by t.updated_at desc) as rn
  from your_table t
) x
where x.rn <= 2
order by x.id asc, x.updated_at asc

Caution: I composed this in my head and did not test it. Also, if your updated_at column is actually text, it needs to be converted; I assume it is of timestamp type and that the verbose format in your question is just (not very practical) presentation.
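
Note that ranking by timestamp alone returns two rows from the same date when they happen to be the two latest, as with audit_id 13 and 14 in the question's data. One possible date-aware variant is sketched below. It assumes the same your_table placeholder and a timestamp column, and the names day_rank and rn_in_day are made up for illustration.

```sql
select x.id, x.audit_id, x.val, x.updated_at
from (
  select t.id, t.audit_id, t.val, t.updated_at
         -- rank the distinct dates per id, latest date first
       , dense_rank() over (partition by t.id
                            order by t.updated_at::date desc) as day_rank
         -- within each date, number the rows latest first
       , row_number() over (partition by t.id, t.updated_at::date
                            order by t.updated_at desc) as rn_in_day
  from your_table t
) x
where x.day_rank <= 2   -- only the two latest dates per id
  and x.rn_in_day = 1   -- only the latest row of each date
order by x.id asc, x.updated_at asc;
```

For id 1 this should keep audit_id 12 and 14, and for id 2 audit_id 23 and 24, matching the desired output in the question.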