Home > Blockchain >  How to get a set of records from within each partition based on a condition
How to get a set of records from within each partition based on a condition

Time:07-08

From a table like this:

id status date category
1 PENDING 2022-07-01 XYZ
2 DONE 2022-07-04 XYZ
3 PENDING 2022-07-03 DEF
4 DONE 2022-07-08 DEF

I would like to get the most recent records within each category (here 2 and 4). But there are at least two factors that complicate things.

First, there might be more than two records in the same category. (The records come in pairs.)

id status date category
1 PENDING 2022-07-01 XYZ
2 PENDING 2022-07-02 XYZ
3 FAILED 2022-07-04 XYZ
4 FAILED 2022-07-05 XYZ
5 PENDING 2022-07-03 DEF
6 DONE 2022-07-08 DEF

In this case, I'd have to get 3, 4, and 6. Were there six records in the XYZ category, I'd have to get the most recent three.

And, secondly, the date could be the same for the most recent records within a category.

I tried something like this:

WITH temp AS (
    SELECT *,
           dense_rank() OVER (PARTITION BY category ORDER BY date DESC) rnk
    FROM tbl
)
SELECT *
FROM temp
WHERE rnk = 1;

But this fails when there are more than 2 records in a category and I need to get the most recent two.

EDIT:

Eli Johnson has pointed out in a comment that there should be information about which messages are pairs. Of course! I digged around a bit, and after a join or two there is.

id status date category prev_id
1 PENDING 2022-07-01 XYZ {}
2 PENDING 2022-07-02 XYZ {}
3 FAILED 2022-07-04 XYZ {1}
4 FAILED 2022-07-05 XYZ {2}
5 PENDING 2022-07-03 DEF {}
6 DONE 2022-07-08 DEF {5}

CodePudding user response:

The requirements are more hard-coded here then following proper design. Based on what has been proposed in the question, I just tweaked it a little bit to get last records.

Assuming that records are always in pair, as mentioned in the question.

WITH temp AS (
    SELECT *,
           row_number() OVER (PARTITION BY category ORDER BY date1 DESC) rnk,
           count(1) over (partition by category) cnt
    FROM status
)
SELECT *
FROM temp
WHERE rnk*2 <= cnt;

Refer fiddle here.

  • Related