I have a query that return values based on a boolean column: if the id_crsp includes a boolean true AND false, then it is selected. Values of id_crsp that have only a true or false value are not selected.
From this result, I would like to sort the id_crsp which have duplicates, and select only the one with the oldest date
Database values :
id | idcrsp | date | boolean |
---|---|---|---|
1 | 100 | 11-2022 | true |
2 | 100 | 07-2022 | false |
3 | 200 | 06-2022 | false |
4 | 300 | 09-2022 | true |
5 | 300 | 08-2022 | false |
6 | 400 | 10-2022 | false |
7 | 100 | 01-2022 | false |
8 | 100 | 02-2022 | false |
My actual request :
SELECT true_table.* FROM mydb as true_table
INNER JOIN
(SELECT * FROM mydb WHERE requalif=TRUE) as false_table
ON true_table.idcrsp = false_table.idcrsp
AND true_table.requalif = FALSE;
This return :
id | idcrsp | date | boolean |
---|---|---|---|
8 | 100 | 02-2022 | false |
7 | 100 | 01-2022 | false |
2 | 100 | 07-2022 | false |
5 | 300 | 08-2022 | false |
I would like to enrich my request in order to have only two lines:
- id 5
- id 7 (which has duplicates of id_crsp and which has the oldest date).
Thanks for your help !
CodePudding user response:
using your original query could you just add row_number and partition it by the idcrsp and then order by date descending. then just pick the rows where rn = 1
select * from
(SELECT true_table.*, row_number() over (partition by id_crsp order by date desc) rn FROM mydb as true_table
INNER JOIN
(SELECT * FROM mydb WHERE requalif=TRUE) as false_table
ON true_table.idcrsp = false_table.idcrsp
AND true_table.requalif = FALSE)
where rn = 1