I have a large PostgreSQL DB table. From this table I need to take rows grouped by Car_id
and position
columns.
The problem is that I have a lot of duplicates and need to take one row with the best position
.
I wrote a sql example that gave me the correct results, but it needs to be modified. Or how can I do it in a cleaner way?
And I need to choose a unique car_id, with a minimum position, last by date of scrape, of all passed license plate numbers, I am not interested in what particular license plate number will be.
Example of SQL:
select
"eventDate",
"Car_id",
min("position") as "carPosition",
groupArray(concat(toString("scrapedAt"), '_', toString("position"))) as "scrapedAtByPosition",
groupArray(concat("licensePlate", '_', toString("position"))) as "licensePlateByPosition",
groupArray(concat(toString("amazonChoice"), '_', toString("position"))) as "amazonChoicesByPosition",
'organic' as "matchType"
from "Car1_ScrapeHistoryLicensePlate"
inner join (
select "Car_id", max("scrapedAt") as "scrapedAt"
from "Car1_ScrapeHistoryLicensePlate"
where "licensePlate" IN ('ALPR912', 'JGPD831') and "eventDate" between '2022-08-12' and '2022-09-12'
group by "Car_id", "eventDate"
) as t1 USING ("Car_id", "scrapedAt")
where "licensePlate" IN ('ALPR912', 'JGPD831') and "eventDate" between '2022-08-12' and '2022-09-12'
group by "eventDate", "Car_id"
order by "eventDate" desc;
Database records:
eventDate Car_id licensePlate position scrapedAt
---------- ------ ------------ ------- ---------
2022-09-10, 1, APRJSC512, 1, 1660000001
2022-09-10, 1, APRJSC512, 1, 1660000002
2022-09-10, 1, PLBQWN035, 1, 1660000003
2022-09-10, 1, PLBQWN035, 1, 1660000004
2022-09-10, 1, PLBQWN035, 2, 1660000002
2022-09-11, 2, APRJSC512, 1, 1660000011
2022-09-11, 2, APRJSC512, 2, 1660000022
2022-09-11, 2, PLBQWN035, 1, 1660000033
2022-09-11, 2, PLBQWN035, 2, 1660000044
2022-09-11, 2, PLBQWN035, 5, 1660000022
2022-09-12, 3, APRJSC512, 3, 1660000111
2022-09-12, 3, PLBQWN035, 3, 1660000222
2022-09-13, 4, PLBQWN035, 4, 1660001111
2022-09-14, 5, PLBQWN035, 5, 1660011111
Expected result:
eventDate Car_id licensePlate position scrapedAt
---------- ------ ------------ ------- ---------
2022-09-10, 1, PLBQWN035, 1, 1660000004
2022-09-11, 2, PLBQWN035, 1, 1660000033
2022-09-12, 3, PLBQWN035, 3, 1660000222
CodePudding user response:
In PostgreSQL you can use brilliant distinct on.
The order by
list of expressions expressions determine which record to be picked for each car_id
. For each group with the same car_id
the first one is picked.
select distinct on (car_id) * -- or the relevant expression list here
from the_table
order by car_id, position, scrapedat desc;
CodePudding user response:
select eventDate
,Car_id
,licensePlate
,position
,scrapedAt
from
(
select *
,row_number() over(partition by car_id order by position, scrapedat desc) as rn
from t
) t
where rn = 1
eventdate | car_id | licenseplate | position | scrapedat |
---|---|---|---|---|
2022-09-10 | 1 | PLBQWN035 | 1 | 1660000004 |
2022-09-11 | 2 | PLBQWN035 | 1 | 1660000033 |
2022-09-12 | 3 | PLBQWN035 | 3 | 1660000222 |
2022-09-13 | 4 | PLBQWN035 | 4 | 1660001111 |
2022-09-14 | 5 | PLBQWN035 | 5 | 1660011111 |