Home > Back-end >  How to get unique results by best position
How to get unique results by best position

Time:10-27

I have a large PostgreSQL DB table. From this table I need to take rows grouped by Car_id and position columns. The problem is that I have a lot of duplicates and need to take one row with the best position. I wrote a sql example that gave me the correct results, but it needs to be modified. Or how can I do it in a cleaner way?

And I need to choose a unique car_id, with a minimum position, last by date of scrape, of all passed license plate numbers, I am not interested in what particular license plate number will be.

Example of SQL:

select
    "eventDate",
    "Car_id",
    min("position") as "carPosition",
    groupArray(concat(toString("scrapedAt"), '_', toString("position"))) as "scrapedAtByPosition",
    groupArray(concat("licensePlate", '_', toString("position"))) as "licensePlateByPosition",
    groupArray(concat(toString("amazonChoice"), '_', toString("position"))) as "amazonChoicesByPosition",
    'organic' as "matchType"
from "Car1_ScrapeHistoryLicensePlate"
         inner join (
             select "Car_id", max("scrapedAt") as "scrapedAt"
             from "Car1_ScrapeHistoryLicensePlate"
             where "licensePlate" IN ('ALPR912', 'JGPD831') and "eventDate" between '2022-08-12' and '2022-09-12'
             group by "Car_id", "eventDate"
         ) as t1 USING ("Car_id", "scrapedAt")
where "licensePlate" IN ('ALPR912', 'JGPD831') and "eventDate" between '2022-08-12' and '2022-09-12'
group by "eventDate", "Car_id"
order by "eventDate" desc;

Database records:

eventDate  Car_id  licensePlate position scrapedAt
---------- ------  ------------ -------  --------- 
2022-09-10,   1,   APRJSC512,    1,     1660000001
2022-09-10,   1,   APRJSC512,    1,     1660000002
2022-09-10,   1,   PLBQWN035,    1,     1660000003
2022-09-10,   1,   PLBQWN035,    1,     1660000004
2022-09-10,   1,   PLBQWN035,    2,     1660000002
2022-09-11,   2,   APRJSC512,    1,     1660000011
2022-09-11,   2,   APRJSC512,    2,     1660000022
2022-09-11,   2,   PLBQWN035,    1,     1660000033
2022-09-11,   2,   PLBQWN035,    2,     1660000044
2022-09-11,   2,   PLBQWN035,    5,     1660000022
2022-09-12,   3,   APRJSC512,    3,     1660000111
2022-09-12,   3,   PLBQWN035,    3,     1660000222
2022-09-13,   4,   PLBQWN035,    4,     1660001111
2022-09-14,   5,   PLBQWN035,    5,     1660011111

Expected result:

eventDate  Car_id  licensePlate position scrapedAt
---------- ------  ------------ -------  ---------
2022-09-10,   1,   PLBQWN035,    1,     1660000004
2022-09-11,   2,   PLBQWN035,    1,     1660000033
2022-09-12,   3,   PLBQWN035,    3,     1660000222

CodePudding user response:

In PostgreSQL you can use brilliant distinct on.
The order by list of expressions expressions determine which record to be picked for each car_id. For each group with the same car_id the first one is picked.

select distinct on (car_id) * -- or the relevant expression list here
from the_table
order by car_id, position, scrapedat desc;

DB-fiddle

CodePudding user response:

select  eventDate
       ,Car_id 
       ,licensePlate 
       ,position 
       ,scrapedAt
from
(
select  *
       ,row_number() over(partition by car_id order by position, scrapedat desc) as rn
from    t
) t
where   rn = 1
eventdate car_id licenseplate position scrapedat
2022-09-10 1 PLBQWN035 1 1660000004
2022-09-11 2 PLBQWN035 1 1660000033
2022-09-12 3 PLBQWN035 3 1660000222
2022-09-13 4 PLBQWN035 4 1660001111
2022-09-14 5 PLBQWN035 5 1660011111

Fiddle

  • Related