Home > Software design >  Filtering out multiples from DENSE_RANk
Filtering out multiples from DENSE_RANk

Time:05-06

So I have a dataset of bus times where some of the data was recorded incorrectly and duplicated. The query looks like this

SELECT
  trip_id, 
  expect_arrival_time,
  recorded_arrival_time,
  DENSE_RANK() OVER(PARTITION BY trip_id ORDER BY recorded_arrival_time) as stop_ranks
FROM
  bus_trips

Results look something like this:

trip_id expect_arrival_time recorded_arrival_time stop_ranks
10001444 2021-06-20 22:14:19 UTC 2021-06-20 22:15:52 UTC 1
10001444 2021-06-20 22:17:07 UTC 2021-06-20 22:19:52 UTC 2
10001444 2021-06-20 22:19:00 UTC 2021-06-20 22:19:52 UTC 2
10001444 2021-06-20 22:32:26 UTC 2021-06-20 22:31:59 UTC 3

What I'd like to be able to do is filter out any time the dense rank has multiples so i'd want a query that returns this

trip_id expect_arrival_time recorded_arrival_time stop_ranks
10001444 2021-06-20 22:14:19 UTC 2021-06-20 22:15:52 UTC 1
10001444 2021-06-20 22:17:07 UTC 2021-06-20 22:19:52 UTC 2
10001444 2021-06-20 22:32:26 UTC 2021-06-20 22:31:59 UTC 3

Any way I could do this?

CodePudding user response:

Try below

SELECT
  trip_id,
  MIN(expect_arrival_time) as expect_arrival_time,
  recorded_arrival_time,
  DENSE_RANK() OVER(PARTITION BY trip_id ORDER BY recorded_arrival_time) as stop_ranks
FROM bus_trips
GROUP BY trip_id, recorded_arrival_time          

with output

enter image description here

CodePudding user response:

We can try to use ROW_NUMBER window function twice to make it.

SELECT trip_id, 
       expect_arrival_time,
       recorded_arrival_time,
       ROW_NUMBER() OVER(PARTITION BY trip_id ORDER BY recorded_arrival_time) stop_ranks
FROM (
   SELECT
       trip_id, 
       expect_arrival_time,
       recorded_arrival_time,
       ROW_NUMBER() OVER(PARTITION BY trip_id ORDER BY expect_arrival_time) as rn
   FROM
       bus_trips
) t1
WHERE rn = 1
  • Related