Filtering out multiples from DENSE

So I have a dataset of bus times where some of the data was recorded incorrectly and duplicated. The query looks like this

SELECT
  trip_id, 
  expect_arrival_time,
  recorded_arrival_time,
  DENSE_RANK() OVER(PARTITION BY trip_id ORDER BY recorded_arrival_time) as stop_ranks
FROM
  bus_trips

Results look something like this:

trip_id	expect_arrival_time	recorded_arrival_time	stop_ranks
10001444	2021-06-20 22:14:19 UTC	2021-06-20 22:15:52 UTC	1
10001444	2021-06-20 22:17:07 UTC	2021-06-20 22:19:52 UTC	2
10001444	2021-06-20 22:19:00 UTC	2021-06-20 22:19:52 UTC	2
10001444	2021-06-20 22:32:26 UTC	2021-06-20 22:31:59 UTC	3

What I'd like to be able to do is filter out any time the dense rank has multiples so i'd want a query that returns this

trip_id	expect_arrival_time	recorded_arrival_time	stop_ranks
10001444	2021-06-20 22:14:19 UTC	2021-06-20 22:15:52 UTC	1
10001444	2021-06-20 22:17:07 UTC	2021-06-20 22:19:52 UTC	2
10001444	2021-06-20 22:32:26 UTC	2021-06-20 22:31:59 UTC	3

Any way I could do this?

CodePudding user response：

Try below

SELECT
  trip_id,
  MIN(expect_arrival_time) as expect_arrival_time,
  recorded_arrival_time,
  DENSE_RANK() OVER(PARTITION BY trip_id ORDER BY recorded_arrival_time) as stop_ranks
FROM bus_trips
GROUP BY trip_id, recorded_arrival_time

with output

CodePudding user response：

We can try to use ROW_NUMBER window function twice to make it.

SELECT trip_id, 
       expect_arrival_time,
       recorded_arrival_time,
       ROW_NUMBER() OVER(PARTITION BY trip_id ORDER BY recorded_arrival_time) stop_ranks
FROM (
   SELECT
       trip_id, 
       expect_arrival_time,
       recorded_arrival_time,
       ROW_NUMBER() OVER(PARTITION BY trip_id ORDER BY expect_arrival_time) as rn
   FROM
       bus_trips
) t1
WHERE rn = 1