I have a question whether RANK () OVER
function can give the 'most popular' value?
Question:
What route was the most popular of those with an average trip time of at least 8 minutes, excluding trips longer than 2 hours?
My query:
SELECT
start_station,
end_station,
count(*),
RANK() OVER (Order BY COUNT(*) DESC) as ranking,
AVG(duration_seconds)
FROM tutorial.dc_bikeshare_q1_2012
WHERE duration_seconds BETWEEN 480 and 7200
GROUP BY start_station, end_station
Can someone confirm whether RANK() OVER
satisfies this 'popularity' value?
CodePudding user response:
rank
should be OK; though, I understood the question a little bit differently than you: first restrict all trips to those shorter than 2 hours, and then calculate the rest of data you need. Something like this:
select start_station,
end_station,
rank() over (order by cnt desc) rnk
from
(select start_station,
end_station,
avg(duration_seconds) avg_duration,
count(*) cnt
from tutorial.dc_bikeshare_q1_2012
where duration_seconds < 7200
group by start_station, end_station
) as x
where avg_duration >= 480;
CodePudding user response:
The question asks for the one most popular route. rank()
would be a waste of time. ORDER BY
LIMIT 1
should be cheaper:
SELECT start_station, end_station, count(*) AS ct
FROM tutorial.dc_bikeshare_q1_2012
WHERE duration_seconds <= 7200
GROUP BY start_station, end_station
HAVING avg(duration_seconds) >= 480
ORDER BY ct DESC -- add more expressions to break ties
LIMIT 1;
You are not the only one working on this: