I'm using the Bixi public dataset found at https://www.bixi.com/en/open-data and have been asked to find "the average number of trips a day for each year-month combination in the dataset". Here's an example of the table from which I'm querying:
id | start_date | start_station_code | end_date | end_station_code | duration_sec | is_member |
---|---|---|---|---|---|---|
85843 | 2016-04-15 00:00:00 | 6315 | 2016-04-15 00:05:00 | 6315 | 349 | 1 |
85844 | 2016-04-15 17:16:00 | 6315 | 2016-04-15 17:37:00 | 6315 | 1293 | 0 |
The query that I used to successfully answer the question was as follows, but how can I simplify it?:
SELECT
daily_trips_2016.avg_daily_trips AS avg_daily_trips_2016,
daily_trips_2017.avg_daily_trips AS avg_daily_trips_2017,
daily_trips_2016.month
FROM
(
SELECT -- This sub-query determines the average number of daily trips per month for the year 2016
ROUND(COUNT(*) / (COUNT(DISTINCT DAY(start_date))), 0) AS avg_daily_trips,
MONTH(start_date) AS month
FROM trips
WHERE YEAR(start_date) = 2016
GROUP BY month) AS daily_trips_2016
LEFT JOIN
(
SELECT -- This sub-query determines the average number of daily trips per month for the year 2017
ROUND(COUNT(*) / (COUNT(DISTINCT DAY(start_date))), 0) AS avg_daily_trips,
MONTH(start_date) AS month
FROM trips
WHERE YEAR(start_date) = 2017
GROUP BY month) AS daily_trips_2017
ON daily_trips_2016.month = daily_trips_2017.month; -- Since both sub-queries share the "month" column, the JOIN happens on this column
-- and we can see the daily averages side by side for each year
CodePudding user response:
Use conditional aggregation:
SELECT ROUND(SUM(YEAR(start_date) = 2016) / COUNT(DISTINCT CASE WHEN YEAR(start_date) = 2016 THEN DAY(start_date) END), 0) AS avg_daily_trips_2016,
ROUND(SUM(YEAR(start_date) = 2017) / COUNT(DISTINCT CASE WHEN YEAR(start_date) = 2017 THEN DAY(start_date) END), 0) AS avg_daily_trips_2017,
MONTH(start_date) AS month
FROM trips
WHERE YEAR(start_date) IN (2016, 2017)
GROUP BY month;
CodePudding user response:
find "the average number of trips a day for each year-month combination in the dataset".
SELECT ALL
EXTRACT(YEAR_MONTH FROM start_date) AS period,
COUNT(*) / COUNT(DISTINCT DATE(start_date)) AS avg_by_day
FROM trips
GROUP BY 1
ORDER BY 1 ASC
1 row by YYYYMM / average