Home > Net >  Simplifying SQL query
Simplifying SQL query

Time:07-01

I'm using the Bixi public dataset found at https://www.bixi.com/en/open-data and have been asked to find "the average number of trips a day for each year-month combination in the dataset". Here's an example of the table from which I'm querying:

id start_date start_station_code end_date end_station_code duration_sec is_member
85843 2016-04-15 00:00:00 6315 2016-04-15 00:05:00 6315 349 1
85844 2016-04-15 17:16:00 6315 2016-04-15 17:37:00 6315 1293 0

The query that I used to successfully answer the question was as follows, but how can I simplify it?:

SELECT
    daily_trips_2016.avg_daily_trips AS avg_daily_trips_2016,
    daily_trips_2017.avg_daily_trips AS avg_daily_trips_2017,
    daily_trips_2016.month
FROM
    (
    SELECT -- This sub-query determines the average number of daily trips per month for the year 2016
        ROUND(COUNT(*) / (COUNT(DISTINCT DAY(start_date))), 0) AS avg_daily_trips,
        MONTH(start_date) AS month
    FROM trips
    WHERE YEAR(start_date) = 2016
    GROUP BY month) AS daily_trips_2016
LEFT JOIN
    (
    SELECT -- This sub-query determines the average number of daily trips per month for the year 2017
        ROUND(COUNT(*) / (COUNT(DISTINCT DAY(start_date))), 0) AS avg_daily_trips,
        MONTH(start_date) AS month
    FROM trips
    WHERE YEAR(start_date) = 2017
    GROUP BY month) AS daily_trips_2017
ON daily_trips_2016.month = daily_trips_2017.month; -- Since both sub-queries share the "month" column, the JOIN happens on this column
-- and we can see the daily averages side by side for each year

CodePudding user response:

Use conditional aggregation:

SELECT ROUND(SUM(YEAR(start_date) = 2016) / COUNT(DISTINCT CASE WHEN YEAR(start_date) = 2016 THEN DAY(start_date) END), 0)  AS avg_daily_trips_2016,
       ROUND(SUM(YEAR(start_date) = 2017) / COUNT(DISTINCT CASE WHEN YEAR(start_date) = 2017 THEN DAY(start_date) END), 0)  AS avg_daily_trips_2017,
       MONTH(start_date) AS month
FROM trips
WHERE YEAR(start_date) IN (2016, 2017)
GROUP BY month;

CodePudding user response:

find "the average number of trips a day for each year-month combination in the dataset".

SELECT ALL
    EXTRACT(YEAR_MONTH FROM start_date) AS period,
    COUNT(*) / COUNT(DISTINCT DATE(start_date)) AS avg_by_day
FROM trips
GROUP BY 1
ORDER BY 1 ASC

1 row by YYYYMM / average

  • Related