I have a database of flight logs and I would like to summarize the average delay time of for each airline at each distinct airport. Ideally, the query would return each airline and its average departure delay time at a specific airport, grouped by the airport. I've tried the below and it returns the correct values but it sorts the results by airline and returns every line, leading to hundreds of duplicate lines. How can I return just 1 line per airline and group by the airport?
SELECT
ORIGIN_AIRPORT,
AIRLINE,
AVG(DEPARTURE_DELAY) OVER (PARTITION BY AIRLINE, ORIGIN_AIRPORT) AS AVERAGE_DEPARTURE_FLIGHT_DELAY
FROM FLIGHTS
WHERE ORIGIN_AIRPORT NOT LIKE('%1%');
Data source: https://www.kaggle.com/usdot/flight-delays/version/1?select=flights.csv
CodePudding user response:
I don't think you need window functions for this. Partitioning is used to create multiple rows, but what you are looking for is aggregate functions on a group.
SELECT
ORIGIN_AIRPORT,
AIRLINE,
AVG(DEPARTURE_DELAY) AS AVERAGE_DEPARTURE_FLIGHT_DELAY
FROM FLIGHTS
WHERE ORIGIN_AIRPORT NOT LIKE('%1%')
GROUP BY AIRLINE, ORIGIN_AIRPORT