I have trouble integrating a simple window function into my query. I work with this avocado dataset from Kaggle. I started off with a simple query:
SELECT
date,
SUM(Total_Bags) as weekly_bags,
FROM
`course.avocado`
WHERE
EXTRACT(year FROM date) = 2015
GROUP BY
date
ORDER BY
date
And it works just fine. Next, I want to add the rolling sum to the query to display along the weekly sum. I tried the following:
SELECT
date,
SUM(Total_Bags) as weekly_bags,
SUM(Total_Bags) OVER(
PARTITION BY date
ORDER BY date
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
)
FROM
`course.avocado`
WHERE
EXTRACT(year FROM date) = 2015
GROUP BY
date
ORDER BY
date
but im getting the common error:
SELECT list expression references column Total_Bags which is neither grouped nor aggregated at [4:7]
and im confused. Total_Bags in the first query was aggregated yet when it's introduced again in the second query, it's not aggregated anymore. How do I fix this query? Thanks.
CodePudding user response:
In your query, which returns 2 columns: date
and aggregate SUM(Total_Bags)
, the window function SUM()
is evaluated after the aggregation when there is no column Total_Bags
and this is why you can't use it inside the window function.
However, you can do want you want, without group by
, by using only window functions and DISTINCT
:
SELECT DISTINCT date,
SUM(Total_Bags) OVER(PARTITION BY date) AS weekly_bags,
SUM(Total_Bags) OVER(
PARTITION BY date
ORDER BY date
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
)
FROM course.avocado
WHERE EXTRACT(year FROM date) = 2015
ORDER BY date;
or, use window function on the the aggregated result:
SELECT date,
SUM(Total_Bags) AS weekly_bags,
SUM(SUM(Total_Bags)) OVER(
ORDER BY date
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
)
FROM course.avocado
WHERE EXTRACT(year FROM date) = 2015
GROUP BY date
ORDER BY date;
CodePudding user response:
I tried to approach it from a different angle and seems I have figured it out, the results seem just right. Here's the code:
WITH daily_bags AS
(SELECT
Date,
CAST(SUM(Total_Bags) as int64) as all_bags
FROM
`course.avocado`
WHERE
EXTRACT(year from Date) = 2015
GROUP BY
Date
ORDER BY
Date)
SELECT
Date,
all_bags,
SUM(all_bags) OVER(
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
) as rolling_sum
FROM
daily_bags
Thanks everyone for your help.