Home > Software design >  Column neither grouped nor aggregated after introducing window query
Column neither grouped nor aggregated after introducing window query

Time:08-30

I have trouble integrating a simple window function into my query. I work with this avocado dataset from Kaggle. I started off with a simple query:

SELECT
  date,
  SUM(Total_Bags) as weekly_bags,
FROM 
  `course.avocado`
WHERE
  EXTRACT(year FROM date) = 2015
GROUP BY
  date
ORDER BY
  date

And it works just fine. Next, I want to add the rolling sum to the query to display along the weekly sum. I tried the following:

SELECT
  date,
  SUM(Total_Bags) as weekly_bags,
  SUM(Total_Bags) OVER(
    PARTITION BY date
    ORDER BY date
    ROWS BETWEEN 4 PRECEDING AND CURRENT ROW 
  )
FROM 
  `course.avocado`
WHERE
  EXTRACT(year FROM date) = 2015
GROUP BY
  date
ORDER BY
  date

but im getting the common error:

SELECT list expression references column Total_Bags which is neither grouped nor aggregated at [4:7]

and im confused. Total_Bags in the first query was aggregated yet when it's introduced again in the second query, it's not aggregated anymore. How do I fix this query? Thanks.

CodePudding user response:

In your query, which returns 2 columns: date and aggregate SUM(Total_Bags), the window function SUM() is evaluated after the aggregation when there is no column Total_Bags and this is why you can't use it inside the window function.

However, you can do want you want, without group by, by using only window functions and DISTINCT:

SELECT DISTINCT date,
       SUM(Total_Bags) OVER(PARTITION BY date) AS weekly_bags,
       SUM(Total_Bags) OVER(
         PARTITION BY date
         ORDER BY date
         ROWS BETWEEN 4 PRECEDING AND CURRENT ROW 
       )
FROM course.avocado
WHERE EXTRACT(year FROM date) = 2015
ORDER BY date;

or, use window function on the the aggregated result:

SELECT date,
       SUM(Total_Bags) AS weekly_bags,
       SUM(SUM(Total_Bags)) OVER(
         ORDER BY date
         ROWS BETWEEN 4 PRECEDING AND CURRENT ROW 
       )
FROM course.avocado
WHERE EXTRACT(year FROM date) = 2015
GROUP BY date
ORDER BY date;

CodePudding user response:

I tried to approach it from a different angle and seems I have figured it out, the results seem just right. Here's the code:

WITH daily_bags AS
  (SELECT
    Date,
    CAST(SUM(Total_Bags) as int64) as all_bags
  FROM
    `course.avocado`
  WHERE 
    EXTRACT(year from Date) = 2015
  GROUP BY
    Date
  ORDER BY
    Date)

SELECT
  Date,
  all_bags,
  SUM(all_bags) OVER(
    ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
  ) as rolling_sum
FROM
  daily_bags

Thanks everyone for your help.

  • Related