Home > Software engineering >  Sum Over Partition By several variables with Group By
Sum Over Partition By several variables with Group By

Time:10-28

I'm using the BigQuery to get users' annual spending to append upon a transaction ID. I'm not sure why I'd get the SELECT list expression references column price which is neither grouped nor aggregated at [4:7] As the price column is the aggregated target, why should I group by it?

SELECT
  trans_id,
  user_id,
  SUM(price) OVER (PARTITION BY user_id, EXTRACT(year FROM date)) as user_annual_spend 
FROM sample_sales_table
GROUP BY trans_id, user_id

trans_is [STRING NULLABLE], user_id [STRING NULLABLE], date [DATE NULLABLE] yyyy-mm-dd format, price [NUMERIC NULLABLE]

trans_id | user_id |   date   | price
AAAAAAA  | user1   |2022-01-09| 20
AAAAAAC  | user1   |2022-05-01| 50
BAAAAAB  | user1   |2021-01-05| 10
BAAAACB  | user1   |2020-01-01| 40
BAAAAAC  | user2   |2021-02-01| 30

Ideal output

trans_id | user_id |user_annual_spend
AAAAAAA  | user1   | 70
AAAAAAC  | user1   | 70
BAAAAAB  | user1   | 10
BAAAACB  | user1   | 40
BAAAAAC  | user2   | 30

CodePudding user response:

Modify your query as the follwing:

SELECT trans_id, user_id,
       SUM(user_annual_spend) AS user_annual_spend
FROM 
  (
    SELECT trans_id, user_id,
    SUM(price) OVER (PARTITION BY user_id, EXTRACT(year FROM date)) as user_annual_spend 
    FROM sample_sales_table
  ) T
GROUP BY trans_id, user_id

SUM(price) OVER (PARTITION BY user_id, EXTRACT(year FROM date)) is a window function not aggregate function.

Or simply remove the group by:

SELECT trans_id, user_id,
       SUM(price) OVER (PARTITION BY user_id, EXTRACT(year FROM date)) as user_annual_spend 
FROM sample_sales_table
ORDER BY user_id, trans_id
  •  Tags:  
  • sql
  • Related