I want to output the sum of revenue per year per product, and also include a column with the year each product was introduced (based on the date it was first ordered).
I cannot figure out what I am doing wrong here. I can do each calculation separately, but apparently not within the same query. I am sure this is a basic misunderstanding of grouping or aggregation.
With input_data AS (
SELECT '#1238' as order_id, DATE('2021-12-15') as order_date, 'c12345' as sku, 18 as order_value
UNION ALL SELECT '#1201', DATE('2021-10-10'), 'c12345', 18
UNION ALL SELECT '#1198', DATE('2021-07-05'), 'c12345', 20
UNION ALL SELECT '#1134', DATE('2020-10-15'), 'c12345', 10
UNION ALL SELECT '#1112', DATE('2019-08-10'), 'c12345', 5
UNION ALL SELECT '#1234', DATE('2021-07-05'), 'c11111', 118
UNION ALL SELECT '#1294', DATE('2021-01-05'), 'c11111', 68
UNION ALL SELECT '#1290', DATE('2021-01-01'), 'c11111', 82
UNION ALL SELECT '#1284', DATE('2020-01-15'), 'c22222', 98 )
SELECT
sku,
--format_date('%Y', min(order_date)) as year_intro1,
format_date('%Y', min(order_date) over (partition BY sku)) as year_intro2,
format_date('%Y', order_date) as year_order,
sum(order_value) as year_rev
FROM input_data
GROUP BY 1,3
year_intro should always have a single value per sku.
I think I want year_intro2, but that is the expression that raises the error.
year_intro1 runs, but it only picks up the minimum date within each revenue year.
The desired output for c12345 has a year_intro of 2019 for every year_order value.
CodePudding user response:
Consider the approach below:
select sku, any_value(year_intro) year_intro, year_order, sum(order_value) year_rev
from (
select sku, order_value,
extract(year from order_date) as year_order,
min(extract(year from order_date)) over(partition by sku) year_intro
from input_data
)
group by sku, year_order
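When applied to the sample data in your question, the output is (row order may vary):
sku     year_intro  year_order  year_rev
c12345  2019        2019        5
c12345  2019        2020        10
c12345  2019        2021        56
c11111  2021        2021        268
c22222  2020        2020        98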
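As for why year_intro2 fails: in a query with GROUP BY, window functions are evaluated after the grouping, so min(order_date) over (...) refers to a column that is neither grouped nor aggregated. A possible single-level alternative, offered as a sketch rather than the definitive fix, is to nest the aggregate inside the window function. This assumes BigQuery accepts an aggregate as the argument of an analytic function (as far as I know it does); the sample rows are copied from the question.

```sql
WITH input_data AS (
  SELECT '#1238' AS order_id, DATE('2021-12-15') AS order_date, 'c12345' AS sku, 18 AS order_value
  UNION ALL SELECT '#1201', DATE('2021-10-10'), 'c12345', 18
  UNION ALL SELECT '#1198', DATE('2021-07-05'), 'c12345', 20
  UNION ALL SELECT '#1134', DATE('2020-10-15'), 'c12345', 10
  UNION ALL SELECT '#1112', DATE('2019-08-10'), 'c12345', 5
  UNION ALL SELECT '#1234', DATE('2021-07-05'), 'c11111', 118
  UNION ALL SELECT '#1294', DATE('2021-01-05'), 'c11111', 68
  UNION ALL SELECT '#1290', DATE('2021-01-01'), 'c11111', 82
  UNION ALL SELECT '#1284', DATE('2020-01-15'), 'c22222', 98
)
SELECT
  sku,
  -- Inner MIN is the aggregate for each (sku, year_order) group.
  -- Outer MIN is a window function over those grouped rows, so every
  -- row of a sku receives that sku's earliest order date.
  EXTRACT(YEAR FROM MIN(MIN(order_date)) OVER (PARTITION BY sku)) AS year_intro,
  EXTRACT(YEAR FROM order_date) AS year_order,
  SUM(order_value) AS year_rev
FROM input_data
GROUP BY sku, year_order
```

The subquery version above is arguably easier to read; the nested form just avoids the extra query level.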