I have a data set that looks like this:
report_id | category_id | product_id | year_month | total_sales |
---|---|---|---|---|
10 | A | 1 | 202201 | 10 |
10 | A | 1 | 202202 | 16 |
10 | A | 2 | 202201 | 11 |
10 | A | 3 | 202201 | 8 |
10 | A | 4 | 202201 | 12 |
10 | A | 4 | 202202 | 15 |
10 | B | 7 | 202202 | 19 |
10 | B | 8 | 202204 | 17 |
10 | B | 9 | 202203 | 9 |
I am trying to summarize the table where I can aggregate by year_month if category_id = A without any aggregation for category_id = B; all products for category_id = A can be referred to as 'misc'.
The desired data set should look like this:
report_id | category_id | product_id | year_month | total_sales |
---|---|---|---|---|
10 | A | misc | 202201 | 41 |
10 | A | misc | 202202 | 31 |
10 | B | 7 | 202202 | 19 |
10 | B | 8 | 202204 | 17 |
10 | B | 9 | 202203 | 9 |
I am totally lost as to how to produce this data set.
Thanks for your help
CodePudding user response:
We can use case to format the column product_id according to the value in category_id and then group by the result.
create table myTable( Report_id int, Category_id char(1), Product_id int, Yearmonth varchar(6), total_sales int );
✓
insert into myTable values (10 ,'A', 1, 202201 ,10), (10 ,'A', 1, 202202 ,16), (10 ,'A', 2, 202201 ,11), (10 ,'A', 3, 202201 ,8), (10 ,'A', 4, 202201 ,12), (10 ,'A', 4, 202202 ,15), (10 ,'B', 7, 202202 ,19), (10 ,'B', 8, 202204 ,17), (10 ,'B', 9, 202203 ,9);
✓
Select Report_id, Category_id, Case when category_id = 'A' then 'misc' Else product_id end Product_ID, Yearmonth, Sum(total_sales) total_sales From myTable Group by 1,2,3,4;
Report_id | Category_id | Product_ID | Yearmonth | total_sales --------: | :---------- | :--------- | :-------- | ----------: 10 | A | misc | 202201 | 41 10 | A | misc | 202202 | 31 10 | B | 7 | 202202 | 19 10 | B | 8 | 202204 | 17 10 | B | 9 | 202203 | 9
db<>fiddle here
CodePudding user response:
One option of generalizing on the category_id
field, and assign misc
whenever the aggregation finds more than one product_id
field, could be using a string aggregation function like GROUP_CONCAT
and use a quick check as it's done following:
SELECT
report_id,
category_id,
IF(GROUP_CONCAT(product_id) LIKE '%,%',
'misc',
GROUP_CONCAT(product_id)) AS product_id,
year_month,
SUM(total_sales) AS total_sales
FROM
tab
GROUP BY
report_id,
category_id,
year_month
You can find an sql fiddle here.