Home > Blockchain >  Aggregate rows based by one column in SQL
Aggregate rows based by one column in SQL

Time:05-10

I have a data set that looks like this:

report_id category_id product_id year_month total_sales
10 A 1 202201 10
10 A 1 202202 16
10 A 2 202201 11
10 A 3 202201 8
10 A 4 202201 12
10 A 4 202202 15
10 B 7 202202 19
10 B 8 202204 17
10 B 9 202203 9

I am trying to summarize the table where I can aggregate by year_month if category_id = A without any aggregation for category_id = B; all products for category_id = A can be referred to as 'misc'.

The desired data set should look like this:

report_id category_id product_id year_month total_sales
10 A misc 202201 41
10 A misc 202202 31
10 B 7 202202 19
10 B 8 202204 17
10 B 9 202203 9

I am totally lost as to how to produce this data set.

Thanks for your help

CodePudding user response:

We can use case to format the column product_id according to the value in category_id and then group by the result.

create table myTable(
  Report_id int,
  Category_id char(1),
  Product_id int,
  Yearmonth varchar(6),
  total_sales int
  );
insert into myTable values
(10   ,'A',   1,  202201  ,10),
(10   ,'A',   1,  202202  ,16),
(10   ,'A',   2,  202201  ,11),
(10   ,'A',   3,  202201  ,8),
(10   ,'A',   4,  202201  ,12),
(10   ,'A',   4,  202202  ,15),
(10   ,'B',   7,  202202  ,19),
(10   ,'B',   8,  202204  ,17),
(10   ,'B',   9,  202203  ,9);
Select
  Report_id,
  Category_id,
  Case when category_id = 'A' then 'misc'
  Else product_id end Product_ID,
  Yearmonth,
  Sum(total_sales) total_sales
From myTable
Group by 1,2,3,4;
Report_id | Category_id | Product_ID | Yearmonth | total_sales
--------: | :---------- | :--------- | :-------- | ----------:
       10 | A           | misc       | 202201    |          41
       10 | A           | misc       | 202202    |          31
       10 | B           | 7          | 202202    |          19
       10 | B           | 8          | 202204    |          17
       10 | B           | 9          | 202203    |           9

db<>fiddle here

CodePudding user response:

One option of generalizing on the category_id field, and assign misc whenever the aggregation finds more than one product_id field, could be using a string aggregation function like GROUP_CONCAT and use a quick check as it's done following:

SELECT 
    report_id,
    category_id,
    IF(GROUP_CONCAT(product_id) LIKE '%,%', 
       'misc', 
       GROUP_CONCAT(product_id))            AS product_id,
    year_month,
    SUM(total_sales)                        AS total_sales
FROM 
    tab
GROUP BY
    report_id,
    category_id,
    year_month
    

You can find an sql fiddle here.

  • Related