Unpivoting for large dataset and greater number of unique columns-CodePudding

The pivot and unpivot functions in snowflake are not efficient for processing 30 unique columns into row based.

Use case : I have 35 different month columns which needs to be rows based , another 35 columns will be quantity for the corresponding month . So at the and there will be 2 columns(one for month data and another for quantity) for 70 unique columns there would be aggregation of quantity based on month But unpivoting is not at all efficient. The below query is scanning 15 GB of data from the main table used

select part_num ,concat(date_part(year, dates),'-',date_part(month, dates)) as month_year,
sum(quantity) as quantities
 from table_name
unpivot(dates for cols in (month_1, 30 other uniue cols)),
unpivot(quantity for cols in (qunatity_1, 30 other uniue cols)),
group by part_num, month_year

Is there any other approach to unpivot large dataset. Thanks

CodePudding user response：

Alternative approach could be using conditional aggregation:

with cte as (
  select part_num
     ,concat(date_part(year, dates),'-',date_part(month, dates)) as month_year
     ,sum(quantity) as quantities
  from table_name
  group by part_num, month_year
)
SELECT part_num 
      -- lowest date
      ,'2020-01' AS "2020-01"
      ,MAX(IFF(month_year='2020-01', quantities, NULL) AS "quantities_2020-01"
      -- next date
      ,...
      -- last date
      ,'2022-04' AS "2022-04"
      ,MAX(IFF(month_year='2022-04', quantities, NULL) AS "quantities_2022-04"
FROM cte
GROUP BY part_num;

Version using single GROUP BY and TO_VARCHAR with format:

SELECT part_num 
  -- lowest date
  ,MAX(IFF(TO_VARCHAR(dates,'YYYY-MM'),'2020-01',NULL) AS "2020-01"
  ,MAX(IFF(TO_VARCHAR(dates,'YYYY-MM')='2020-01',quantities,NULL) AS "quantities_2020-01"
  -- next date
  ,...
  -- last date
  ,MAX(IFF(TO_VARCHAR(dates,'YYYY-MM'),'2022-04',NULL) AS "2022-04"
  ,MAX(IFF(TO_VARCHAR(dates,'YYYY-MM')='2022-04',quantities,NULL) AS "quantities_2022-04"
FROM table_name
GROUP BY part_num;

CodePudding user response：

So if we get some example DATA and test is what is happening is what is wanted..

Here is a trival and tiny CTE worth of data

with table_name(part_num, month_1, month_2, month_3, qunatity_1, qunatity_2, qunatity_3) as (
    select * from values
    (1, '2022-01-01'::date, '2022-02-01'::date, '2022-03-01'::date, 4, 5, 6)
)

now pointing your SQL at it (after making it compile)

select 
   part_num
   ,to_char(dates, 'yyyy-mm') as month_year
   ,sum(quantity) as quantities
from table_name
    unpivot(dates for month in (month_1, month_2, month_3))
    unpivot(quantity for quan in (qunatity_1, qunatity_2, qunatity_3))
group by part_num, month_year

gives:

PART_NUM	MONTH_YEAR	QUANTITIES
1	2022-01	15
1	2022-02	15
1	2022-03	15

which is not what I think you are after.

If we look at the un aggregated rows:

PART_NUM	MONTH	DATES	QUAN	QUANTITY
1	MONTH_1	2022-01-01	QUNATITY_1	4
1	MONTH_1	2022-01-01	QUNATITY_2	5
1	MONTH_1	2022-01-01	QUNATITY_3	6
1	MONTH_2	2022-02-01	QUNATITY_1	4
1	MONTH_2	2022-02-01	QUNATITY_2	5
1	MONTH_2	2022-02-01	QUNATITY_3	6
1	MONTH_3	2022-03-01	QUNATITY_1	4
1	MONTH_3	2022-03-01	QUNATITY_2	5
1	MONTH_3	2022-03-01	QUNATITY_3	6

we are getting a cross join, which is not what I believe you are wanting.

my understanding is you want a relationship between month (1-35) and quantity (1-35)

thus a mix like:

PART_NUM	MONTH	DATES	QUAN	QUANTITY
1	MONTH_1	2022-01-01	QUNATITY_1	4
1	MONTH_2	2022-02-01	QUNATITY_2	5
1	MONTH_3	2022-03-01	QUNATITY_3	6

Guessed Answer:

My guess at what you really are wanting is:

select
  part_num
  ,to_char(dates, 'yyyy-mm') as month_year
  ,array_construct(qunatity_1, qunatity_2, qunatity_3)[split_part(month,'_',2)::number - 1] as qunatity
from table_name
    unpivot(dates for month in (month_1, month_2, month_3))
order by 1,2;

which gives (for the same above CTE data):

PART_NUM	MONTH_YEAR	QUNATITY
1	2022-01	4
1	2022-02	5
1	2022-03	6