unpivot based on two columns in snowflake sql-CodePudding

I'm working on creating a monthly cohort report and my final table looks like below

|-----------------|-------------------|------------------------|----------------|
| cohort_month    |    cohort_size    |    month_differnce     |   percentage   |
|-----------------|-------------------|------------------------|----------------|
|    Dec 2021     |       2981        |           00           |     94.919     |
|    Dec 2021     |       2981        |           01           |     74.829     |
|    Dec 2021     |       2981        |           02           |     81.915     |
|    Dec 2021     |       2981        |           03           |     64.726     |
|    Dec 2021     |       2981        |           04           |     12.850     |
|    Dec 2021     |       2981        |           05           |     87.831     |
|    Dec 2021     |       2981        |           06           |     91.717     |
|    Jan 2022     |       3949        |           00           |     51.827     |
|    Jan 2022     |       3949        |           01           |     73.378     |
|    Jan 2022     |       3949        |           02           |     17.391     |
|    Jan 2022     |       3949        |           03           |     29.039     |
|    Jan 2022     |       3949        |           04           |     03.176     |
|    Jan 2022     |       3949        |           05           |     17.182     |
|    Jan 2022     |       3949        |           06           |     92.193     |
|    Feb 2022     |       7189        |           00           |     95.361     |
|    Feb 2022     |       7189        |           01           |     28.361     |
|    Feb 2022     |       7189        |           02           |     10.178     |
|    Feb 2022     |       7189        |           03           |     82.048     |
|    Feb 2022     |       7189        |           04           |     73.391     |
|    Feb 2022     |       7189        |           05           |     91.361     |
|    Feb 2022     |       7189        |           00           |     81.371     |
|-----------------|-------------------|------------------------|----------------|

I want to convert the above table to

|-----------------|-------------------|---------|---------|---------|---------|---------|---------|---------|
| cohort_month    |    cohort_size    |    00   |    01   |    02   |    03   |    04   |    05   |    06   |
|-----------------|-------------------|---------|---------|---------|---------|---------|---------|---------|
|    Dec 2021     |        2981       | 94.919  | 74.829  | 81.915  | 64.726  | 12.850  | 87.831  | 91.717  |
|    Jan 2022     |        3949       | 51.827  | 73.378  | 17.391  | 29.039  | 03.176  | 17.182  | 92.193  |
|    Feb 2022     |        7189       | 95.361  | 28.361  | 10.178  | 82.048  | 73.391  | 91.361  | 81.371  |
|-----------------|-------------------|---------|---------|---------|---------|---------|---------|---------|

My Attempt:

select * from my_table
pivot (sum(cohort_size) for month_difference in (0,1,2,3,4,5,6)) as pivot_table
order by 1;

But this is giving very different result

CodePudding user response：

You were almost there. I will let you address the sorting since alpha sort ≠ calendar sort.

WITH MY_TABLE (COHORT_MONTH, COHORT_SIZE, MONTH_DIFFERENCE, PERCENTAGE) AS
(SELECT * FROM VALUES
 ('Dec 2021',2981,00,94.919)
,('Dec 2021',2981,01,74.829)
,('Dec 2021',2981,02,81.915)
,('Dec 2021',2981,03,64.726)
,('Dec 2021',2981,04,12.850)
,('Dec 2021',2981,05,87.831)
,('Dec 2021',2981,06,91.717)
,('Jan 2022',3949,00,51.827)
,('Jan 2022',3949,01,73.378)
,('Jan 2022',3949,02,17.391)
,('Jan 2022',3949,03,29.039)
,('Jan 2022',3949,04,03.176)
,('Jan 2022',3949,05,17.182)
,('Jan 2022',3949,06,92.193)
,('Feb 2022',7189,00,95.361)
,('Feb 2022',7189,01,28.361)
,('Feb 2022',7189,02,10.178)
,('Feb 2022',7189,03,82.048)
,('Feb 2022',7189,04,73.391)
,('Feb 2022',7189,05,91.361)
,('Feb 2022',7189,06,81.371)
)
SELECT *
FROM MY_TABLE
PIVOT(MAX(PERCENTAGE) FOR MONTH_DIFFERENCE IN (0,1,2,3,4,5,6)) AS P
ORDER BY 1;

COHORT_MONTH	COHORT_SIZE	0	1	2	3	4	5	6
Dec 2021	2981	94.919	74.829	81.915	64.726	12.850	87.831	91.717
Feb 2022	7189	95.361	28.361	10.178	82.048	73.391	91.361	81.371
Jan 2022	3949	51.827	73.378	17.391	29.039	3.176	17.182	92.193

CodePudding user response：

Using Conditional Aggregation as per comment:

SELECT cohort_month, cohort_size,
    MAX(CASE WHEN month_differnce = '00' THEN percentage END) AS "00",
    MAX(CASE WHEN month_differnce = '01' THEN percentage END) AS "01",
    MAX(CASE WHEN month_differnce = '02' THEN percentage END) AS "02",
    MAX(CASE WHEN month_differnce = '03' THEN percentage END) AS "03",
    MAX(CASE WHEN month_differnce = '04' THEN percentage END) AS "04",
    MAX(CASE WHEN month_differnce = '05' THEN percentage END) AS "05",
    MAX(CASE WHEN month_differnce = '06' THEN percentage END) AS "06"
FROM my_table 
GROUP BY cohort_month, cohort_size;

The CASE expression could be swapped with IFF function:

MAX(CASE WHEN month_differnce = '06' THEN percentage END) AS "06"
<=>
MAX(IFF(month_differnce = '06', percentage, NULL)) AS "06"