Home > Software engineering >  unpivot based on two columns in snowflake sql
unpivot based on two columns in snowflake sql

Time:07-20

I'm working on creating a monthly cohort report and my final table looks like below

|-----------------|-------------------|------------------------|----------------|
| cohort_month    |    cohort_size    |    month_differnce     |   percentage   |
|-----------------|-------------------|------------------------|----------------|
|    Dec 2021     |       2981        |           00           |     94.919     |
|    Dec 2021     |       2981        |           01           |     74.829     |
|    Dec 2021     |       2981        |           02           |     81.915     |
|    Dec 2021     |       2981        |           03           |     64.726     |
|    Dec 2021     |       2981        |           04           |     12.850     |
|    Dec 2021     |       2981        |           05           |     87.831     |
|    Dec 2021     |       2981        |           06           |     91.717     |
|    Jan 2022     |       3949        |           00           |     51.827     |
|    Jan 2022     |       3949        |           01           |     73.378     |
|    Jan 2022     |       3949        |           02           |     17.391     |
|    Jan 2022     |       3949        |           03           |     29.039     |
|    Jan 2022     |       3949        |           04           |     03.176     |
|    Jan 2022     |       3949        |           05           |     17.182     |
|    Jan 2022     |       3949        |           06           |     92.193     |
|    Feb 2022     |       7189        |           00           |     95.361     |
|    Feb 2022     |       7189        |           01           |     28.361     |
|    Feb 2022     |       7189        |           02           |     10.178     |
|    Feb 2022     |       7189        |           03           |     82.048     |
|    Feb 2022     |       7189        |           04           |     73.391     |
|    Feb 2022     |       7189        |           05           |     91.361     |
|    Feb 2022     |       7189        |           00           |     81.371     |
|-----------------|-------------------|------------------------|----------------|

I want to convert the above table to

|-----------------|-------------------|---------|---------|---------|---------|---------|---------|---------|
| cohort_month    |    cohort_size    |    00   |    01   |    02   |    03   |    04   |    05   |    06   |
|-----------------|-------------------|---------|---------|---------|---------|---------|---------|---------|
|    Dec 2021     |        2981       | 94.919  | 74.829  | 81.915  | 64.726  | 12.850  | 87.831  | 91.717  |
|    Jan 2022     |        3949       | 51.827  | 73.378  | 17.391  | 29.039  | 03.176  | 17.182  | 92.193  |
|    Feb 2022     |        7189       | 95.361  | 28.361  | 10.178  | 82.048  | 73.391  | 91.361  | 81.371  |
|-----------------|-------------------|---------|---------|---------|---------|---------|---------|---------|

My Attempt:

select * from my_table
pivot (sum(cohort_size) for month_difference in (0,1,2,3,4,5,6)) as pivot_table
order by 1;

But this is giving very different result

CodePudding user response:

You were almost there. I will let you address the sorting since alpha sort ≠ calendar sort.

WITH MY_TABLE (COHORT_MONTH, COHORT_SIZE, MONTH_DIFFERENCE, PERCENTAGE) AS
(SELECT * FROM VALUES
 ('Dec 2021',2981,00,94.919)
,('Dec 2021',2981,01,74.829)
,('Dec 2021',2981,02,81.915)
,('Dec 2021',2981,03,64.726)
,('Dec 2021',2981,04,12.850)
,('Dec 2021',2981,05,87.831)
,('Dec 2021',2981,06,91.717)
,('Jan 2022',3949,00,51.827)
,('Jan 2022',3949,01,73.378)
,('Jan 2022',3949,02,17.391)
,('Jan 2022',3949,03,29.039)
,('Jan 2022',3949,04,03.176)
,('Jan 2022',3949,05,17.182)
,('Jan 2022',3949,06,92.193)
,('Feb 2022',7189,00,95.361)
,('Feb 2022',7189,01,28.361)
,('Feb 2022',7189,02,10.178)
,('Feb 2022',7189,03,82.048)
,('Feb 2022',7189,04,73.391)
,('Feb 2022',7189,05,91.361)
,('Feb 2022',7189,06,81.371)
)
SELECT *
FROM MY_TABLE
PIVOT(MAX(PERCENTAGE) FOR MONTH_DIFFERENCE IN (0,1,2,3,4,5,6)) AS P
ORDER BY 1;
COHORT_MONTH COHORT_SIZE 0 1 2 3 4 5 6
Dec 2021 2981 94.919 74.829 81.915 64.726 12.850 87.831 91.717
Feb 2022 7189 95.361 28.361 10.178 82.048 73.391 91.361 81.371
Jan 2022 3949 51.827 73.378 17.391 29.039 3.176 17.182 92.193

CodePudding user response:

Using Conditional Aggregation as per comment:

SELECT cohort_month, cohort_size,
    MAX(CASE WHEN month_differnce = '00' THEN percentage END) AS "00",
    MAX(CASE WHEN month_differnce = '01' THEN percentage END) AS "01",
    MAX(CASE WHEN month_differnce = '02' THEN percentage END) AS "02",
    MAX(CASE WHEN month_differnce = '03' THEN percentage END) AS "03",
    MAX(CASE WHEN month_differnce = '04' THEN percentage END) AS "04",
    MAX(CASE WHEN month_differnce = '05' THEN percentage END) AS "05",
    MAX(CASE WHEN month_differnce = '06' THEN percentage END) AS "06"
FROM my_table 
GROUP BY cohort_month, cohort_size; 

The CASE expression could be swapped with IFF function:

MAX(CASE WHEN month_differnce = '06' THEN percentage END) AS "06"
<=>
MAX(IFF(month_differnce = '06', percentage, NULL)) AS "06"
  • Related