I'm working on creating a monthly cohort report and my final table looks like below
|-----------------|-------------------|------------------------|----------------|
| cohort_month | cohort_size | month_differnce | percentage |
|-----------------|-------------------|------------------------|----------------|
| Dec 2021 | 2981 | 00 | 94.919 |
| Dec 2021 | 2981 | 01 | 74.829 |
| Dec 2021 | 2981 | 02 | 81.915 |
| Dec 2021 | 2981 | 03 | 64.726 |
| Dec 2021 | 2981 | 04 | 12.850 |
| Dec 2021 | 2981 | 05 | 87.831 |
| Dec 2021 | 2981 | 06 | 91.717 |
| Jan 2022 | 3949 | 00 | 51.827 |
| Jan 2022 | 3949 | 01 | 73.378 |
| Jan 2022 | 3949 | 02 | 17.391 |
| Jan 2022 | 3949 | 03 | 29.039 |
| Jan 2022 | 3949 | 04 | 03.176 |
| Jan 2022 | 3949 | 05 | 17.182 |
| Jan 2022 | 3949 | 06 | 92.193 |
| Feb 2022 | 7189 | 00 | 95.361 |
| Feb 2022 | 7189 | 01 | 28.361 |
| Feb 2022 | 7189 | 02 | 10.178 |
| Feb 2022 | 7189 | 03 | 82.048 |
| Feb 2022 | 7189 | 04 | 73.391 |
| Feb 2022 | 7189 | 05 | 91.361 |
| Feb 2022 | 7189 | 00 | 81.371 |
|-----------------|-------------------|------------------------|----------------|
I want to convert the above table to
|-----------------|-------------------|---------|---------|---------|---------|---------|---------|---------|
| cohort_month | cohort_size | 00 | 01 | 02 | 03 | 04 | 05 | 06 |
|-----------------|-------------------|---------|---------|---------|---------|---------|---------|---------|
| Dec 2021 | 2981 | 94.919 | 74.829 | 81.915 | 64.726 | 12.850 | 87.831 | 91.717 |
| Jan 2022 | 3949 | 51.827 | 73.378 | 17.391 | 29.039 | 03.176 | 17.182 | 92.193 |
| Feb 2022 | 7189 | 95.361 | 28.361 | 10.178 | 82.048 | 73.391 | 91.361 | 81.371 |
|-----------------|-------------------|---------|---------|---------|---------|---------|---------|---------|
My Attempt:
select * from my_table
pivot (sum(cohort_size) for month_difference in (0,1,2,3,4,5,6)) as pivot_table
order by 1;
But this is giving very different result
CodePudding user response:
You were almost there. I will let you address the sorting since alpha sort ≠ calendar sort.
WITH MY_TABLE (COHORT_MONTH, COHORT_SIZE, MONTH_DIFFERENCE, PERCENTAGE) AS
(SELECT * FROM VALUES
('Dec 2021',2981,00,94.919)
,('Dec 2021',2981,01,74.829)
,('Dec 2021',2981,02,81.915)
,('Dec 2021',2981,03,64.726)
,('Dec 2021',2981,04,12.850)
,('Dec 2021',2981,05,87.831)
,('Dec 2021',2981,06,91.717)
,('Jan 2022',3949,00,51.827)
,('Jan 2022',3949,01,73.378)
,('Jan 2022',3949,02,17.391)
,('Jan 2022',3949,03,29.039)
,('Jan 2022',3949,04,03.176)
,('Jan 2022',3949,05,17.182)
,('Jan 2022',3949,06,92.193)
,('Feb 2022',7189,00,95.361)
,('Feb 2022',7189,01,28.361)
,('Feb 2022',7189,02,10.178)
,('Feb 2022',7189,03,82.048)
,('Feb 2022',7189,04,73.391)
,('Feb 2022',7189,05,91.361)
,('Feb 2022',7189,06,81.371)
)
SELECT *
FROM MY_TABLE
PIVOT(MAX(PERCENTAGE) FOR MONTH_DIFFERENCE IN (0,1,2,3,4,5,6)) AS P
ORDER BY 1;
COHORT_MONTH | COHORT_SIZE | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|---|
Dec 2021 | 2981 | 94.919 | 74.829 | 81.915 | 64.726 | 12.850 | 87.831 | 91.717 |
Feb 2022 | 7189 | 95.361 | 28.361 | 10.178 | 82.048 | 73.391 | 91.361 | 81.371 |
Jan 2022 | 3949 | 51.827 | 73.378 | 17.391 | 29.039 | 3.176 | 17.182 | 92.193 |
CodePudding user response:
Using Conditional Aggregation as per comment:
SELECT cohort_month, cohort_size,
MAX(CASE WHEN month_differnce = '00' THEN percentage END) AS "00",
MAX(CASE WHEN month_differnce = '01' THEN percentage END) AS "01",
MAX(CASE WHEN month_differnce = '02' THEN percentage END) AS "02",
MAX(CASE WHEN month_differnce = '03' THEN percentage END) AS "03",
MAX(CASE WHEN month_differnce = '04' THEN percentage END) AS "04",
MAX(CASE WHEN month_differnce = '05' THEN percentage END) AS "05",
MAX(CASE WHEN month_differnce = '06' THEN percentage END) AS "06"
FROM my_table
GROUP BY cohort_month, cohort_size;
The CASE expression could be swapped with IFF function:
MAX(CASE WHEN month_differnce = '06' THEN percentage END) AS "06"
<=>
MAX(IFF(month_differnce = '06', percentage, NULL)) AS "06"