I have a table named as "source_table"
| Date | Country | Subscribers |
----------------------------------------
| 2021-10-01 | USA | 5 |
| 2021-10-12 | Canada | 10 |
| 2021-10-23 | USA | 15 |
--
| 2021-11-01 | USA | 10 |
| 2021-11-05 | Canada | 20 |
I want to convert this into wide format
| Country | Oct-21 | Nov-21 | Dec-21 |
--------------------------------------
| USA | 53 | 68 | 12 |
| Canada | 35 | 86 | 21 |
I tried the following code:
SELECT country,
CASE WHEN date BETWEEN '2021-10-01' AND '2021-10-31' THEN SUM(subscribers)
END AS "Oct-21",
CASE WHEN date BETWEEN '2021-11-01' AND '2021-11-30' THEN SUM(subscribers)
END AS "Nov-21",
CASE WHEN date BETWEEN '2021-12-01' AND '2021-12-31' THEN SUM(subscribers)
END AS "Dec-21"
FROM
source_table
GROUP BY
country
But the above code throws an error saying
column "source_table.date" must appear in the GROUP BY clause or be used in an aggregate function
I'm not sure why the column "date" is needed in GROUP BY; if I do use it in GROUP BY, I have the country repeating multiple times and that's not the desired output. The country name should appear only once with the the number of subscribers in that month aggregated and there should not be any NULLs in the columns (there's at least one subscriber on every day for each country in the source_table).
Am I missing something here? Please help. I'm using SQL on Redshift.
CodePudding user response:
You want to sum or count the CASE
expressions, e.g.
SELECT
country,
SUM(CASE WHEN date BETWEEN '2021-10-01' AND '2021-10-31'
THEN subscribers ELSE 0 END) AS "Oct-21",
SUM(CASE WHEN date BETWEEN '2021-11-01' AND '2021-11-30'
THEN subscribers ELSE 0 END) AS "Nov-21",
SUM(CASE WHEN date BETWEEN '2021-12-01' AND '2021-12-31'
THEN subscribers ELSE 0 END) AS "Dec-21"
FROM source_table
GROUP BY country;