I'm pivoting in PostgreSQL but when I run the query the output says:
ERROR: return and sql tuple descriptions are incompatible SQL state: 42601
Summarizing, I want the distribution channel on raw, the year in the columns and the operative margin as value.
- dist_chann_id --> integer
- year --> year
- operative_margin --> integer
Without the pivot the output is:
dist_chann_name | year | operative_margin |
---|---|---|
1 | 2020 | 20783 |
1 | 2021 | 5791 |
2 | 2020 | 30362 |
3 | 2021 | 14501 |
3 | 2020 | 2765 |
3 | 2021 | 4535 |
This is my query:
SELECT *
FROM crosstab(
'SELECT dist_chann_id, year, operative_margin
FROM marginality_by_channel
ORDER BY dist_chann_id, year'
) AS ct ("DC" int, "2020" int, "2021" int);
CodePudding user response:
Using filtered aggregation is typically much easier than the somewhat convoluted crosstab()
function (at least in my opinion).
select dist_chann_name as dc,
sum(operative_margin) filter (where year = 2020) as "2020",
sum(operative_margin) filter (where year = 2021) as "2021"
from marginality_by_channel
group by dist_chann_name
order by dist_chann_name;
CodePudding user response:
Source of the error msg
One of the columns does not have data type you think it has. Must be operative_margin
, probably text
?
The 1-parameter form of crosstab()
only uses the "category" column (year
in your example) only for sorting. And the "row_name" column (dist_chann_name
- or dist_chann_id
?) would produce a different error msg.
Solution
Either way, unless you can guarantee that every "row_name" has exactly two values to it, it's safer to use the 2-parameter form of corosstab()
:
SELECT *
FROM crosstab(
$$
SELECT dist_chann_name, year, operative_margin
FROM marginality_by_channel
ORDER BY 1, 2
$$
, 'VALUES (2020), (2021)'
) AS ct ("DC" int, "2020" int, "2021" int);
db<>fiddle here
This variant also happens to be more tolerant with type mismatches (as everything is passed as text
anyway). See:
crosstab()
shines for many resulting value columns (faster, shorter). For just two "value" columns, aggregate FILTER
might be the better (simpler) choice. Not much performance to gain (if any, after adding some overhead). See:
- a_horse's answer under this question
- Conditional SQL count
Broken setup
That aside, your setup is ambiguous to begin with. It includes two rows for the same (dist_chann_name, year) = (3, 2021)
.
- a_horse uses
sum()
in his aggregateFILTER
solution. You might also usemin()
ormax()
, or whatever ... - My solution with the 2-parameter form outputs the last value according to sort order. (Think of it as each next value overwriting it's dedicated spot.)
- The 1-parameter form outputs the first value according to sort order. (Think of it as "first come, first serve". Superfluous rows are discarded.)
A clean solution would use an explicit sort order and document the effect, or work with a query producing distinct values, or use the appropriate aggregate function with the FILTER
solution.