How I fixed the incompatible error in PostgreSQL while pivoting?-CodePudding

I'm pivoting in PostgreSQL but when I run the query the output says:

ERROR:  return and sql tuple descriptions are incompatible
SQL state: 42601

Summarizing, I want the distribution channel on raw, the year in the columns and the operative margin as value.

dist_chann_id --> integer
year --> year
operative_margin --> integer

Without the pivot the output is:

dist_chann_name	year	operative_margin
1	2020	20783
1	2021	5791
2	2020	30362
3	2021	14501
3	2020	2765
3	2021	4535

This is my query:

SELECT *
FROM   crosstab(
'SELECT dist_chann_id, year, operative_margin
FROM marginality_by_channel
ORDER BY dist_chann_id, year'
     ) AS ct ("DC" int, "2020" int, "2021" int);

CodePudding user response：

Using filtered aggregation is typically much easier than the somewhat convoluted crosstab() function (at least in my opinion).

select dist_chann_name as dc,
       sum(operative_margin) filter (where year = 2020) as "2020",
       sum(operative_margin) filter (where year = 2021) as "2021"
from marginality_by_channel
group by dist_chann_name
order by dist_chann_name;

CodePudding user response：

Source of the error msg

One of the columns does not have data type you think it has. Must be operative_margin, probably text?

The 1-parameter form of crosstab() only uses the "category" column (year in your example) only for sorting. And the "row_name" column (dist_chann_name - or dist_chann_id ?) would produce a different error msg.

Solution

Either way, unless you can guarantee that every "row_name" has exactly two values to it, it's safer to use the 2-parameter form of corosstab():

SELECT *
FROM  crosstab(
   $$
   SELECT dist_chann_name, year, operative_margin
   FROM   marginality_by_channel
   ORDER  BY 1, 2
   $$
 , 'VALUES (2020), (2021)'
   ) AS ct ("DC" int, "2020" int, "2021" int);

db<>fiddle here

This variant also happens to be more tolerant with type mismatches (as everything is passed as text anyway). See:

PostgreSQL Crosstab Query

crosstab() shines for many resulting value columns (faster, shorter). For just two "value" columns, aggregate FILTER might be the better (simpler) choice. Not much performance to gain (if any, after adding some overhead). See:

a_horse's answer under this question
Conditional SQL count

Broken setup

That aside, your setup is ambiguous to begin with. It includes two rows for the same (dist_chann_name, year) = (3, 2021).

a_horse uses sum() in his aggregate FILTER solution. You might also use min() or max(), or whatever ...
My solution with the 2-parameter form outputs the last value according to sort order. (Think of it as each next value overwriting it's dedicated spot.)
The 1-parameter form outputs the first value according to sort order. (Think of it as "first come, first serve". Superfluous rows are discarded.)

A clean solution would use an explicit sort order and document the effect, or work with a query producing distinct values, or use the appropriate aggregate function with the FILTER solution.