Home > Enterprise >  How to get values to dense table with columns of categories using PostgreSQL (crosstab)?
How to get values to dense table with columns of categories using PostgreSQL (crosstab)?

Time:05-20

I have this toy example which gives me sparse table of values separated in their different categories. I would want to have dense matrix, where all columns are individually ordered.

drop table if exists temp_table;
create temp table temp_table(
    rowid int
    , category text
    , score int
    );
insert into temp_table values (0, 'cat1', 10);
insert into temp_table values (1, 'cat2', 21);
insert into temp_table values (2, 'cat3', 32);
insert into temp_table values (3, 'cat2', 23);
insert into temp_table values (4, 'cat2', 24);
insert into temp_table values (5, 'cat3', 35);
insert into temp_table values (6, 'cat1', 16);
insert into temp_table values (7, 'cat1', 17);
insert into temp_table values (8, 'cat2', 28);
insert into temp_table values (9, 'cat2', 29);

Which gives this temporary table:

rowid category score
0 cat1 10
1 cat2 21
2 cat3 32
3 cat2 23
4 cat2 24
5 cat3 35
6 cat1 16
7 cat1 17
8 cat2 28
9 cat2 29

Then ordering score values to different columns based on their category:

select "cat1", "cat2", "cat3"
from crosstab(
    $$ select rowid, category, score from temp_table $$ -- as source_sql
    , $$ select distinct category from temp_table order by category $$ -- as category_sql
 ) as (rowid int, "cat1" int, "cat2" int, "cat3" int)
 

That outputs:

cat1 cat2 cat3
10
21
32
23
24
35
16
17
28
29

But I would want the result of the query to be dense, like:

cat1 cat2 cat3
10 21 32
16 23 35
17 24
28
29

Maybe PostgreSQL's crosstab is not even right tool to do this, but that comes to my mind first as it produces that sparse table close to the result I would need.

CodePudding user response:

This should work for the exact given example data and expected output.

select max(cat1), max(cat2), max(cat3) 
from crosstab(
$$ select rank() over(partition by category order by rowid) as ranking, 
  rowid, 
  category, 
  score 
from temp_table 
order by rowid, category asc$$ -- as source_sql
, $$ select distinct category 
from temp_table 
order by category $$ -- as category_sql
  ) as (ranking int, rowid int, "cat1" int, "cat2" int, "cat3" int) 
group by ranking 
order by ranking asc

You can test the solution here - Ranking over Categories

  1. In the external query, I am effectively picking the max() values of each category, for each of the rankings as obtained in the source SQL query.

CodePudding user response:

with cte as (
  select category, score, row_number() over (
    partition by category order by score
  ) as r
  from temp_table
)
  select
    sum(score) filter (where category = 'cat1') as cat1,
    sum(score) filter (where category = 'cat2') as cat2,
    sum(score) filter (where category = 'cat3') as cat3
  from cte
  group by r
  order by r
;

If the number of columns is known and it is reasonably small, FILTER might be a better option than CROSSTAB, which requires an extension.

  • Related