Home > OS >  How to separate column values by condition (pivot) to fill one row
How to separate column values by condition (pivot) to fill one row

Time:03-08

I have two tables that I'd like do a full outer join where the resulting view separates the values table into two separate columns with one row for each name_id. I have made one approach with a CASE expression to select by type and then use it with pandas to fill in the values and return distinct name_ids.

Name Table

name_id name
1 foo
2 bar
3 doo
4 sue

Values Table

name_id value type
1 90 red
2 95 blue
3 33 red
3 35 blue
4 60 blue
4 20 red

This is a condensed version. In my full table, I need to do this twice with two separate value tables sorted by type, red/blue and control/placebo.

Simple Join

SELECT names_table.name_id, name, value, type
FULL OUTER JOIN values_table
ON names_table.name_id = values_table.name_id
WHERE type in ('red', 'blue')
name_id name value type
1 foo 90 red
2 bar 95 blue
3 doo 33 red
3 doo 35 blue
4 sue 60 blue
4 sue 20 red

Current work around result which I then fix with python and pandas

SELECT names_table.name_id, name, value, type
CASE 
    WHEN type = 'red' THEN value END red,
CASE 
    WHEN type = 'blue' THEN value END blue
FROM names_table
FULL OUTER JOIN values_table
ON names_table.name_id = values_table.name_id
name_id name blue red
1 foo Null 90
2 bar 95 Null
3 doo 35 Null
3 doo Null 33
4 sue 60 Null
4 sue Null 20

This is my desired output below, where I would have the types as columns and just rows for unique name_ids but with value tables 1 and 2.

Desired Output

name_id name blue red
1 foo Null 90
2 bar 95 Null
3 doo 35 33
4 sue 60 20

CodePudding user response:

I have two tables that I'd like do a full outer join ...

Why would you? Better explain what you actually want to do instead of the assumed tool to implement it.

Simple pivoting with the aggregate FILTER clause. See:

SELECT name_id, n.name, v.blue, v.red
FROM  (
   SELECT name_id
        , min(value) FILTER (WHERE type = 'blue') AS blue
        , min(value) FILTER (WHERE type = 'red')  AS red
   FROM   values_table
   GROUP  BY 1
   ) v
LEFT   JOIN names_table n USING (name_id);

Produces your desired result.

db<>fiddle here

The LEFT JOIN includes result rows even if no name is found. A FULL [OUTER] JOIN would add names in the result that have no values at all. I think you really want a LEFT [OUTER] JOIN or even a plain [INNER] JOIN.

You can just switch the JOIN type to adapt to your actual requirements. The identical column name "name_id" allows to join with a USING clause. The unqualified name_id in the outer SELECT works for any join type.

Note how I aggregate first and join later. Typically substantially faster. See:

If there can be duplicate values for "red" or "blue", you'll have to define how to deal with those.

For more involved queries consider crosstab(). See:

  • Related