Home > Blockchain >  PostgreSQL: Exclude null value from comparison across columns
PostgreSQL: Exclude null value from comparison across columns

Time:06-17

Imagine this table:

id   col1    col2    col3    col4
1    A       A       C
2            B       B       B
3    D               D

I would like to add a column that tells me if all not-null values of the row match.

So the ideal output is:

id   col1    col2    col3    col4   is_a_match
1    A       A       C              FALSE
2            B       B       B      TRUE
3    D               D              TRUE

I have tried:

select *,
     case  
         when col1 = col2
                and col2 = col3
                and col3 = col4
         then 'TRUE'
         else 'FALSE'
     end as is_a_match
from my_table

But would return false for all due to the null values.

What is the best way to achieve the output above?

CodePudding user response:

You could transform the columns to rows, the count the distinct values. That will automatically ignore NULL values:

select t.*,
       (select count(distinct x.col)
        from (
          values (t.col1), 
                 (t.col2), 
                 (t.col3), 
                 (t.col4)
        ) as x(col)
        where x.col is not null) = 1 as is_a_match
from the_table t

If you don't want to list all columns manually, you can use some JSON magic to turn the columns to rows in order to count the distinct values:

select t.*, 
       (select count(distinct x.val)
        from jsonb_each_text(to_jsonb(t) - 'id') as x(col, val)
        where x.val is not null) = 1
from the_table t

CodePudding user response:

Simple boolean logic - looks a bit awkward, but I bet it's the fastest way:

SELECT *
     , COALESCE(NOT (col1 <> col2 OR col1 <> col3 OR col1 <> col4
                  OR col2 <> col3 OR col2 <> col4
                  OR col3 <> col4), true) AS is_a_match
FROM   tbl;

Any non-null, non-equal pair makes the core expression true. Invert with NOT, default to true with COALESCE.

Or closer to your original attempt, which wasn't far off:

SELECT *
     , CASE WHEN (col1 <> col2 OR col1 <> col3 OR col1 <> col4
               OR col2 <> col3 OR col2 <> col4
               OR col3 <> col4)
            THEN false
            ELSE true END is_a_match
FROM   tbl;

db<>fiddle here

Either way, the point is to invert the logic: To ignore comparison to NULL values, look for mismatches (which is only true for two not-null values), not for matches.

The code bloats with more columns. But for a hand full of columns, walking through all pairs is fastest.

CodePudding user response:

If the number of columns is variable, flipping into and out of jsonb could save you some cutting and pasting:

with pivots as (
  select d.id, c.k, c.v
    from de_data d
         cross join lateral jsonb_each_text(to_jsonb(d)) as c(k,v)
   where c.k != 'id'
     and c.v is not null
), match_check as (
  select id, count(distinct v) = 1 as is_a_match 
    from pivots 
   group by id
)
select d.*, mc.is_a_match
  from de_data d
       left join match_check mc on mc.id = d.id;

db<>fiddle here

  • Related