PostgreSQL: Exclude null value from comparison across columns-CodePudding

Imagine this table:

id   col1    col2    col3    col4
1    A       A       C
2            B       B       B
3    D               D

I would like to add a column that tells me if all not-null values of the row match.

So the ideal output is:

id   col1    col2    col3    col4   is_a_match
1    A       A       C              FALSE
2            B       B       B      TRUE
3    D               D              TRUE

I have tried:

select *,
     case  
         when col1 = col2
                and col2 = col3
                and col3 = col4
         then 'TRUE'
         else 'FALSE'
     end as is_a_match
from my_table

But would return false for all due to the null values.

What is the best way to achieve the output above?

CodePudding user response：

You could transform the columns to rows, the count the distinct values. That will automatically ignore NULL values:

select t.*,
       (select count(distinct x.col)
        from (
          values (t.col1), 
                 (t.col2), 
                 (t.col3), 
                 (t.col4)
        ) as x(col)
        where x.col is not null) = 1 as is_a_match
from the_table t

If you don't want to list all columns manually, you can use some JSON magic to turn the columns to rows in order to count the distinct values:

select t.*, 
       (select count(distinct x.val)
        from jsonb_each_text(to_jsonb(t) - 'id') as x(col, val)
        where x.val is not null) = 1
from the_table t

CodePudding user response：

Simple boolean logic - looks a bit awkward, but I bet it's the fastest way:

SELECT *
     , COALESCE(NOT (col1 <> col2 OR col1 <> col3 OR col1 <> col4
                  OR col2 <> col3 OR col2 <> col4
                  OR col3 <> col4), true) AS is_a_match
FROM   tbl;

Any non-null, non-equal pair makes the core expression true. Invert with NOT, default to true with COALESCE.

Or closer to your original attempt, which wasn't far off:

SELECT *
     , CASE WHEN (col1 <> col2 OR col1 <> col3 OR col1 <> col4
               OR col2 <> col3 OR col2 <> col4
               OR col3 <> col4)
            THEN false
            ELSE true END is_a_match
FROM   tbl;

db<>fiddle here

Either way, the point is to invert the logic: To ignore comparison to NULL values, look for mismatches (which is only true for two not-null values), not for matches.

The code bloats with more columns. But for a hand full of columns, walking through all pairs is fastest.

CodePudding user response：

If the number of columns is variable, flipping into and out of jsonb could save you some cutting and pasting:

with pivots as (
  select d.id, c.k, c.v
    from de_data d
         cross join lateral jsonb_each_text(to_jsonb(d)) as c(k,v)
   where c.k != 'id'
     and c.v is not null
), match_check as (
  select id, count(distinct v) = 1 as is_a_match 
    from pivots 
   group by id
)
select d.*, mc.is_a_match
  from de_data d
       left join match_check mc on mc.id = d.id;

db<>fiddle here