Imagine this table:
id col1 col2 col3 col4
1 A A C
2 B B B
3 D D
I would like to add a column that tells me if all not-null values of the row match.
So the ideal output is:
id col1 col2 col3 col4 is_a_match
1 A A C FALSE
2 B B B TRUE
3 D D TRUE
I have tried:
select *,
case
when col1 = col2
and col2 = col3
and col3 = col4
then 'TRUE'
else 'FALSE'
end as is_a_match
from my_table
But would return false
for all due to the null values.
What is the best way to achieve the output above?
CodePudding user response:
You could transform the columns to rows, the count the distinct values. That will automatically ignore NULL values:
select t.*,
(select count(distinct x.col)
from (
values (t.col1),
(t.col2),
(t.col3),
(t.col4)
) as x(col)
where x.col is not null) = 1 as is_a_match
from the_table t
If you don't want to list all columns manually, you can use some JSON magic to turn the columns to rows in order to count the distinct values:
select t.*,
(select count(distinct x.val)
from jsonb_each_text(to_jsonb(t) - 'id') as x(col, val)
where x.val is not null) = 1
from the_table t
CodePudding user response:
If the number of columns is variable, flipping into and out of jsonb could save you some cutting and pasting:
with pivots as (
select d.id, c.k, c.v
from de_data d
cross join lateral jsonb_each_text(to_jsonb(d)) as c(k,v)
where c.k != 'id'
and c.v is not null
), match_check as (
select id, count(distinct v) = 1 as is_a_match
from pivots
group by id
)
select d.*, mc.is_a_match
from de_data d
left join match_check mc on mc.id = d.id;
db<>fiddle here
CodePudding user response:
Simple boolean logic - looks a bit awkward, but I bet it's the fastest way:
SELECT *
, COALESCE(NOT (col1 <> col2 OR col1 <> col3 OR col1 <> col4
OR col2 <> col3 OR col2 <> col4
OR col3 <> col4), true) AS is_a_match
FROM tbl;
Any non-null, non-equal pair makes the core expression true
. Invert with NOT
, default to true with COALESCE
.
Or closer to your original attempt, which wasn't far off:
SELECT *
, CASE WHEN (col1 <> col2 OR col1 <> col3 OR col1 <> col4
OR col2 <> col3 OR col2 <> col4
OR col3 <> col4)
THEN false
ELSE true END is_a_match
FROM tbl;
db<>fiddle here
Either way, the point is to invert the logic: To ignore comparison to NULL values, look for mismatches (which is only true
for two not-null values), not for matches.
The code bloats with more columns. But for a hand full of columns, walking through all pairs is fastest.