Let's say we have following table:
Select
'a' as col1,
'b' as col2
union
Select
'a' as col1,
'c' as col2
union
Select
'b' as col1,
'a' as col2
It gives:
col1 col2
"a" "b"
"a" "c"
"b" "a"
How would you define duplicated row assuming that rows like ("a", "b") and ("b", "a") are the same.
So in fact result should be:
col1 col2
"a" "b"
"a" "c"
CodePudding user response:
judging by your sample data and results you could use a query like below
select
distinct
case when t2.col1 is not null and t2.col1<t1.col1 then t2.col1 else t1.col1 end as col1,
case when t2.col2 is not null and t2.col2>t1.col2 then t2.col2 else t1.col2 end as col2
from yourtable t1
left join yourtable t2
on t1.col1=t2.col2 and t1.col2=t2.col1
CodePudding user response:
In Databricks, I can do something as follows and it works. It seems you need to have id to select distinct records by ignoring field order.
CREATE TABLE table_a AS
SELECT uuid() as id, col1, col2 FROM your_table;
SELECT a1.col1, a2.col2
FROM table_a a1
WHERE NOT EXISTS (SELECT 1
FROM table_a a2
WHERE a1.col1 = a2.col2
AND a1.col2 = a2.col1
AND a1.id > a2.id);