Home > Mobile >  Need help finding duplicate values for Data Quality checks
Need help finding duplicate values for Data Quality checks

Time:09-29

I have a table which requires me to ensure that a combination of attributes should have a unique record against it.

col1 col2 col3
a     b   x
a     b   y
a     c   x
a     d   z
e     b   w

How do I ensure that a col1 col2 combination only has unique col3 values. Here ab has both x and y as col3 values. I have to send such rows to a reject file and I am looking for the right filter query.

CodePudding user response:

We can use an aggregation approach. To identify rows which are failing the unique requirement use:

WITH cte AS (
    SELECT col1, col2
    FROM yourTable
    GROUP BY col1, col2
    HAVING MIN(col3) <> MAX(col3)
)

SELECT t1.*
FROM yourTable t1
INNER JOIN cte t2
    ON t2.col1 = t1.col1 AND
       t2.col2 = t1.col2;
  • Related