In the following data:
id_1 | id_2 | colA | colB | colC |
---|---|---|---|---|
1 | 2 | 2022-01-02 scroll | fast | 12 |
1 | 2 | 2022-01-02 scroll | fast | 12 |
1 | 3 | 2022-01-03 scroll | fast fast | 11 |
1 | 3 | 2022-01-03 scroll | fast fast | 11 |
1 | 3 | 2022-01-03 scroll | fast fast | 11 |
I would like to remove duplicates only when they occur in pair. The output table should have one of the rows from the first two rows and remaining rows should be left as is.
The output table would be
id_1 | id_2 | colA | colB | colC |
---|---|---|---|---|
1 | 2 | 2022-01-02 scroll | fast | 12 |
1 | 3 | 2022-01-03 scroll | fast fast | 11 |
1 | 3 | 2022-01-03 scroll | fast fast | 11 |
1 | 3 | 2022-01-03 scroll | fast fast | 11 |
I have large dataset, so I cannot filter based on Id_1 and id_2. I am looking for a generic solution.
Test data is here at Sqlfiddle.
CodePudding user response:
We can use a combination of COUNT()
and ROW_NUMBER()
, both as analytic functions:
WITH cte AS (
SELECT t.*, COUNT(*) OVER (PARTITION BY id_1, id_2) cnt,
ROW_NUMBER() OVER (PARTITION BY id_1, id_2 ORDER BY id_1) rn
FROM yourTable t
)
SELECT id_1, id_2, colA, colB, colC
FROM cte
WHERE cnt <> 2 OR rn = 1;
The above logic returns rows appearing only once or appearing 3 times or more. In the case of pair duplicates, it arbitrarily returns one of them.
CodePudding user response:
You can use a MERGE
statement and count the size of each group using analytic functions and then delete the second row of a group when there are only 2 rows per group:
MERGE INTO table_name dst
USING (
SELECT ROWID AS rid,
ROW_NUMBER() OVER (PARTITION BY id_1, id_2, colA, colB, colC ORDER BY ROWNUM)
AS rn,
COUNT(*) OVER (PARTITION BY id_1, id_2, colA, colB, colC) AS cnt
FROM table_name
) src
ON (dst.ROWID = src.rid AND src.rn = 2 AND src.cnt = 2)
WHEN MATCHED THEN
UPDATE SET colC = colC
DELETE WHERE 1 = 1;
Which, for the sample data:
create table table_name (id_1, id_2, colA, colB, colC) AS
SELECT 1, 2, 'A', 'B', 'C' FROM DUAL UNION ALL
SELECT 1, 2, 'A', 'B', 'C' FROM DUAL UNION ALL
SELECT 1, 3, 'D', 'E', 'F' FROM DUAL UNION ALL
SELECT 1, 3, 'D', 'E', 'F' FROM DUAL UNION ALL
SELECT 1, 3, 'D', 'E', 'F' FROM DUAL;
Then after the MERGE
the table contains:
ID_1 | ID_2 | COLA | COLB | COLC |
---|---|---|---|---|
1 | 2 | A | B | C |
1 | 3 | D | E | F |
1 | 3 | D | E | F |
1 | 3 | D | E | F |