Home > OS >  Remove duplicate rows in table and update duplicated rows id with minimum ID in another table
Remove duplicate rows in table and update duplicated rows id with minimum ID in another table

Time:01-10

I have a User table to store unique users, unfortunately something went wrong in my code and the same user kept inserted into table when user update profile.

For instance, User John Doe was created multiple times and the ID is 1, 2, 3 and 4. In table Order the same person placed multiple orders and UID refers to the user in table User and UID 1 and 4 both refers to John Doe.

I am trying to

  1. Remove duplicates in User
  2. Keep the minimum UID in Order if duplicate User ID are used in Order. For example both UID 1 and 4 are used in Order, I want to update 4 to 1.

Table User:

 ---- ------ ----- 
| ID | Name | Gender |
 ---- ------ ----- 
| 1  | John Doe | M  |
| 2  | John Doe | M  |
| 3  | John Doe | M  |
| 4  | John Doe | M  |
 ---- ------ ----- 

Table Order:

 ---- ---- ------ ----- 
|ID  | UID | BILLING_ADDR | STATE |
 ---- ---- ------ ----- 
|1   | 1  | XXX | XX  |
|2   | 4  | XXX | XX  |
|3   | 4  | XXX | XX  |
|4   | 4  | XXX | XX  |
 ---- ---- ------ ----- 

To find the duplicates I have:

SELECT Name, count(1) 
FROM User 
GROUP BY Name
HAVING count(1) > 1 
ORDER BY count(1) DESC;

To remove duplicates I have:

DELETE FROM `User` 
  WHERE ID NOT IN (
    SELECT * FROM (
      SELECT MIN(ID) FROM User 
        GROUP BY Name
    ) 
  );

Expected Table User and Order:

 ---- ------ ----- 
| ID | Name | Gender |
 ---- ------ ----- 
| 1  | John Doe | M  |
 ---- ------ ----- 

 ---- ---- ------ ----- 
|ID  | UID | BILLING_ADDR | STATE |
 ---- ---- ------ ----- 
|1   | 1  | XXX | XX  |
|2   | 1  | XXX | XX  |
|3   | 1  | XXX | XX  |
|4   | 1  | XXX | XX  |
 ---- ---- ------ ----- 

Unfortunately I don't know how to update the UID in table Order, can someone help? Thanks!

CodePudding user response:

The whole scenario as I would do it (I use temporary tables in order to not mess up my database, but it works even better with permanent tables):

-- initial scenario, the two tables:
CREATE LOCAL TEMPORARY TABLE
usr(id,"name",gender)
ON COMMIT PRESERVE ROWS AS (
          SELECT 1,'John Doe','M'
UNION ALL SELECT 2,'John Doe','M'
UNION ALL SELECT 3,'John Doe','M'
UNION ALL SELECT 4,'John Doe','M'
)
;
CREATE LOCAL TEMPORARY TABLE
ord(id,uid,billing_addr,state) 
ON COMMIT PRESERVE ROWS AS (
            SELECT 1, 1,'XXX','XX'
  UNION ALL SELECT 2, 4,'XXX','XX'
  UNION ALL SELECT 3, 4,'XXX','XX'
  UNION ALL SELECT 4, 4,'XXX','XX'
)
;

-- de-dupe the usr table using an insert.. .select into yet another table:
CREATE LOCAL TEMPORARY TABLE
usr_dedup 
ON COMMIT PRESERVE ROWS AS
WITH
w_rowcount AS (
  SELECT
    id
  , "name"
  , gender
  , ROW_NUMBER() OVER(PARTITION BY "name" ORDER BY id) AS rn
  FROM usr
)
SELECT
    id
  , "name"
  , gender
FROM w_rowcount
WHERE rn=1;

-- Using the duped and the deduped usr tables to update the old table:
UPDATE ord SET uid = (
  SELECT
    d.id
  FROM (
         usr u 
    JOIN usr_dedup d USING("name")
  )
  WHERE u.id=ord.uid
);
-- "ord" now contains:
SELECT * FROM ord;
 id | uid | billing_addr | state 
---- ----- -------------- -------
  1 |   1 | XXX          | XX
  2 |   1 | XXX          | XX
  3 |   1 | XXX          | XX
  4 |   1 | XXX          | XX

-- now rename the two usr tables to have the usr table with de-duped records:
ALTER TABLE usr RENAME TO usr_dup;
ALTER TABLE
ALTER TABLE usr_dedup RENAME TO usr;
ALTER TABLE
  • Related