Home > database >  Delete duplicate data from MySQL table date wise
Delete duplicate data from MySQL table date wise

Time:06-16

How would I delete yesterday's duplicate data from a MySQL Table and keep latest record.

For example, with the following data:

SELECT * FROM data;

| pid | serviceid | id             | created
| ----| ----------|----------------|---------
| 10  | ABNCKN    | Q0CHZ2GI1VKOPL | 2022-06-14 10:49:34
| 11  | ABNCKN    | Q0CHZ2GI1VKOPL | 2022-06-14 10:49:34
| 12  | KSPSLS    | QLSPDF0S0SDFKK | 2022-06-15 11:44:21
| 13  | AKNKSL    | QLSPDF0S0SDFKK | 2022-06-15 12:51:42
| 14  | AKNKSL    | QLSPDF0S0SDFKK | 2022-06-15 12:51:42

I used

DELETE n1 FROM data n1, data n2 WHERE n1.pid < n2.pid AND n1.id = n2.id

How could I DELETE to only remove duplicates and keep latest pid for yesterday's data.

Expected output would be

SELECT * FROM data;

| pid | serviceid | id             | created
| ----| ----------|----------------|---------
| 10  | ABNCKN    | Q0CHZ2GI1VKOPL | 2022-06-14 10:49:34
| 11  | ABNCKN    | Q0CHZ2GI1VKOPL | 2022-06-14 10:49:34
| 12  | KSPSLS    | QLSPDF0S0SDFKK | 2022-06-15 11:44:21
| 14  | AKNKSL    | QLSPDF0S0SDFKK | 2022-06-15 12:51:42

remove | 13 | AKNKSL | QLSPDF0S0SDFKK | 2022-06-15 12:51:42

CodePudding user response:

If you want to remove the duplicate data you can use this

SELECT DISTINCT * FROM data;

but if you want to choose the latest record from both of it you can use this

SELECT * FROM data WHERE [data] = (SELECT MAX([data.created]) FROM data) GO

CodePudding user response:

DELETE * FROM data WHERE id IN (SELECT id FROM data WHERE name="ABNCKN" LIMIT((SELECT COUNT(name) FROM data WHERE name="ABNCKN") - 1)));

SELECT id FROM data WHERE name="ABNCKN" >> this query will get all ids with specific name, and limit will limit the the number of rows returned. The number of rows will be dynamic based on the duplicated rows except one row. and you can add ORDER BY to the query to arrange them.

and finally delete all duplicated rows except one row.

I hope it works well

CodePudding user response:

Didn't understand very well why you keep some duplicated data in the expected output but what you can try is get the last Id with a JOIN and filter it from the WHERE so is not deleted:

DELETE d1 FROM data d1, data d2
JOIN (SELECT MAX(pid) pid FROM data) d3
WHERE d1.pid != d3.pid AND d1.pid < d2.pid AND d1.id = d2.id;

CodePudding user response:

You can achieve this using self join:

DELETE t1 FROM data t1 
  INNER JOIN data t2 
  WHERE t1.pid < t2.pid 
    AND t1.serviceid = t2.serviceid 
    AND t1.id = t2.id
    AND t1.created = t2.created;
    
SELECT * FROM data;

See this db<>fiddle

For MySQL 8 you can also achieve this using ROW_NUMBER():

WITH CTE AS
(
  SELECT *, ROW_NUMBER() OVER(PARTITION BY serviceid,id,created ORDER BY pid DESC) AS RN
  FROM data
)
DELETE data
FROM data
JOIN CTE ON data.pid=CTE.pid
WHERE CTE.RN>1;

SELECT * FROM data;

Check this db<>fiddle

Output after delete:

pid serviceid id created
11 ABNCKN Q0CHZ2GI1VKOPL 2022-06-14 10:49:34
12 KSPSLS QLSPDF0S0SDFKK 2022-06-15 11:44:21
14 AKNKSL QLSPDF0S0SDFKK 2022-06-15 12:51:42

As Nishant mentioned pid 10 should also be removed.

  • Related