Please consider following table
Table Name: mytable
model_id | event_name | time_of_event |
---|---|---|
9 | CREATE | 2016-01-01 00:00:00 |
9 | UPDATE | 2016-01-01 01:00:00 |
9 | DELETE | 2016-01-01 02:00:00 |
3 | CREATE | 2016-01-01 03:00:00 DUPLICATE |
3 | CREATE | 2016-01-01 03:00:00 DUPLICATE delete this |
3 | DELETE | 2016-01-01 04:00:00 |
How to delete 5th entry from above table i.e. delete row with exactly same value from table. In above example no column is unique.
Please keep in mind that database could be huge and I don't want to recreate or republish data with distinct values into the table.
// Use below code to create above example
CREATE TABLE mytable(
model_id integer,
event_name varchar(7),
time_of_event timestamp
);
INSERT INTO mytable
(model_id, event_name, time_of_event)
VALUES
(9, 'CREATE', '2016-01-01 00:00:00'),
(9, 'UPDATE', '2016-01-01 01:00:00'),
(9, 'DELETE', '2016-01-01 02:00:00'),
(3, 'CREATE', '2016-01-01 03:00:00'),
(3, 'CREATE', '2016-01-01 03:00:00'),
(3, 'DELETE', '2016-01-01 04:00:00');
SELECT * FROM mytable;
CodePudding user response:
Having two or more rows with identical values is a sign of very bad design. I suppose model_id
is the table's primary key. I wonder how did you end up in this situation.
I don't want to recreate or republish data with distinct values into the table.
One possible solution is to add (not recreate/republish) a column with unique values to your table, then delete the duplicate rows you want.
ALTER TABLE mytable ADD COLUMN MyTableID INT;
You need to fill this column with unique values:
SET @i := 0;
UPDATE MyTable SET MyTableID = @i:=(@i 1) WHERE 1=1;
Next, you can write the following query:
SELECT
MT.MyTableID, MT.model_id, MT.event_name, MT.time_of_event
FROM
MyTable MT,
(SELECT model_id, event_name, time_of_event, COUNT(*)
FROM MyTable
GROUP BY model_id, event_name, time_of_event
HAVING COUNT(*) > 1
) TmpTable
WHERE
MT.model_id = TmpTable.model_id
AND MT.event_name = TmpTable.event_name
AND MT.time_of_event = TmpTable.time_of_event
;
Result:
MyTableID | model_id | event_name | time_of_event |
---|---|---|---|
4 | 3 | CREATE | 2016-01-01 03:00:00 |
5 | 3 | CREATE | 2016-01-01 03:00:00 |
You can now proceed with the deletion of duplicate rows:
DELETE FROM MyTable WHERE MyTableID IN (5 /*, the ones you wish */);
If you have too many duplicate values and you can't afford to delete them manually, you can do it like this:
DELETE FROM MyTable WHERE MyTableID IN (/* The previous long query */) LIMIT /* The number of rows of the previous long query - 1 */;
The -1
is to preserve one row of the duplicates. If you want to delete them all, remove the LIMIT
clause.
CodePudding user response:
Try with a helper table that contains the duplicates, but only one each:
With this scenario: ...
CREATE TABLE mytable(
model_id integer,event_name VARCHAR(8),time_of_event TIMESTAMP)
;
INSERT INTO mytable
-- your input data ...
SELECT 9,'CREATE',TIMESTAMP '2016-01-01 00:00:00'
UNION ALL SELECT 9,'UPDATE',TIMESTAMP '2016-01-01 01:00:00'
UNION ALL SELECT 9,'DELETE',TIMESTAMP '2016-01-01 02:00:00'
UNION ALL SELECT 3,'CREATE',TIMESTAMP '2016-01-01 03:00:00'
UNION ALL SELECT 3,'CREATE',TIMESTAMP '2016-01-01 03:00:00'
UNION ALL SELECT 3,'DELETE',TIMESTAMP '2016-01-01 04:00:00'
;
Create your helper table like so:
CREATE TABLE helper AS
SELECT
model_id
, event_name
, time_of_event
FROM mytable
GROUP BY
model_id
, event_name
, time_of_event
HAVING COUNT(*) > 1;
Then, use the helper
table to delete ... you will delete all rows, not only one of the duplicates ...
DELETE FROM mytable
WHERE(model_id,event_name,time_of_event) IN (
SELECT model_id,event_name,time_of_event FROM helper
);
And finally, insert all the rows from the helper
table back in again:
INSERT INTO mytable
SELECT * FROM helper;
COMMIT; -- if your connection is not auto-commit ...
But I'd like to add that, for most database systems, the other approach - to create a new table containing SELECT DISTINCT * FROM old_table
is the faster alternative as soon as we are talking about around 20 to 25 % of the total row count.