I have a table (structure below) that I need to clean up by deleting rows for each Object_ID
:
WHERE Current_Step is NULL and Change = 'change'
and Date_of_Change <> MIN(Date_of_Change)
That is, I need to leave only the row with minimum date for each Object_ID
.
Table sample
Object_ID | Current_Step | Change | Date_of_Change |
---|---|---|---|
0025307 | NULL | change | 16.11.2021 |
0025307 | NULL | change | 19.11.2021 |
0025307 | NULL | change | 19.11.2021 |
I am using MS SQL.
There are no primary keys.
All columns are VARCHAR
except Date_of_Change
being of type DATE
.
The reason why I need to clean up this table is because it was incorrectly filled because source query was checking for IF NULL = NULL and then marked those status changes as changed even though they did not change. So I need to revert values back to original date they were changed because if they still have value NULL that means there were no actual changes happening in status.
Desired behavior
My attempt in identifying rows that I need to keep:
SELECT [Object_ID]
,MIN([Date_of_Change])
FROM table
WHERE [Current_Step] IS NULL
AND [Change] = 'change'
GROUP BY Object_ID
I just need to remove other rows with the same Object_ID whose Date_of_Change is not equal to the one identified in query above.
CodePudding user response:
do join on same table like i did on 'Table1223' below.
Example:
DELETE tbl
FROM Table1223 tbl
JOIN (SELECT * FROM Table1223) objID
ON objID.Object_ID = tbl.Object_ID
WHERE tbl.Date_of_Change > objID.Date_of_Change
CodePudding user response:
Schema
So you have a table with versioned objects which holds change records associated to the object with some details and a date.
Now you want to select
- the first change per object
- the oldest (within the GROUP of this object's changes)
- using
MIN
function on aDATE
column-type
This oldest should be retained/kept and stay. All other object change-versions should be deleted.
Solving
A. Selecting the fist/oldest changes per object in 2 steps.
- Select the MIN(date) per object:
SELECT Object_ID, COUNT(Object_ID) AS Count_Changes, MIN(Date_of_Change) AS First_Change
FROM table
GROUP BY Object_ID
Resultset contains each object with the total count of changes and the date of the first change.
- Select the first changes using previous result as subquery in a
JOIN
:
SELECT *
FROM table t
-- join with a table-subquery having only 2 columns to correlate
JOIN (
SELECT Object_ID, MIN(Date_of_Change) AS First_Change
FROM table
WHERE Current_Step is NULL and Change = 'change'
GROUP BY Object_ID
) m ON t.Object_ID = m.Object_ID AND t.Date_of_Change = m.First_Change
WHERE Current_Step is NULL and Change = 'change'
This are the rows to keep and not remove. The first change of each object should be retained and not cleaned.
B. Now we can invert the JOIN-condition to get all the rows, that we want to delete/clean:
- Change the date-comparison
) m ON t.Object_ID = m.Object_ID AND t.Date_of_Change = m.First_Change
to not-equal:
) m ON t.Object_ID = m.Object_ID AND t.Date_of_Change <> m.First_Change
- Run a dry-select first, to get at least the count before deleting.
SELECT COUNT(Object_ID) AS records_to_remove
FROM table t
-- join with a table-subquery having only 2 columns to correlate
JOIN (
SELECT Object_ID, MIN(Date_of_Change) AS First_Change
FROM table
WHERE Current_Step is NULL and Change = 'change'
GROUP BY Object_ID
) m ON t.Object_ID = m.Object_ID AND t.Date_of_Change <> m.First_Change
WHERE Current_Step is NULL and Change = 'change'
- Prepare the DELETE statement with JOIN (if supported by DBMS):
DELETE FROM table t
JOIN (
SELECT Object_ID, MIN(Date_of_Change) AS First_Change
FROM table
WHERE Current_Step is NULL and Change = 'change'
GROUP BY Object_ID
) m ON t.Object_ID = m.Object_ID AND t.Date_of_Change <> m.First_Change
WHERE t.Current_Step is NULL AND t.Change = 'change'
Alternative to JOIN try USING
on other DBMS
Some DBMS do not support JOIN in DELETE statements, but alternatives like USING
:
DELETE FROM table t
USING (
SELECT Object_ID, MIN(Date_of_Change) AS First_Change
FROM table t2
WHERE t2.Current_Step is NULL AND t2.Change = 'change'
) AS m
WHERE ...
AND t.Object_ID = m.Object_ID AND t.Date_of_Change <> m.First_Change