Home > Mobile >  Delete rows where date is not minimum
Delete rows where date is not minimum

Time:12-24

I have a table (structure below) that I need to clean up by deleting rows for each Object_ID:

WHERE Current_Step is NULL and Change = 'change' 
and Date_of_Change <> MIN(Date_of_Change)

That is, I need to leave only the row with minimum date for each Object_ID.

Table sample

Object_ID Current_Step Change Date_of_Change
0025307 NULL change 16.11.2021
0025307 NULL change 19.11.2021
0025307 NULL change 19.11.2021

I am using MS SQL. There are no primary keys. All columns are VARCHAR except Date_of_Change being of type DATE.

The reason why I need to clean up this table is because it was incorrectly filled because source query was checking for IF NULL = NULL and then marked those status changes as changed even though they did not change. So I need to revert values back to original date they were changed because if they still have value NULL that means there were no actual changes happening in status.

Desired behavior

My attempt in identifying rows that I need to keep:

SELECT [Object_ID]
      ,MIN([Date_of_Change])
FROM table
WHERE [Current_Step] IS NULL 
AND [Change] = 'change'
GROUP BY Object_ID

I just need to remove other rows with the same Object_ID whose Date_of_Change is not equal to the one identified in query above.

CodePudding user response:

do join on same table like i did on 'Table1223' below.

Example:

DELETE tbl 
FROM Table1223 tbl
JOIN (SELECT * FROM Table1223) objID
    ON objID.Object_ID = tbl.Object_ID
WHERE tbl.Date_of_Change > objID.Date_of_Change

CodePudding user response:

Schema

So you have a table with versioned objects which holds change records associated to the object with some details and a date.

Now you want to select

  • the first change per object
  • the oldest (within the GROUP of this object's changes)
  • using MIN function on a DATE column-type

This oldest should be retained/kept and stay. All other object change-versions should be deleted.

Solving

A. Selecting the fist/oldest changes per object in 2 steps.

  1. Select the MIN(date) per object:
SELECT Object_ID, COUNT(Object_ID) AS Count_Changes, MIN(Date_of_Change) AS First_Change
FROM table
GROUP BY Object_ID

Resultset contains each object with the total count of changes and the date of the first change.

  1. Select the first changes using previous result as subquery in a JOIN:
SELECT *
FROM table t
-- join with a table-subquery having only 2 columns to correlate
JOIN ( 
  SELECT Object_ID, MIN(Date_of_Change) AS First_Change
  FROM table
  WHERE Current_Step is NULL and Change = 'change'
  GROUP BY Object_ID
) m ON t.Object_ID = m.Object_ID AND t.Date_of_Change = m.First_Change
WHERE Current_Step is NULL and Change = 'change'

This are the rows to keep and not remove. The first change of each object should be retained and not cleaned.

B. Now we can invert the JOIN-condition to get all the rows, that we want to delete/clean:

  1. Change the date-comparison

) m ON t.Object_ID = m.Object_ID AND t.Date_of_Change = m.First_Change

to not-equal:

) m ON t.Object_ID = m.Object_ID AND t.Date_of_Change <> m.First_Change

  1. Run a dry-select first, to get at least the count before deleting.
SELECT COUNT(Object_ID) AS records_to_remove
FROM table t
-- join with a table-subquery having only 2 columns to correlate
JOIN ( 
  SELECT Object_ID, MIN(Date_of_Change) AS First_Change
  FROM table
  WHERE Current_Step is NULL and Change = 'change'
  GROUP BY Object_ID
) m ON t.Object_ID = m.Object_ID AND t.Date_of_Change <> m.First_Change
WHERE Current_Step is NULL and Change = 'change'
  1. Prepare the DELETE statement with JOIN (if supported by DBMS):
DELETE FROM table t
JOIN ( 
  SELECT Object_ID, MIN(Date_of_Change) AS First_Change
  FROM table
  WHERE Current_Step is NULL and Change = 'change'
  GROUP BY Object_ID
) m ON t.Object_ID = m.Object_ID AND t.Date_of_Change <> m.First_Change
WHERE t.Current_Step is NULL AND t.Change = 'change'

Alternative to JOIN try USING on other DBMS

Some DBMS do not support JOIN in DELETE statements, but alternatives like USING:

DELETE FROM table t
USING ( 
  SELECT Object_ID, MIN(Date_of_Change) AS First_Change
  FROM table t2
  WHERE t2.Current_Step is NULL AND t2.Change = 'change'
) AS m
WHERE ...
AND t.Object_ID = m.Object_ID AND t.Date_of_Change <> m.First_Change
  • Related