Home > Software engineering >  Oracle finding duplicate rows that are consecutive for a given field
Oracle finding duplicate rows that are consecutive for a given field

Time:06-25

We have some tables in an Oracle 19c database that were put in place many years ago for audit purposes. This 'audit' table is being generated by a trigger on update/delete to another table "widget". It is not checking to see if the new data matches the old data, so there are a lot of rows that are identical in everything but timestamp. I am not looking for comments on fixing the auditing as there many solutions to that mess. We are currently just stopping the bleeding until a better solution can be implemented.

My question is how to cut my existing audit data down to only the records that actually changed. These tables have grown to the point that it is impacting some of our processes.

auditWidget table data:

rowID widgetID widgetDesc timestampOfTableUpdate
1 1 A 1/29/2016 12:12:55 PM
2 2 test 1/29/2016 12:13:55 PM
3 1 A 1/29/2016 12:14:55 PM
4 1 A 1/29/2016 12:15:55 PM
5 2 test 1/29/2016 12:16:55 PM
6 1 c 1/29/2016 12:17:55 PM
7 1 b 1/29/2016 12:18:55 PM
8 2 d 1/29/2016 12:19:55 PM

As you can see, nothing has changed between row 1,3 and 4, or between 2 and 5. These rows should never have been created. I am trying to find a way to identify/delete the entries that don't show any change in the widget table. My final data would look like this:

rowID widgetID widgetDesc timestampOfTableUpdate
1 1 A 1/29/2016 12:12:55 PM
2 2 b 1/29/2016 12:13:55 PM
6 1 c 1/29/2016 12:17:55 PM
7 1 b 1/29/2016 12:18:55 PM
8 2 d 1/29/2016 12:19:55 PM

LAG only looks at the previous row which might contain a different widgetID. There are thousands of widgetIDs. There is also potentially many identical rows per widgetID in a row. Are there any solutions out there short of looping through the data and conditionally doing something with it row by row? Any direction would be appreciated.

CodePudding user response:

Delete duplicates, then:

delete from auditwidget a
  where a.rowid > (select min(b.rowid)
                   from auditwidget b
                   where b.widgetid = a.widgetid
                     and b.widgetdesc = a.widgetdesc
                  );

CodePudding user response:

The simple solution to only SELECT the relevant data is to use LAG(). For example:

select *,
from (
  select t.*,
    lag(widgetdesc) over(
      partition by widgetid 
      order by timestampOfTableUpdate
    ) as prev_desc
  from my_table t
) x
where widgetdesc <> prev_desc
  • Related