Home > Software engineering >  MSSQL - Delete duplicate rows using common column values
MSSQL - Delete duplicate rows using common column values

Time:10-02

I haven't used SQL in quite a while, so I'm a bit lost here. I wanted to check for rows with duplicate values in the "Duration" and "date" columns to remove them from the query results. I would need to keep the rows where column = "Transfer" since these hold more information about the call and how it was routed through our system.

I want to use this for a dashboard, which would include counting the total number of calls from that query, which is why I cannot have both.

Here's the (Simplified) code used:

SELECT status, user, duration, phonenumber, date
FROM (SELECT * FROM view_InboundPhoneCalls) as Phonecalls
WHERE date>=DATEADD(dd, -15, getdate())
--GROUP BY duration

Which gives something of the sort:

Status User Duration phonenumber  date
Received Receptionnist 00:34:03  from: 1234567890  2021-09-30 16:01:57 
Received Receptionnist 00:03:12  from: 9876543210  2021-09-30 16:02:40 
Transfer User1 00:05:12   14161654965;Receptionnist;User1  2021-09-30 16:01:57 
Received Receptionnist 00:05:12  from: 14161654965  2021-09-30 16:01:57 

The end result would be something like this:

Status User Duration phonenumber  date
Received Receptionnist 00:34:03  from: 1234567890  2021-09-30 16:01:57 
Received Receptionnist 00:03:12  from: 9876543210  2021-09-30 16:02:40 
Transfer Receptionnist 00:05:12   14161654965;Receptionnist;User1  2021-09-30 16:01:57 

CodePudding user response:

You could go for a blacklist, say with a CTE, then filter out the undesired rows.

Something like:

WITH Blacklist ([date], [duration]) AS (
    SELECT [date], [duration] FROM view_InboundPhoneCalls
    GROUP BY [date], [duration] 
    Having count(*) > 1
)

SELECT status, user, duration, phonenumber, date
FROM 
    (SELECT * FROM view_InboundPhoneCalls) as Phonecalls
LEFT JOIN 
    Blacklist
ON Phonecalls.[date] = Blacklist.[date]
AND Phonecalls.[duration] = Blacklist.[duration]

Where 
    Blacklist.[date] is null
Or
    (Blacklist.[date] is not null AND Phonecalls.[Status] == 'Transfer')

CodePudding user response:

You can use row-numbering for this, along with a custom ordering. There is no need for any joins.

SELECT status, [user], duration, phonenumber, date
FROM (
    SELECT *,
      rn = ROW_NUMBER() OVER (PARTITION BY duration, date
             ORDER BY CASE WHEN Status = 'Transfer' THEN 1 ELSE 2 END)
    FROM view_InboundPhoneCalls
    WHERE date >= DATEADD(day, -15, getdate())
) as Phonecalls
WHERE rn = 1

CodePudding user response:

The normal "trick" is to detect duplicates first. One of the easier ways is a CTE (Common Table Expression) along with the ROW_NUMBER() function.

Part One - Mark the duplicates

WITH
  cte_Sorted_List
(
  status, usertype, duration, phonenumber, dated, duplicate_check
)
AS
( -- only use required fields to speed up
  SELECT  status, user, duration, phonenumber, date,
    -- marks depend on correct columns!
    Row_Number() OVER 
    ( -- sort over relevant columns to show
      PARTITION BY user, phonenumber, date, duration
      -- with correct sort order 
      -- bit of hack: As T comes after R
      -- logic: mark records to show as row number 1 in duplicate list
      ORDER BY status DESC
    ) AS duplicate_check
  FROM    view_InboundPhoneCalls
  -- and lose all unnecessary data
  WHERE   date>=DATEADD(dd, -15, getdate())
)

Part two - show relevant rows

SELECT
  status, usertype, duration, phonenumber, dated
FROM
  cte_Sorted_List
WHERE
  Duplicate_Check = 1
;

First CTE extracts required fields in single pass, then that data only is used for output.

  • Related