Home > OS >  Removing duplicate rows based on date column and status
Removing duplicate rows based on date column and status

Time:06-18

I have the following table, where every row represents a change in the user's status and the occurrence date.

Date ID Status
02.01.2021 64 Register
02.04.2021 64 Active
02.07.2021 64 Not Active
02.10.2021 64 Active
02.25.2021 64 Active
02.30.2021 64 Not Active
03.03.2021 64 Active
01.01.2021 11 Register
01.06.2021 11 Active
01.07.2021 11 Active
01.10.2021 11 Elite
01.15.2021 11 Elite

It contains duplicate statues for different dates and I would like to retrieve only the latest status for when there are consequent statuses.

I want my end table to look like this:

Date ID Status
02.01.2021 64 Register
02.04.2021 64 Active
02.07.2021 64 Not Active
02.25.2021 64 Active
02.30.2021 64 Not Active
03.03.2021 64 Active
01.01.2021 11 Register
01.07.2021 11 Active
01.15.2021 11 Elite

Would appreciate any help on this.

CodePudding user response:

You can use LEAD to check if the next row is different, and exclude it if it's the same

SELECT
  Date,
  ID,
  Status
FROM (
    SELECT *,
      Prev = LEAD(Status) OVER (PARTITION BY ID ORDER BY Date)
    FROM YourTable t
) t
WHERE Prev <> Status OR Prev IS NULL;

db<>fiddle

CodePudding user response:

This seems like a "Gaps and Islands" problem as stated by a comment.

--SampleData
WITH mydata AS (
SELECT * FROM (VALUES 
 (1,'active',CAST(GETDATE()-1 AS Date))
,(1,'active',CAST(GETDATE()-2 AS Date))
,(1,'disabled',CAST(GETDATE()-3 AS Date))
,(1,'active',CAST(GETDATE()-4 AS Date))
,(1,'active',CAST(GETDATE()-5 AS Date))
,(2,'active',CAST(GETDATE()-1 AS Date))
,(2,'disabled',CAST(GETDATE()-2 AS Date))
,(2,'disabled',CAST(GETDATE()-3 AS Date))
,(2,'disabled',CAST(GETDATE()-4 AS Date))
,(2,'active',CAST(GETDATE()-5 AS Date))
) x(ID,Status,Date)
)
--Actual logic starts
,Islands AS (
SELECT *,
Island = ROW_NUMBER() OVER(PARTITION BY Id ORDER BY Id, Date) -
ROW_NUMBER() OVER (PARTITION BY Id, Status ORDER BY Id, Date)
FROM mydata
)

SELECT Id, Status, Island, MAX(Date) AS MaxDate
FROM Islands
GROUP BY Id, Status, Island
ORDER BY Id, MaxDate

In order to understand it, you will need to have a look at what Windowing functions are, and how they behave in this specific context

  • Related