Home > Software design >  Find duplicate data in last 1 hour
Find duplicate data in last 1 hour

Time:12-30

I am looking for a SQL script to find the data which has more than 2 entries in last 1 hour. I have a table having user_id & event_time. I want a way to find out if the user_id has more than 1 entries in last 1 hour.

I have tried below till now:

  1. Create temp table to put all duplicate entries :
SELECT a.*
INTO #temp 
FROM Table a
JOIN (
    SELECT USERID, COUNT(*) AS Duplicates
    FROM Table
    GROUP BY userid
    HAVING count(*) > 1
) AS b ON a.userid = b.USERID
  1. Run self Joins to fetch records having time difference of 1 hour or less:
SELECT a.*
FROM #temp a
INNER JOIN #temp b ON a.userid = b.USERID
WHERE DATEDIFF(hour, a.EVENTTIME, b.EVENTTIME) = 1

Once first script is ran it gives around 800 rows for duplicate data. But after running the second script the data I get is in thousands. Can anyone help here?

CodePudding user response:

Give a row number for each group of user_id in the order of date difference in hours. Remember to filter the rows which have the event_date in last 1 hour.

Query

;with cte as(
    select [rn] = row_number() over(
        partition by [user_id]
        order by [user_id], datediff(hour, [event_time], getdate())
    ), *
    from [your_table_name]
    where datediff(hour, [event_time], getdate()) < 2
)
select * from [your_table_name] as [t1]
where exists(
    select 1 from cte as [t2]
    where [t1].[user_id]= [t2].[user_id]
    and [t2].[rn] > 1
);

CodePudding user response:

cross apply can be used to get all related events for each event according to your criteria as follows:

With CTE As (
Select USERID, EVENTTIME, Row_Number() Over (Order by USERID, EVENTTIME) As ID
From Tbl
)
Select a.ID, a.USERID, a.EVENTTIME, T.ID, T.USERID, T.EVENTTIME
From CTE As a Cross Apply (Select ID, USERID, EVENTTIME 
                           From CTE 
                           Where Abs(datediff(minute, a.EVENTTIME, EVENTTIME))<=60
                                 And USERID=a.USERID And ID<>a.ID) As T
Order by a.ID, a.USERID, a.EVENTTIME, T.ID, T.USERID, T.EVENTTIME

or you can get a list of events without binding to a specific event:

With CTE As (
Select USERID, EVENTTIME, Row_Number() Over (Order by USERID, EVENTTIME) As ID
From Tbl
)
Select T.USERID, T.EVENTTIME
From CTE As a Cross Apply (Select USERID, EVENTTIME 
                           From CTE 
                           Where Abs(datediff(minute, a.EVENTTIME, EVENTTIME))<=60
                                 And USERID=a.USERID And ID<>a.ID) As T
Group by T.USERID, T.EVENTTIME

db<>fiddle

to get the events only for last hour, you can add the appropriate filter to Where clause in CTE.

With CTE As (
Select USERID, EVENTTIME, Row_Number() Over (Order by USERID, EVENTTIME) As ID
From Tbl
Where EVENTTIME Between dateadd(minute, -60, GetDate()) And GetDate()
)
Select T.USERID, T.EVENTTIME
From CTE As a Cross Apply (Select USERID, EVENTTIME 
                           From CTE 
                           Where Abs(datediff(minute, a.EVENTTIME, EVENTTIME))<=60
                                 And USERID=a.USERID 
                                 And ID<>a.ID) As T
Group by T.USERID, T.EVENTTIME
  • Related