Select all rows where dates are within a two hour span-CodePudding

I have a SQL table:

id	phoneNo	transactionDateTime	transactionDate
1	123456	2022-03-15 11:46:51:00	2022-03-15
2	123456	2022-03-15 11:49:52:00	2022-03-15
3	123456	2022-03-15 11:52:52:00	2022-03-15
4	123456	2022-03-15 12:53:00:00	2022-03-15
5	123456	2022-03-15 11:49:52:00	2022-03-15
6	234567	2022-03-15 11:49:52:00	2022-03-15
7	546345	2022-03-15 11:49:52:00	2022-03-15
8	874655	2022-03-15 11:49:52:00	2022-03-15

I need to get all the entries where more than 4 transactions have happened for the same phoneNoand the transactionDateTime's are within a 2 hour span, so in the case above, id 1 to 5 have the same phoneNo and are within the 2 hour span, so the result would be

id	phoneNo	transactionDateTime	transactionDate
1	123456	2022-03-15 11:46:51:00	2022-03-15
2	123456	2022-03-15 11:49:52:00	2022-03-15
3	123456	2022-03-15 11:52:52:00	2022-03-15
4	123456	2022-03-15 12:53:00:00	2022-03-15
5	123456	2022-03-15 11:49:52:00	2022-03-15

So far, I have been able to query :

SELECT * FROM txnTable
WHERE phoneNo IN
  (SELECT phoneNo 
   FROM txnTable 
   GROUP BY phoneNo, transactionDate
   HAVING COUNT(phoneNo) > 4 )

which only groups all entries by date and not the 2 hour timespan. Would appreciate any help on how to write a query in this particular scenario.

CodePudding user response：

You can use SQL Server floor function to group your transactionDateTime every 2 hours.

SELECT * FROM txnTable
WHERE phoneNo IN
  (SELECT phoneNo 
   FROM txnTable 
   GROUP BY phoneNo,  transactionDate, floor(datepart(hh, transactionDateTime)/2
   HAVING COUNT(phoneNo) > 4  and floor(datepart(hh, transactionDateTime)/2) = 2)

CodePudding user response：

Choose every row from the table, as long as the number of records with the same phoneno and within 2 hours each way is 4

SELECT * 
FROM txnTable t1
WHERE 
    4= (SELECT count(*)
        FROM txnTable t2
        WHERE t2.phoneNo=t1.phoneNo
          AND abs(datediff(second, 
                           t1.transactionDateTime, 
                           t2.transactionDateTime
                           )
                  ) <=2*3600
       )
ORDER BY phoneNo, transactionDateTime

CodePudding user response：

You can calculate the hour difference between 2 rows which is inside a partitioned bucket and arrive at the max of differences within that bucket. Once that is deduced, it can be filtered using the "2 hours" criteria as below.

declare @tbl table(id int identity(1,1)
,phoneNo int,transactionDateTime datetime,transactionDate date
)

insert into @tbl(phoneNo,transactionDateTime,transactionDate)
values
('123456','2022-03-15 11:46:51:00','2022-03-15')
,('123456','2022-03-15 11:49:52:00','2022-03-15')
,('123456','2022-03-15 11:52:52:00','2022-03-15')
,('123456','2022-03-15 12:53:00:00','2022-03-15')
,('123456','2022-03-15 11:49:52:00','2022-03-15')
,('234567','2022-03-15 11:49:52:00','2022-03-15')
,('546345','2022-03-15 11:49:52:00','2022-03-15')
,('874655','2022-03-15 11:49:52:00','2022-03-15')

select id,phoneNo,transactionDateTime,transactionDate 
from(
select *,max(hourdiff)over(partition by phoneNo order by transactionDateTime
rows between unbounded preceding and unbounded following
) as [maxhours] from(
select *
,DATEDIFF(hour, transactionDateTime , max(transactionDateTime)over(partition by phoneNo 
order by transactionDateTime rows between current row and 1 following 
)) AS hourdiff
,ROW_NUMBER()over(partition by phoneNo order by phoneNo) rownum
,count(1)over(partition by phoneNo order by phoneNo) [count]
from 
@tbl
)t)t
where t.[count] > 4 and 
t.maxhours <= 2  
and t.maxhours <>0
order by id