I have a very simple Database Table where a new entry is inserted everytime a product is scanned (RFID-Scanner).
Scans Table:
ID (PK) | Product_ID (FK) | Created_At |
---|---|---|
1 | 1 | 2023-01-26 10:39:00.0000 |
2 | 2 | 2023-01-26 10:39:02.0000 |
3 | 3 | 2023-01-26 10:39:04.0000 |
4 | 4 | 2023-01-26 10:47:00.0000 |
My goal is to cluster the product ids by the time they were scanned with a specified tolerance (in seconds), so for example for the entries in my table and a tolerance of 10 seconds, the desired result would be
Product_IDs |
---|
{1, 2, 3} |
{4} |
My first attempt to solve the issue was something like this:
SELECT ARRAY_AGG(DISTINCT Product_ID) FROM scans GROUP BY ROUND(EXTRACT(EPOCH FROM created_at) / 10);
This approach works a little, but in edge cases, when for example one product is scanned at second 19 and another one at seconds 21, it wouldn't be grouped together, although it should.
What is a better, more reliable way to solve this problem?
CodePudding user response:
I will assume that the groups are separated if the time between to rows is more than 10 seconds. For example with test data
create table scans(ID int, Product_ID int, Created_At TimeStamp);
insert into scans values
(1, 1,cast('2023-01-26 10:39:00.000' as TimeStamp))
,(2, 2,cast('2023-01-26 10:39:02.000' as TimeStamp))
,(3, 3,cast('2023-01-26 10:39:11.000' as TimeStamp))
,(4, 4,cast('2023-01-26 10:47:00.000' as TimeStamp))
;
Calculate time difference between current row and preceding. When difference greater than '10 second' - thats new group of scans starts.
with ScansDif as(
select *
,Created_At-lag(Created_At,1,Created_At)over(order by Created_At) dif
from scans
)
,ScansGroup as(
select *
,sum(case when dif>cast('10'||' second' as interval) then 1 else 0 end)
over(order by Created_At rows unbounded preceding) grN
from ScansDif
)
SELECT ARRAY_AGG(DISTINCT Product_ID)
FROM ScansGroup
GROUP BY grn
Group numbers
id | product_id | created_at | dif | grn |
---|---|---|---|---|
1 | 1 | 2023-01-26 10:39:00 | 00:00:00 | 0 |
2 | 2 | 2023-01-26 10:39:02 | 00:00:02 | 0 |
3 | 3 | 2023-01-26 10:39:11 | 00:00:09 | 0 |
4 | 4 | 2023-01-26 10:47:00 | 00:07:56 | 1 |
Time difference between first and last row of group 0 is 00:11. Result
array_agg |
---|
{1,2,3} |
{4} |