Home > other >  How to group by similar integer values in Postgres sql?
How to group by similar integer values in Postgres sql?

Time:01-27

I have a very simple Database Table where a new entry is inserted everytime a product is scanned (RFID-Scanner).

Scans Table:

ID (PK) Product_ID (FK) Created_At
1 1 2023-01-26 10:39:00.0000
2 2 2023-01-26 10:39:02.0000
3 3 2023-01-26 10:39:04.0000
4 4 2023-01-26 10:47:00.0000

My goal is to cluster the product ids by the time they were scanned with a specified tolerance (in seconds), so for example for the entries in my table and a tolerance of 10 seconds, the desired result would be

Product_IDs
{1, 2, 3}
{4}

My first attempt to solve the issue was something like this:

SELECT ARRAY_AGG(DISTINCT Product_ID) FROM scans GROUP BY ROUND(EXTRACT(EPOCH FROM created_at) / 10);

This approach works a little, but in edge cases, when for example one product is scanned at second 19 and another one at seconds 21, it wouldn't be grouped together, although it should.

What is a better, more reliable way to solve this problem?

CodePudding user response:

I will assume that the groups are separated if the time between to rows is more than 10 seconds. For example with test data

create table scans(ID int,  Product_ID int, Created_At TimeStamp);
insert into scans values
 (1,    1,cast('2023-01-26 10:39:00.000' as TimeStamp))
,(2,    2,cast('2023-01-26 10:39:02.000' as TimeStamp))
,(3,    3,cast('2023-01-26 10:39:11.000' as TimeStamp))
,(4,    4,cast('2023-01-26 10:47:00.000' as TimeStamp))
;

Calculate time difference between current row and preceding. When difference greater than '10 second' - thats new group of scans starts.

with ScansDif as(
  select * 
    ,Created_At-lag(Created_At,1,Created_At)over(order by Created_At) dif
  from scans
)
,ScansGroup as(
  select * 
     ,sum(case when dif>cast('10'||' second' as interval) then 1 else 0 end)
          over(order by Created_At rows unbounded preceding) grN
  from ScansDif
)
SELECT ARRAY_AGG(DISTINCT Product_ID) 
FROM ScansGroup 
GROUP BY grn

Group numbers

id product_id created_at dif grn
1 1 2023-01-26 10:39:00 00:00:00 0
2 2 2023-01-26 10:39:02 00:00:02 0
3 3 2023-01-26 10:39:11 00:00:09 0
4 4 2023-01-26 10:47:00 00:07:56 1

Time difference between first and last row of group 0 is 00:11. Result

array_agg
{1,2,3}
{4}

Demo

  • Related