How do I group an SQL table by timestamps that are close to each other?

I have a table of user activity:

CREATE TABLE public.user_session_activity_table (
    id integer NOT NULL,
    "userId" integer NOT NULL,
    "creationDate" timestamp without time zone DEFAULT now() NOT NULL
);

INSERT INTO public.user_session_activity_table 
(
    id, 
    "userId",
    "creationDate"
) 
VALUES 
    (1, 1, '2021-11-06 10:54:23.891327'),
    (2, 1, '2021-11-06 10:59:56.616956'),
    (3, 1, '2021-11-06 10:59:57.680751'),
    (4, 1, '2021-11-06 10:59:58.857336'),
    (5, 1, '2021-11-06 11:36:47.112812'),
    (6, 1, '2021-11-06 11:36:49.049485'),
    (7, 1, '2021-11-06 11:36:50.931315');

Desired output:

id  userId  sessionLength
1   1       123s                 -- row 1
2   1       123s                 -- rows 2-4 grouped together
3   1       123s                 -- rows 5-7 grouped together

Explanation:

I'm creating a view of user sessions from a table that contains rows of saved user activity. I'd like to GROUP BY the time delta that elapses between creation dates: if too much time elapses (let's say the threshold is 1 minute), the current group ends and a new one starts. For this sample data, that would result in 3 groups:

id:1
id:2, id:3, id:4
id:5, id:6, id:7

As you can see, the largest time differences are id:1 <-> id:2 and id:4 <-> id:5, which is why the data should break into 3 separate groups.

I'm using the latest version of PostgreSQL. The "sessionLength" is not that important; I can find a solution for that myself. The main problem is creating these groups.

One important point: rounding the date wouldn't work, because a session could last for several minutes or hours. What should end one group and begin the next is the time difference between activities (for example, when a user logs out or is away from the keyboard for an hour).
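The rule described above (start a new group whenever the gap to the previous activity exceeds a threshold) is the usual "gaps and islands" pattern: compare each row with the previous one using LAG, flag the rows that start a session, and number the sessions with a running SUM of those flags. Below is a minimal sketch of that pattern, run against an in-memory SQLite database from Python so it is self-contained. The table name `activity`, the unquoted column names, the 60-second default and the `sessions()` helper are assumptions made for illustration. The `julianday()` arithmetic is SQLite-specific; PostgreSQL can subtract timestamps directly, as noted in the comment.

```python
import sqlite3

# Sample rows from the question: (id, userId, creationDate).
ROWS = [
    (1, 1, "2021-11-06 10:54:23.891327"),
    (2, 1, "2021-11-06 10:59:56.616956"),
    (3, 1, "2021-11-06 10:59:57.680751"),
    (4, 1, "2021-11-06 10:59:58.857336"),
    (5, 1, "2021-11-06 11:36:47.112812"),
    (6, 1, "2021-11-06 11:36:49.049485"),
    (7, 1, "2021-11-06 11:36:50.931315"),
]

# Gaps and islands, written for SQLite (julianday() is SQLite-specific):
# 1. flagged:  is_start = 1 when a row has no predecessor for the same user
#              or the gap to that predecessor exceeds :gap seconds.
# 2. numbered: a running SUM of is_start gives each row its session number.
# 3. final:    one aggregate row per (userId, session).
# In PostgreSQL the gap test could instead read:
#   "creationDate" - lag("creationDate") OVER w > interval '1 minute'
SESSION_QUERY = """
WITH flagged AS (
    SELECT id, userId, creationDate,
           CASE
               WHEN lag(creationDate) OVER w IS NULL THEN 1
               WHEN (julianday(creationDate)
                     - julianday(lag(creationDate) OVER w)) * 86400 > :gap THEN 1
               ELSE 0
           END AS is_start
    FROM activity
    WINDOW w AS (PARTITION BY userId ORDER BY creationDate)
),
numbered AS (
    SELECT *,
           SUM(is_start) OVER (PARTITION BY userId ORDER BY creationDate) AS session
    FROM flagged
)
SELECT userId,
       session,
       MIN(id) AS first_id,
       MAX(id) AS last_id,
       COUNT(*) AS n_rows,
       ROUND((julianday(MAX(creationDate))
              - julianday(MIN(creationDate))) * 86400, 3) AS length_s
FROM numbered
GROUP BY userId, session
ORDER BY userId, session
"""


def sessions(rows, gap_seconds=60):
    """Load rows into an in-memory table and return one tuple per session."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE activity (id INTEGER, userId INTEGER, creationDate TEXT)")
        con.executemany("INSERT INTO activity VALUES (?, ?, ?)", rows)
        return con.execute(SESSION_QUERY, {"gap": gap_seconds}).fetchall()
    finally:
        con.close()


for row in sessions(ROWS):
    print(row)
```

With the sample data and a 60-second threshold this yields three sessions: id 1, ids 2-4 and ids 5-7. Because each row is only compared with its immediate predecessor, a session can last for hours as long as no single gap exceeds the threshold.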

Thank you, any help is greatly appreciated! (please tell me if the question is unclear, I'll try to clarify it a bit more! :) )

Answer:

Window functions allow you to specify a range (the window frame). You can use that to get the first and last values for the set of records the frame covers. [I needed the omg subquery to actually use the values (subtract them) and to keep only the (pseudo-)aggregate rows. Just omit the WHERE clause to see how it works.]

SELECT
        first_id, "userId"
        , ze_first, ze_last
        , (ze_last - ze_first) AS timespan
        , 1 + (last_id - first_id) AS nrecords
FROM    (
        SELECT
        id, "userId"
        , first_value(id) OVER www AS first_id
        , last_value(id) OVER www AS last_id
        , first_value("creationDate") OVER www AS ze_first
        , last_value("creationDate") OVER www AS ze_last
        FROM public.user_session_activity_table
                WINDOW www AS (
                        PARTITION BY "userId"
                        ORDER BY "creationDate"
                        -- This is the magic: the frame holds every row within
                        -- 1 minute of the current row (needs PostgreSQL 11+)
                        RANGE BETWEEN INTERVAL '1 minute' PRECEDING AND INTERVAL '1 minute' FOLLOWING
                        )
        ) omg
WHERE id = first_id -- keep only the row that starts each frame
        ;
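To see what the `RANGE BETWEEN ... PRECEDING AND ... FOLLOWING` frame does, here is a rough pure-Python emulation of `first_value(id)` / `last_value(id)` over that frame and of the `WHERE id = first_id` filter. It is not PostgreSQL itself, and the function names `range_frames` and `frame_heads` are hypothetical. Because each row's frame is centered on that row's own timestamp rather than chained from one activity to the next, a run of activities that are each less than a minute apart but span more than a minute in total is cut short, as the second example shows.

```python
from datetime import datetime, timedelta

# Sample rows from the question, sorted by creationDate: (id, timestamp).
SAMPLE = [
    (i, datetime.fromisoformat(s))
    for i, s in [
        (1, "2021-11-06 10:54:23.891327"),
        (2, "2021-11-06 10:59:56.616956"),
        (3, "2021-11-06 10:59:57.680751"),
        (4, "2021-11-06 10:59:58.857336"),
        (5, "2021-11-06 11:36:47.112812"),
        (6, "2021-11-06 11:36:49.049485"),
        (7, "2021-11-06 11:36:50.931315"),
    ]
]


def range_frames(rows, width=timedelta(minutes=1)):
    """Emulate first_value(id) and last_value(id) over
    RANGE BETWEEN width PRECEDING AND width FOLLOWING.

    rows must be sorted by timestamp; returns (id, first_id, last_id) per row.
    """
    result = []
    for row_id, ts in rows:
        # The frame is relative to the current row's own timestamp (inclusive).
        frame = [other_id for other_id, other_ts in rows
                 if ts - width <= other_ts <= ts + width]
        result.append((row_id, frame[0], frame[-1]))
    return result


def frame_heads(rows, width=timedelta(minutes=1)):
    """Keep only rows that start their own frame, like WHERE id = first_id."""
    return [f for f in range_frames(rows, width) if f[0] == f[1]]


print(frame_heads(SAMPLE))  # -> [(1, 1, 1), (2, 2, 4), (5, 5, 7)]

# Three activities 50 s apart: every gap is under a minute, but the chain spans 100 s.
t0 = datetime(2021, 11, 6, 12, 0, 0)
chain = [(1, t0), (2, t0 + timedelta(seconds=50)), (3, t0 + timedelta(seconds=100))]
print(frame_heads(chain))  # -> [(1, 1, 2)], so id 3 is never reported
```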

Answer:

I know this doesn't give you a complete solution, but it might help you get there. It uses row_number to identify rows that fall into the same 60-second bucket and should be grouped together:

with u as (
  select *,
    round(extract(epoch from "creationDate") / 60) * 60 as bucket,
    id - row_number() over (
      partition by "userId", round(extract(epoch from "creationDate") / 60) * 60
      order by "creationDate") as gp
  from public.user_session_activity_table
)
select
  row_number() over (partition by "userId" order by min("creationDate")) as "groupNo",
  "userId",
  min("creationDate") as "startOfRange",
  max("creationDate") as "endOfRange",
  -- whole elapsed seconds, not just the difference of the seconds fields
  round(extract(epoch from max("creationDate") - min("creationDate"))) as duration
from u
group by "userId", bucket, gp;
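The `id - row_number()` trick relies on ids being consecutive and increasing with creationDate: within one bucket, consecutive ids minus their running row number give a constant. A small pure-Python emulation (the name `bucket_gaps` is hypothetical) shows the `gp` values for the sample rows, and also shows the rounding caveat from the question: two activities two seconds apart can still land in different buckets. The timestamp is treated as if it were UTC, matching how `extract(epoch from ...)` handles `timestamp without time zone`.

```python
from datetime import datetime, timezone


def bucket_gaps(rows, bucket_seconds=60):
    """Emulate the CTE: bucket each row by its rounded epoch, then compute
    gp = id - row_number() within (userId, bucket), ordered by creationDate.

    rows: (id, userId, creationDate) tuples sorted by creationDate.
    Returns one (id, bucket, gp) tuple per row.
    """
    counters = {}  # running row_number per (userId, bucket)
    result = []
    for row_id, user_id, stamp in rows:
        # Treat the naive timestamp as UTC.
        epoch = datetime.fromisoformat(stamp).replace(tzinfo=timezone.utc).timestamp()
        bucket = round(epoch / bucket_seconds) * bucket_seconds
        key = (user_id, bucket)
        counters[key] = counters.get(key, 0) + 1
        result.append((row_id, bucket, row_id - counters[key]))
    return result


SAMPLE = [
    (1, 1, "2021-11-06 10:54:23.891327"),
    (2, 1, "2021-11-06 10:59:56.616956"),
    (3, 1, "2021-11-06 10:59:57.680751"),
    (4, 1, "2021-11-06 10:59:58.857336"),
    (5, 1, "2021-11-06 11:36:47.112812"),
    (6, 1, "2021-11-06 11:36:49.049485"),
    (7, 1, "2021-11-06 11:36:50.931315"),
]
print([gp for _, _, gp in bucket_gaps(SAMPLE)])  # -> [0, 1, 1, 1, 4, 4, 4]

# Two seconds apart, but on either side of a rounding boundary (xx:29 vs xx:31).
EDGE = [(1, 1, "2021-11-06 10:00:29"), (2, 1, "2021-11-06 10:00:31")]
print(bucket_gaps(EDGE))  # the two rows get different buckets
```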

Answer:

You need the STRING_AGG function in PostgreSQL. Something like this (I am not familiar with PostgreSQL):

SELECT "userId",
       string_agg(id::text, ';' ORDER BY id) AS ids,
       to_char("creationDate", 'YYYY-MM-DD HH24:MI') AS minute_label
FROM public.user_session_activity_table
GROUP BY "userId", to_char("creationDate", 'YYYY-MM-DD HH24:MI');
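As a rough illustration of what grouping by user and minute with `string_agg` produces, here is a pure-Python emulation (the name `minute_groups` is hypothetical). Only activities inside the same clock minute are joined, so two activities a second apart that straddle a minute boundary end up in separate groups, which is the rounding problem the question mentions.

```python
from datetime import datetime


def minute_groups(rows):
    """Group (id, userId, creationDate) rows by user and calendar minute and
    join the ids with ';', mirroring string_agg(id::text, ';' ORDER BY id)."""
    groups = {}
    for row_id, user_id, stamp in rows:
        minute = datetime.fromisoformat(stamp).strftime("%Y-%m-%d %H:%M")
        groups.setdefault((user_id, minute), []).append(row_id)
    return {key: ";".join(str(i) for i in sorted(ids)) for key, ids in groups.items()}


SAMPLE = [
    (1, 1, "2021-11-06 10:54:23.891327"),
    (2, 1, "2021-11-06 10:59:56.616956"),
    (3, 1, "2021-11-06 10:59:57.680751"),
    (4, 1, "2021-11-06 10:59:58.857336"),
    (5, 1, "2021-11-06 11:36:47.112812"),
    (6, 1, "2021-11-06 11:36:49.049485"),
    (7, 1, "2021-11-06 11:36:50.931315"),
]
print(minute_groups(SAMPLE))

# One second apart, but in different clock minutes: two separate groups.
EDGE = [(1, 1, "2021-11-06 10:59:59"), (2, 1, "2021-11-06 11:00:00")]
print(minute_groups(EDGE))
```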