Home > Mobile >  How to find the number of events for the first 24 hours for each user id
How to find the number of events for the first 24 hours for each user id

Time:10-29

I'm working on snowflake to solve a problem. I wanted to find the number of events for the first 24 hours for each user id.

This is a snippet of the database table I'm working on. I modified the table and used a date format without the time for simplification purposes.

user_id client_event_time
1 2022-07-28
1 2022-07-29
1 2022-08-21
2 2022-07-29
2 2022-07-30
2 2022-08-03

I used the following approach to find the minimum event time per user_id.

SELECT user_id, client_event_time,
       ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY client_event_time) row_number,
       MIN(client_event_time) OVER (PARTITION BY user_id) MinEventTime
FROM Data
ORDER BY user_id, client_event_time;
user_id client_event_time row_number MinEventTime
1 2022-07-28 1 2022-07-28
1 2022-07-29 2 2022-07-28
1 2022-08-21 3 2022-07-28
2 2022-07-29 1 2022-07-29
2 2022-07-30 2 2022-07-29
2 2022-08-03 3 2022-07-29

Then I tried to find the difference between the minimum event time and client_event_time, and if the difference is less than or equal to 24, I counted the client_event_time.

with NewTable as (
        (SELECT user_id,client_event_time, event_type,
        row_number() over (partition by user_id order by CLIENT_EVENT_TIME) row_number,
        MIN(client_event_time) OVER (PARTITION BY user_id) MinEventTime
        FROM Data
        ORDER BY user_id, client_event_time))
    
SELECT user_id,  
        COUNT(case when timestampdiff(hh, client_event_time, MinEventTime) <= 24  then 1 else 0 end) AS duration
FROM    NEWTABLE
GROUP BY user_id

I got the following result:

user_id duration
1 3
2 3

I wanted to find the following result:

user_id duration
1 2
2 2

Could you please help me solve this problem? Thanks!

CodePudding user response:

If you want the count of events around 24 hours of the minimun event time, you canuse a group by CTE that givbes you all the minumum event tomes for all users

the rest is to get all the rows that are in the tme limit

WITH min_data as
(SELECT user_id,MIN(client_event_time) mindate FROM data GROUP BY user_id)
  SELECT d.user_id, COUNT(*) 
  FROM data d JOIN min_data md ON d.user_id = md.user_id WHERE client_event_time <= mindate   INTERVAL '24 hour'
GROUP BY d.user_id
ORDER BY d.user_id

user_id count
1 2
2 2

CodePudding user response:

Easier done with a qualify

with cte as
(select *
 from mytable
 qualify event_time<=min(event_time) over (partition by user_id)   interval '24 hours')

select user_id, count(*) as counts
from cte
group by user_id

CodePudding user response:

This looks like a problem for windowed functions! I like them a lot.

Here's you sample data

DECLARE @table TABLE (user_id INT, client_event_time DATETIME)
INSERT INTO @table (user_id, client_event_time) VALUES
(1, '2022-07-28 13:30:00'),
(1, '2022-07-29 08:30:00'),
(1, '2022-08-21 12:34:56'),
(2, '2022-07-29 08:30:00'),
(2, '2022-07-30 13:30:00'),
(2, '2022-08-03 12:34:56')

I added some hours to it, so we can look at 24 hour windows more easily. For user_id 1 we can see they had 2 events in the 24 hours after their initial one. For user_id 2 there was only the first one. We can capture that with a MIN OVER, along with the actual datetimes.

SELECT user_id, MIN(client_event_time) OVER (PARTITION BY user_id) AS FirstEventDateTime, client_event_time
  FROM @table 
user_id FirstEventDateTime      client_event_time
-------------------------------------------------------
1       2022-07-28 13:30:00.000 2022-07-28 13:30:00.000
1       2022-07-28 13:30:00.000 2022-07-29 08:30:00.000
1       2022-07-28 13:30:00.000 2022-08-21 12:34:56.000
2       2022-07-29 08:30:00.000 2022-07-29 08:30:00.000
2       2022-07-29 08:30:00.000 2022-07-30 13:30:00.000
2       2022-07-29 08:30:00.000 2022-08-03 12:34:56.000

Now we have the first datetime and each rows datetime in the resultset together, we can make a comparison:

SELECT user_id, MIN(client_event_time) OVER (PARTITION BY user_id) AS FirstEventDateTime, client_event_time, CASE WHEN DATEDIFF(HOUR,MIN(client_event_time) OVER (PARTITION BY user_id), client_event_time)  < 24 THEN 1 ELSE 0 END AS EventsInFirst24Hours
  FROM @table 
user_id FirstEventDateTime      client_event_time       EventsInFirst24Hours
----------------------------------------------------------------------------
1       2022-07-28 13:30:00.000 2022-07-28 13:30:00.000 1
1       2022-07-28 13:30:00.000 2022-07-29 08:30:00.000 1
1       2022-07-28 13:30:00.000 2022-08-21 12:34:56.000 0
2       2022-07-29 08:30:00.000 2022-07-29 08:30:00.000 1
2       2022-07-29 08:30:00.000 2022-07-30 13:30:00.000 0
2       2022-07-29 08:30:00.000 2022-08-03 12:34:56.000 0

Now we have an indicator telling us which events occurred in the first 24 hours, all we really need is to sum it, but SQL Server is mean about using a windowed function in another aggregate, so we need to cheat and put it into a subquery.

SELECT user_id, SUM(EventsInFirst24Hours) AS CountOfEventsInFirst24Hours
  FROM (
        SELECT user_id, MIN(client_event_time) OVER (PARTITION BY user_id) AS FirstEventDateTime, client_event_time, CASE WHEN DATEDIFF(HOUR,MIN(client_event_time) OVER (PARTITION BY user_id), client_event_time)  < 24 THEN 1 ELSE 0 END AS EventsInFirst24Hours
          FROM @table 
       ) a
 GROUP BY user_id

And that gets us to the result:

user_id CountOfEventsInFirst24Hours
-----------------------------------
1       2
2       1

A little about what's going on with the windowed function: MIN - the aggregation we want it to do. The common aggregate functions have windowed counterparts. (client_event_time) - the value we want to do it to. OVER (PARTITION BY user_id) - the window we want to set up. In this case we want to know the minimum datetime for each of the user_ids. We can partition by as many columns as we'd like. You can also use an ORDER BY with as many columns as you'd like, but that was not necessary here. Ex: OVER (PARTITION BY column1, column2 ORDER BY column4, column5 DESC) Partition (or group by) column1 and column2 and order by column4 and column5 descending.

  • Related