Home > front end >  Alternative to looping
Alternative to looping

Time:01-13

I have a table like the following and I am required to show the subtotal of the use_time_sec column grouping by event_datetime, event_name (only show login), user_id and system_id.

sample input table

with sample_input as (
    select '12/01/2023 14:27:59' as event_datetime, 'login' as event_name,'1' as user_id, 'X' as system_id, '0' as use_time_sec 
    union all
    select '12/01/2023 14:28:05', 'screen 1',   '1',    'X',    '2'
    union all
    select '12/01/2023 14:28:05', 'screen 2',   '1',    'X',    '5',
    union all
    select '12/01/2023 14:28:17', 'screen 1',   '1',    'X',    '3',
    union all
    select '12/01/2023 14:28:23', 'logout', '1',    '', '0',
    union all
    select '12/01/2023 14:28:23', 'login',  '2',    'Y',    '0',
    union all
    select '12/01/2023 14:28:23', 'screen 1',   '2',    'Y',    '10',
    union all
    select '12/01/2023 14:28:24', 'screen 2',   '2',    'Y',    '100',
    union all
    select '12/01/2023 14:28:29', 'login',  '1',    'X',    '0',
    union all
    select '12/01/2023 14:28:29', 'screen 1',   '1',    'X',    '500',
    union all
    select '12/01/2023 14:28:29', 'logout', '1',    '', '0',
)
select * from sample_input

sample output

I can loop through the table to get my desired output. But thats not the most efficient solution as there are few millions of record in the table and growing everyday.

Will appreciate if someone can provide a better solution than what I have.

Note: The data is in google BigQuery.

Thanks

CodePudding user response:

This is known as the Gaps and Islands problem. We're trying to identify the islands of user sessions. We need to do a query which gives us some way to identify a session. This relies heavily on window functions.

One way is to count the number of logins seen per user.

select
  *,
  sum(1)
    filter(where event_name = 'login')
    over(partition by user_id order by event_time)
    as session_num
from events
order by event_time

That will keep a tally per user_id. It will add to the tally every time it sees a user login.

event_time event_type user_id use_time_sec session_num
1000 login 1 0 1
1001 things 1 3 1
1001 login 2 10 1
1002 logout 1 7 1
1005 logout 2 20 1
1100 login 1 5 2
1101 logout 1 10 2

Now we have a way to identify each user's sessions. We can grouping by user_id and session_num. These are our islands.

with sessions as (
  select
    *,
    sum(1)
      filter(where event_name = 'login')
      over(partition by user_id order by event_time)
      as session_num
  from events
  order by event_time
)
select
  min(event_time) as session_start,
  user_id,
  sum(use_time_sec) as total_use_time_sec
from sessions
group by user_id, session_num
order by session_start
session_start user_id total_use_time_sec
1000 1 10
1001 2 130
1100 1 15

Demonstration in PostgreSQL, but it should work fine on BigQuery.

CodePudding user response:

Thank you so much Schwern. That worked. I had to replace filter with IF.

with sample_input as (
    select '12/01/2023 14:27:59' as event_datetime, 'login' as event_name,'1' as user_id, 'X' as system_id, '0' as use_time_sec 
    union all
    select '12/01/2023 14:28:05', 'screen 1',   '1',    'X',    '2'
    union all
    select '12/01/2023 14:28:05', 'screen 2',   '1',    'X',    '5',
    union all
    select '12/01/2023 14:28:17', 'screen 1',   '1',    'X',    '3',
    union all
    select '12/01/2023 14:28:23', 'logout', '1',    'X',    '0',
    union all
    select '12/01/2023 14:28:23', 'login',  '2',    'Y',    '0',
    union all
    select '12/01/2023 14:28:23', 'screen 1',   '2',    'Y',    '10',
    union all
    select '12/01/2023 14:28:24', 'screen 2',   '2',    'Y',    '100',
    union all
    select '12/01/2023 14:28:29', 'login',  '1',    'X',    '0',
    union all
    select '12/01/2023 14:28:30', 'screen 1',   '1',    'X',    '500',
    union all
    select '12/01/2023 14:35:26', 'logout', '1',    'X',    '0',
),
page_time_be_session_details as (
select 
  *, 
  sum(if(event_name = 'login',1,0)) over(partition by user_id, system_id order by sample_input.event_datetime) as session_pageview
from sample_input
order by sample_input.event_datetime
)
select  min(event_datetime) as login_time, user_id, system_id,sum(cast(use_time_sec as int)) as usage_duration
from page_time_be_session_details
group by user_id, system_id, session_pageview
  • Related