SQL to compute online time from login/logout log database

Time:06-10

I'm trying to compute the actual time spent online by the users of my service. This is similar to some other posts (such as Login Logout Time in SQL Server and MySQL Query For Total Online Time Based On Login/Logout Entries), however in my case, a given user can have several sessions open simultaneously.

The table data looks like this:

| timestamp | userid | sessionid | event        |
| 09:15:25  |      1 |      abcd | connected    |
| 09:17:45  |      1 |      bcde | connected    |
| 09:18:45  |      2 |      fght | connected    |
| 09:27:55  |      1 |      bcde | disconnected |
-> now user 1 is still online!

Sessions can overlap, so I cannot just sum durations based on sessionid only.

It's quite easy to do in a programming language such as PHP, but I wonder whether it's doable in the database itself (in my case PostgreSQL). Maybe with PARTITION BY?

Thanks in advance

Regards

CodePudding user response:

Since Postgres 14 there is a range_agg() aggregate function, which may help here:

step-by-step demo:db<>fiddle
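
If the fiddle is not at hand, the sample data from the question can be loaded with a setup along these lines. The table name mytable matches the query below; the column types (time, integer, text) are assumptions, since the question does not state them.

```sql
-- Assumed schema: the question shows only times of day,
-- so "timestamp" is typed as time here.
CREATE TABLE mytable (
    "timestamp" time    NOT NULL,
    userid      integer NOT NULL,
    sessionid   text    NOT NULL,
    event       text    NOT NULL
);

-- The four rows from the question.
INSERT INTO mytable ("timestamp", userid, sessionid, event) VALUES
    ('09:15:25', 1, 'abcd', 'connected'),
    ('09:17:45', 1, 'bcde', 'connected'),
    ('09:18:45', 2, 'fght', 'connected'),
    ('09:27:55', 1, 'bcde', 'disconnected');
```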

Disclaimer: At the time of writing, I am not quite sure what the expected result really is. What should be returned when a session is still connected? What should be returned when a user opens another session later on? In my example I assumed that "connection still open" means "lasts until now", and that the break between two sessions is not counted when a user reconnects. This only demonstrates one of many possible ways, and it can surely be adjusted to your specific requirements:

SELECT
    userid,
    SUM(upper(r) - lower(r)) as time              -- 5
FROM (
    SELECT 
        userid,
        unnest(range_agg) as r                    -- 4
    FROM (
        SELECT
            userid,
            range_agg(r)                          -- 3
        FROM (
            SELECT
                userid,
                sessionid,
                CASE WHEN COUNT(*) = 2 THEN       -- 2
                    tsrange(MIN("timestamp") + current_date, MAX("timestamp") + current_date)
                ELSE tsrange(MIN("timestamp") + current_date, now()::timestamp)
                END as r
            FROM mytable
            GROUP BY userid, sessionid            -- 1
        )s
        GROUP BY userid                           -- 3
    )s
)s
GROUP BY userid

  1. Group your table by session (that is, by sessionid and userid).
  2. If there are two entries for a session, it is closed, so return a timestamp range from the start entry to the stop entry of the session. Otherwise, take the current time (now()) as the stop time of the range. The current_date is there because the range needs a value of type timestamp and your data only provides type time, so I added a specific date. It is always better to work with timestamps than with time values: think of a session that stays open past midnight.
  3. Now you have a timestamp range for each sessionid, which means there can be more than one per userid. Use the new aggregate function range_agg() to combine a user's session ranges. It automatically merges overlapping ranges, such as 08:00-10:00 and 09:00-11:00, into one range 08:00-11:00. If there is no overlap, for example a new session after a break, it adds the additional range to the result, which is a multirange. So range_agg() does the magic of calculating the overlaps for you.
  4. Now each user has one multirange containing one or more time ranges (only one if all sessions overlapped, more if not). unnest() splits it into one range per row.
  5. Now you can calculate the length of each range as the difference between its upper and lower bound, and then SUM() these differences per user.
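
To see the merging from steps 3 to 5 in isolation, here is a small sketch with made-up ranges. The dates and times are illustrative only and do not come from the question.

```sql
-- Requires Postgres 14 or later (range_agg and multiranges).
WITH sample(r) AS (
    VALUES
        (tsrange(timestamp '2024-01-01 08:00', timestamp '2024-01-01 10:00')),
        (tsrange(timestamp '2024-01-01 09:00', timestamp '2024-01-01 11:00')),  -- overlaps the first
        (tsrange(timestamp '2024-01-01 13:00', timestamp '2024-01-01 14:00'))   -- separate, after a break
),
merged AS (
    -- range_agg() merges the first two ranges into 08:00-11:00 and keeps
    -- 13:00-14:00 apart; unnest() returns one row per merged range.
    SELECT unnest(range_agg(r)) AS r
    FROM sample
)
SELECT SUM(upper(r) - lower(r)) AS total   -- 3 hours + 1 hour
FROM merged;
```

Simply summing the three input ranges would give five hours; merging first counts the overlapping hour from 09:00 to 10:00 only once.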

The most problematic part is calculating the overlaps. Before Postgres 14 you need to do this manually in some way, which always runs into nasty edge cases (complete overlaps, overlaps only at the start or end, no overlap, ...).
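
For versions before 14, one common manual approach is the "gaps and islands" pattern with window functions. The following is only a sketch under assumptions: the table is called mytable, the "timestamp" column is of type time, and open sessions count until the current time. The names started, ended, new_island and island are made up for illustration, and sessions spanning midnight are not handled.

```sql
WITH sessions AS (
    -- One row per session; an open session is assumed to last until now.
    SELECT userid,
           sessionid,
           MIN("timestamp") AS started,
           COALESCE(MAX("timestamp") FILTER (WHERE event = 'disconnected'),
                    LOCALTIME) AS ended
    FROM mytable
    GROUP BY userid, sessionid
),
flagged AS (
    -- A session starts a new "island" when it begins after every earlier
    -- session of the same user has ended. For the first session the frame
    -- is empty, MAX() is NULL, and the CASE falls through to 1.
    SELECT s.*,
           CASE WHEN s.started <= MAX(s.ended) OVER (
                         PARTITION BY s.userid
                         ORDER BY s.started, s.ended
                         ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                THEN 0 ELSE 1
           END AS new_island
    FROM sessions s
),
islands AS (
    -- A running sum of the flags numbers the islands per user.
    SELECT f.*,
           SUM(f.new_island) OVER (
               PARTITION BY f.userid
               ORDER BY f.started, f.ended) AS island
    FROM flagged f
)
SELECT userid,
       SUM(island_end - island_start) AS online_time
FROM (
    -- Collapse each island to its overall start and end.
    SELECT userid,
           island,
           MIN(started) AS island_start,
           MAX(ended)   AS island_end
    FROM islands
    GROUP BY userid, island
) i
GROUP BY userid;
```

With the sample data, both sessions of user 1 fall into a single island that runs from 09:15:25 until the current time, so the overlapping minutes are counted once.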

I am sure there is plenty of room for optimization, but as a sketch it should work.

CodePudding user response:

You can use a self join to get the length of each user session:

select st.userid, 
       st.sessionid, 
       coalesce(en."timestamp", current_timestamp) - st."timestamp" as session_duration,
       en."timestamp" is not null as session_is_finished
from session st
  left join session en 
        on st.sessionid = en.sessionid 
       and st.userid = en.userid 
       and en.event = 'disconnected'
where st.event = 'connected'       
order by userid, sessionid;

The above assumes that a session with no disconnected event is still "live", so its duration runs up until "now". I have added a flag column session_is_finished to show this situation.

For your sample data this returns the following:

userid | sessionid | session_duration                                     | session_is_finished
-------+-----------+------------------------------------------------------+--------------------
     1 | abcd      | 0 years 0 mons 0 days 11 hours 6 mins 52.922171 secs | false              
     1 | bcde      | 0 years 0 mons 0 days 0 hours 10 mins 10.0 secs      | true               
     2 | fght      | 0 years 0 mons 0 days 11 hours 3 mins 32.922171 secs | false              

If you want to get the total duration for a user, just aggregate the session duration for each user:

select userid, 
       sum(session_duration)
from (
  select st.userid, 
         st.sessionid, 
         coalesce(en."timestamp", current_timestamp) - st."timestamp" as session_duration,
         en."timestamp" is not null as session_is_finished
  from session st
    left join session en 
          on st.sessionid = en.sessionid 
         and st.userid = en.userid 
         and en.event = 'disconnected'
  where st.event = 'connected'       
) a
group by userid;