I am trying to create a query that will give me a column of total time logged in for each month for each user.
username | auth_event_type | time | credential_id
Joe | 1 | 2021-11-01 09:00:00 | 44
Joe | 2 | 2021-11-01 10:00:00 | 44
Jeff | 1 | 2021-11-01 11:00:00 | 45
Jeff | 2 | 2021-11-01 12:00:00 | 45
Joe | 1 | 2021-11-01 12:00:00 | 46
Joe | 2 | 2021-11-01 12:30:00 | 46
Joe | 1 | 2021-12-06 14:30:00 | 47
Joe | 2 | 2021-12-06 15:30:00 | 47
The auth_event_type column specifies whether the event was a login (1) or logout (2) and the credential_id indicates the session.
I'm trying to create a query that would have an output like this:
username | year_month | total_time
Joe | 2021-11 | 1:30
Jeff | 2021-11 | 1:00
Joe | 2021-12 | 1:00
How would I go about doing this in postgres? I am thinking it would involve a window function? If someone could point me in the right direction that would be great. Thank you.
CodePudding user response:
Use the window function lag()
with a partition it by credential_id
ordered by time
, e.g.
WITH j AS (
SELECT username, time, age(time, LAG(time) OVER w)
FROM t
WINDOW w AS (PARTITION BY credential_id ORDER BY time
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
)
SELECT username, to_char(time,'yyyy-mm'),sum(age) FROM j
GROUP BY 1,2;
Note: the frame ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
is pretty much optional in this case, but it is considered a good practice to keep window functions as explicit as possible, so that in the future you don't have to read the docs to figure out what your query is doing.
Demo: db<>fiddle
CodePudding user response:
Solution 1 partially working
Not sure that window functions will help you in your case, but aggregate functions will :
WITH list AS
(
SELECT username
, array_agg(DISTINCT to_char(time, 'YYYY-MM')) AS year_month
, array_agg(time ORDER BY auth_event_type) AS time_session
FROM your_table
GROUP BY username, credential_id
)
SELECT username
, year_month[1] AS year_month
, sum(time_session[2] - time_session[1]) AS total_time
FROM list
GROUP BY username, year_month
The first part of the query aggregates the login/logout times for the same username, credential_id, the second part makes the sum per year_month of the difference between the login/logout times. This query works well until the login time and logout time are in the same month, but it fails when they aren't.
Solution 2 fully working
In order to calculate the total_time
per username and per month whatever the login time and logout time are, we can use a time range approach which intersects the ranges of time [login_time, logout_time)
with the monthly ranges of time [monthly_start_time, monthly_end_time)
:
WITH list AS
(
SELECT username
, array_agg(time ORDER BY auth_event_type) AS time_session
FROM your_table
GROUP BY username, credential_id
)
SELECT username
, to_char(t.month_start_date, 'YYYY-MM')
, sum(upper(f.period) - lower(f.period)) AS total_time
FROM list
INNER JOIN
( SELECT generate_series(min(date_trunc('month', time)), max(date_trunc('month', time)), '1 month') AS month_start_date
FROM your_table
) AS t
ON tsrange(t.month_start_date, t.month_start_date interval '1 month' - interval '1 second') && tsrange(time_session [1], time_session[2])
CROSS JOIN LATERAL (SELECT tsrange(t.month_start_date, t.month_start_date interval '1 month' - interval '1 second') * tsrange(time_session [1], time_session[2]) AS period) AS f
GROUP BY username, to_char(t.month_start_date, 'YYYY-MM')
see the result in dbfiddle