Home > Mobile >  Postgres Window Function and extracted/averaged duration between timestamps
Postgres Window Function and extracted/averaged duration between timestamps

Time:09-30

Been reading SO for years, but this is my first post. Hoping someone might be able to help me solve this.

I'm new to window functions, but from my understanding, it seems to be what I'm looking for. I've got 3 tables users, tasks and task_users. One or more users can be assigned to a task (through task_users). What I would like to see is a table showing the following:

user ID
user's full name
how many tasks have been issued to that user (occurrences)
average duration for all tasks issued to that user (average_duration)

The current method I'm using to extract a duration for a single task is:

EXTRACT(EPOCH from closed_at) - EXTRACT(EPOCH from started_at)/3600 AS duration

Here are the columns of interest in each table:

users
id
last_name
first_name

tasks
id
started_at (timestamp w/o tz)
closed_at (timestamp w/o tz)

task_users
task_id (references tasks.id)
user_id (references users.id)

Using sql below, I can generate a table showing each user, their ID and how many tasks were sent to that user:

SELECT 
    users.id AS u_id,
    concat(users.last_name, ', ', users.first_name) AS u_name, 
    COUNT(*) AS occurrences
FROM tasks
INNER JOIN task_users ON task_users.task_id = tasks.id
INNER JOIN users ON users.id = task_users.user_id
WHERE tasks.closed_at IS NOT NULL 
GROUP BY u_id
ORDER BY occurrences DESC

This query shows:

----------------------------------
id    u_name           occurrences
----------------------------------
1  |  Mike Smith     | 10
2  |  Dave Johnson   | 5
3  |  George Wilson  | 3
etc...

What I would like to generate is the same table as above, but with average duration (hours it took to complete each task) for all tasks sent to each user. Something like the following:

------------------------------------------------------
id    u_name           occurrences    average_duration
------------------------------------------------------
1  |  Mike Smith     | 10           | 32.7
2  |  Dave Johnson   | 5            | 15.2
3  |  George Wilson  | 3            | 10.0
etc...

I've tried the following subquery and window function, but it's splitting up the users into multiple rows (number of rows showing for each user is equal to occurrences).

SELECT 
    users.id AS u_id,
    concat(users.last_name, ', ', users.first_name) AS u_name, 
    COUNT(*) AS occurrences,
    AVG(tsk.duration) OVER(PARTITION BY users.id) AS average_duration
FROM 
    (SELECT id, (EXTRACT(EPOCH from closed_at) - EXTRACT(EPOCH from started_at)/3600) AS duration FROM tasks) tsk
INNER JOIN task_users ON tsk.id = task_users.task_id
INNER JOIN users ON users.id = task_users.user_id

I'm new to window functions and not an SQL guru, but it seems to me that a window function is the best solution?

If anyone can point me in the right direction or offer up a suggestion, I'd really appreciate it.

Thanks!

CodePudding user response:

A window function will provide a value for each row. In your scenario, a user has multiple tasks and as such, the join will result in multiple rows per user.

You may try the following approach which modifies your original approach that groups the data by user:

SELECT 
    users.id AS u_id,
    concat(users.last_name, ', ', users.first_name) AS u_name, 
    COUNT(*) AS occurrences,
    SUM(
        EXTRACT(EPOCH from closed_at) - EXTRACT(EPOCH from started_at)/3600
    ) / COUNT(*) as average_duration
FROM tasks
INNER JOIN task_users ON task_users.task_id = tasks.id
INNER JOIN users ON users.id = task_users.user_id
WHERE tasks.closed_at IS NOT NULL 
GROUP BY u_id
ORDER BY occurrences DESC

or

SELECT 
    users.id AS u_id,
    concat(users.last_name, ', ', users.first_name) AS u_name, 
    COUNT(*) AS occurrences,
    AVG(
        EXTRACT(EPOCH from closed_at) - EXTRACT(EPOCH from started_at)/3600
    ) as average_duration
FROM tasks
INNER JOIN task_users ON task_users.task_id = tasks.id
INNER JOIN users ON users.id = task_users.user_id
WHERE tasks.closed_at IS NOT NULL 
GROUP BY u_id
ORDER BY occurrences DESC

Let me know if this works for you.

  • Related