Home > database >  SQL count avg mess. per user_id
SQL count avg mess. per user_id

Time:11-11

I need to count the avg number of messages sent by distinct users_id. I have the messages under the table "comments" , and the distinct users_id under the table "user". They join under the use of another table "tickets" this way:

left outer join tickets t on c.ticket_id = t.ticket_id
left outer  join "users" u on u.user_id  = t.requester

I'm not too sure where to start from either.

select user_id,
count avg (distinct c.ticket_id)
from "comments" c
group by ticket_id
left outer join tickets t on c.ticket_id = t.ticket_id
left outer  join "users" u on u.user_id  = t.requester
where t.created_at between '2022-10-01' and '2022-10-31'

CodePudding user response:

This might give you the result you want. The Common Table Expression counts the comments per ticket, then we select the average of the counts per ticket grouped by user.

with UserTicketCount (user_id, ticket_id, theCount) as
(
      select u.user_id
           , t.ticket_id
           , count(c.comment_id) as theCount
        from users u
   left join tickets t
          on t.requester = u.user_id
   left join comments c
          on c.ticket_id = t.ticket_id
   group by u.user_id, t.ticket_id
)
select user_id
     , avg(theCount) as AvgCommentPerTicket
  from UserTicketCount
group by user_id;

CodePudding user response:

We'll start from the original query in the question:

select user_id,
count avg (distinct c.ticket_id)
from "comments" c
group by ticket_id
left outer join tickets t on c.ticket_id = t.ticket_id
left outer  join "users" u on u.user_id  = t.requester
where t.created_at between '2022-10-01' and '2022-10-31'

Given this, there are a number of things we need to change:

  • join and where must be listed before group by
  • This is not the correct format for SQL Server date literals. For date-only values, you should skip the - separators and just have yyyyMMdd.
  • If created_at is a full datetime, you probably need an exclusive upper boundary for November 1, rather than inclusive for October 31. If this is not true, normal naming conventions for date-only values are to use names like "created_on" instead of "created_at", and using an exclusive upper bound is still a good thing to do.
  • You can't select for user_id if your GROUP BY is for ticket_id. In this case, probably user_id is correct for both.
  • You can't use count and avg() at the same time.

Put it all together and you end up with this:

SELECT AVG(MessageCount)
FROM (
    SELECT user_id,
        count(distinct c.comment_id) as MessageCount
    FROM "comments" c
    LEFT JOIN tickets t on c.ticket_id = t.ticket_id
    LEFT JOIN "users" u on u.user_id  = t.requester
    WHERE t.created_at >= '20221001' AND t.created_at < '20221101'
    GROUP BY u.user_id
) t

Note I had to guess at the primary key for the comments table.

Finally, it's strange to me that a ticket is needed to imply the user associated with a comment. Surely more than one person can be on a ticket?

  • Related