I need to count the avg number of messages sent by distinct users_id. I have the messages under the table "comments" , and the distinct users_id under the table "user". They join under the use of another table "tickets" this way:
left outer join tickets t on c.ticket_id = t.ticket_id
left outer join "users" u on u.user_id = t.requester
I'm not too sure where to start from either.
select user_id,
count avg (distinct c.ticket_id)
from "comments" c
group by ticket_id
left outer join tickets t on c.ticket_id = t.ticket_id
left outer join "users" u on u.user_id = t.requester
where t.created_at between '2022-10-01' and '2022-10-31'
CodePudding user response:
This might give you the result you want. The Common Table Expression counts the comments per ticket, then we select the average of the counts per ticket grouped by user.
with UserTicketCount (user_id, ticket_id, theCount) as
(
select u.user_id
, t.ticket_id
, count(c.comment_id) as theCount
from users u
left join tickets t
on t.requester = u.user_id
left join comments c
on c.ticket_id = t.ticket_id
group by u.user_id, t.ticket_id
)
select user_id
, avg(theCount) as AvgCommentPerTicket
from UserTicketCount
group by user_id;
CodePudding user response:
We'll start from the original query in the question:
select user_id,
count avg (distinct c.ticket_id)
from "comments" c
group by ticket_id
left outer join tickets t on c.ticket_id = t.ticket_id
left outer join "users" u on u.user_id = t.requester
where t.created_at between '2022-10-01' and '2022-10-31'
Given this, there are a number of things we need to change:
join
andwhere
must be listed beforegroup by
- This is not the correct format for SQL Server date literals. For date-only values, you should skip the
-
separators and just haveyyyyMMdd
. - If
created_at
is a full datetime, you probably need an exclusive upper boundary for November 1, rather than inclusive for October 31. If this is not true, normal naming conventions for date-only values are to use names like "created_on" instead of "created_at", and using an exclusive upper bound is still a good thing to do. - You can't select for
user_id
if yourGROUP BY
is forticket_id
. In this case, probablyuser_id
is correct for both. - You can't use
count
andavg()
at the same time.
Put it all together and you end up with this:
SELECT AVG(MessageCount)
FROM (
SELECT user_id,
count(distinct c.comment_id) as MessageCount
FROM "comments" c
LEFT JOIN tickets t on c.ticket_id = t.ticket_id
LEFT JOIN "users" u on u.user_id = t.requester
WHERE t.created_at >= '20221001' AND t.created_at < '20221101'
GROUP BY u.user_id
) t
Note I had to guess at the primary key for the comments
table.
Finally, it's strange to me that a ticket is needed to imply the user associated with a comment. Surely more than one person can be on a ticket?