I have a comments table that looks like:
CREATE TABLE comments (
comment_id INT GENERATED ALWAYS AS IDENTITY,
user_id VARCHAR(255) NOT NULL,
username VARCHAR(15) NOT NULL,
content VARCHAR(255) NOT NULL,
created_at TIMESTAMP WITHOUT TIME ZONE DEFAULT now(),
post_id VARCHAR(255) references posts(post_id),
reply_to INT,
);
For each comment, I'm trying to select the number of replies. A comment that is a reply will have a reply_to
column that references the comment_id
the reply is in response too.
I'm trying to select all column that are in response to a post, however I also want to select the number of replies each of those comments have.
This is the query:
SELECT comment_id, username, content,
(
SELECT COUNT(comments.*) FROM comments
JOIN comments comments1 on comments.comment_id = comments1.reply_to
)
as replies FROM comments WHERE post_id = 'uuid';
The query returns:
comment_id | username | content | replies |
---|---|---|---|
2 | username1 | comment content | 1 |
3 | username2 | comment content | 1 |
The problem is that it returns both comments have a reply count of 1 even though I have only 1 record that has a reply_to
column of 3.
CodePudding user response:
Why your solution does not work
The subquery you use in your SELECT clause is inner joining ALL comments with ALL of their replies and then doing a count on ALL of the resultant rows in the resultant table. Note you are using an inner join in your example (JOIN is shorthand for INNER JOIN) so if a comment does not have any replies it is not included in the resultant table. So the count is of ALL the comments which have at least one reply.
(
-- you are counting ***ALL** the comments in your table inner joined onto their replies.
SELECT COUNT(comments.*) FROM comments
JOIN comments comments1 on comments.comment_id = comments1.reply_to
)
But you want the count to be scoped to each comment so that it's the count of only their replies, not the entire table.
The solution
Assuming replies can only go one level deep then I believe you can either use a lateral join or a common table expression using a window function.
Here is how to solve it with lateral joins. If I have time later I'll update my answer with a window function approach. I think however on average, lateral joins are more performant than a window function approach though it may depend on your data.
create temporary table comments (
id serial primary key,
post_id int,
reply_to_id int,
body varchar
);
-- the body for each comment illustrates the comment hierarchy for a post
-- POST.PARENT_COMMENT.REPLY. E.g. 1.1.1 means this comment is a reply to the first comment for the first post.
insert into comments (post_id, reply_to_id, body) values
(1, null, '1.1'),
(1, 1, '1.1.1'),
(1, 1, '1.1.2'),
(1, null, '1.2'),
(2, null, '2.1'),
(2, 5, '2.1.1');
-- Lets look at the comments with their replies
select comments.*, replies.* from comments
left outer join comments as replies on comments.id = replies.reply_to_id
order by comments.post_id, replies.reply_to_id ASC;
/*
| COMMENT | REPLY |
id | post_id | reply_to_id | body | id | post_id | reply_to_id | body
---- --------- ------------- ------- ---- --------- ------------- -------
1 | 1 | | 1.1 | 3 | 1 | 1 | 1.1.2
1 | 1 | | 1.1 | 2 | 1 | 1 | 1.1.1
3 | 1 | 1 | 1.1.2 | | | |
2 | 1 | 1 | 1.1.1 | | | |
4 | 1 | | 1.2 | | | |
5 | 2 | | 2.1 | 6 | 2 | 5 | 2.1.1
6 | 2 | 5 | 2.1.1 | | | |
*/
--- now lets get the reply count
select comments.*, r.reply_count from comments
left join lateral (
select count(replies.id) as reply_count from comments as replies
where comments.id = replies.reply_to_id
) as r on true
order by comments.post_id ASC;
/*
id | post_id | reply_to_id | body | reply_count
---- --------- ------------- ------- -------------
1 | 1 | | 1.1 | 2
2 | 1 | 1 | 1.1.1 | 0
3 | 1 | 1 | 1.1.2 | 0
4 | 1 | | 1.2 | 0
5 | 2 | | 2.1 | 1
6 | 2 | 5 | 2.1.1 | 0
*/