Home > Net >  Why does adding an extra column and another LEFT OUTER JOIN from another table cause the value in my
Why does adding an extra column and another LEFT OUTER JOIN from another table cause the value in my

Time:12-12

Problem

I'm building a reddit clone that allows users to create posts, vote on other people's posts, and leave comments on other people's posts. When I query the database for a list of all of the posts, I would like my query to return data like the example table below.

Anticipated Table

post_id Number of Upvotes Number of Downvotes Number of Comments
1 2 0 5
2 1 0 0
3 0 1 0
4 0 1 4

Unfortunately, I'm misunderstanding something about how I've set up my query. When I query the database, the result of my query looks something like this.

Unexpected Table

post_id Number of Upvotes Number of Downvotes Number of Comments
1 10 0 5
2 1 0 0
3 0 1 0
4 0 4 4

For some reason, the number of upvotes becomes equal to the number of comments on a post multiplied by the number of upvotes or downvotes. Below is the code I am using.

SQL Alchemy Query and SQL Version

posts_query = db.query(
      models.Post.id,
      func.count(models.Vote.post_id).filter(models.Vote.upvote == True).label("num_upvotes"),
      func.count(models.Vote.post_id).filter(models.Vote.upvote == False).label("num_downvotes"),
      func.count(models.Comment.post_id).label('num_comments')
    ).join(
      models.Vote, models.Vote.post_id == models.Post.id, isouter=True
    ).join(
      models.Comment, models.Comment.post_id == models.Post.id, isouter=True
    ).group_by(
      models.Post.id
    ).all()
SELECT posts.id AS posts_id,
count(votes.post_id) FILTER (WHERE votes.upvote = true) AS num_upvotes,
count(votes.post_id) FILTER (WHERE votes.upvote = false) AS num_downvotes,
count(comments.post_id) AS num_comments 
FROM posts
LEFT OUTER JOIN votes ON votes.post_id = posts.id
LEFT OUTER JOIN comments ON comments.post_id = posts.id
GROUP BY posts.id

SQL Alchemy Models

class Post(Base):
  __tablename__ = "posts"

  id = Column(Integer, primary_key=True, nullable=False)
  title = Column(String, nullable=False)
  content = Column(String, nullable=False)
  published = Column(Boolean, server_default="True", nullable=False)
  created_at = Column(TIMESTAMP(timezone=True), 
    server_default=text('now()'), nullable=False)
  owner_id = Column(Integer, ForeignKey("users.id", 
    ondelete="CASCADE"), nullable=False)
  owner = relationship("User", backref="posts")


class User(Base): 
  __tablename__ = "users"

  id = Column(Integer, primary_key=True, nullable=False)
  username = Column(String, nullable=False, unique=True)
  email = Column(String, nullable=False, unique=True)
  password = Column(String, nullable=False)
  created_at = Column(TIMESTAMP(timezone=True), 
    server_default=text('now()'), nullable=False)


class Vote(Base):
  __tablename__ = "votes"

  user_id = Column(Integer, ForeignKey("users.id", 
    ondelete="CASCADE"), primary_key=True)
  user = relationship("User", backref="votes")
  post_id = Column(Integer, ForeignKey("posts.id", 
    ondelete="CASCADE"), primary_key=True)
  post = relationship("Post", backref="votes")
  upvote = Column(Boolean, nullable=False)

class Comment(Base):
  __tablename__ = "comments"

  id = Column(Integer, primary_key=True, nullable=False)
  owner_id = Column(Integer, ForeignKey("users.id", 
    ondelete="CASCADE"))
  owner = relationship("User", backref="comments")
  post_id = Column(Integer, ForeignKey("posts.id", 
    ondelete="CASCADE"))
  post = relationship("Post", backref="comments")
  content = Column(String, nullable=False)
  created_at = Column(TIMESTAMP(timezone=True), 
    server_default=text('now()'), nullable=False)

What I've Tried

First Query

I've tried removing the second Left Join on the comments table and the query that attempts to count the number of comments. The following query makes the number of upvotes and downvotes act correctly.

posts_query = db.query(
      models.Post.id,
      func.count(models.Vote.post_id).filter(models.Vote.upvote == True).label("num_upvotes"),
      func.count(models.Vote.post_id).filter(models.Vote.upvote == False).label("num_downvotes"),
    ).join(
      models.Vote, models.Vote.post_id == models.Post.id, isouter=True
    ).group_by(
      models.Post.id
    ).all()
SELECT posts.id AS posts_id,
count(votes.post_id) FILTER (WHERE votes.upvote = true) AS num_upvotes,
count(votes.post_id) FILTER (WHERE votes.upvote = false) AS num_downvotes, 
FROM posts
LEFT OUTER JOIN votes ON votes.post_id = posts.id
GROUP BY posts.id
post_id Number of Upvotes Number of Downvotes
1 2 0
2 1 0
3 0 1
4 0 1

Second Query

I've also tried only removing the function that counts the number of comments while leaving the second LEFT OUTER JOIN in place. When I implement the following query, my table looks like this:

posts_query = db.query(
      models.Post.id,
      func.count(models.Vote.post_id).filter(models.Vote.upvote == True).label("num_upvotes"),
      func.count(models.Vote.post_id).filter(models.Vote.upvote == False).label("num_downvotes")
    ).join(
      models.Vote, models.Vote.post_id == models.Post.id, isouter=True
    ).join(
      models.Comment, models.Comment.post_id == models.Post.id, isouter=True
    ).group_by(
      models.Post.id
    ).all()
SELECT posts.id AS posts_id,
count(votes.post_id) FILTER (WHERE votes.upvote = true) AS num_upvotes,
count(votes.post_id) FILTER (WHERE votes.upvote = false) AS num_downvotes 
FROM posts
LEFT OUTER JOIN votes ON votes.post_id = posts.id
LEFT OUTER JOIN comments ON comments.post_id = posts.id
GROUP BY posts.id
post_id Number of Upvotes Number of Downvotes
1 10 0
2 1 0
3 0 1
4 0 4

This leads me to believe the issue is with the second LEFT OUTER JOIN rather than the other counting function.

When I attempt to add the second LEFT OUTER JOIN to my query, the count that keeps track of the number of upvotes and downvotes returns an incorrect value. Would anyone be able to explain why my query is giving me the Unexpected Table above and what I would need to change to return the Anticipated Table instead? Thank you in advance for taking the time to read through this post. Please let me know if I could be clearer in any places

CodePudding user response:

I don't know Alchemy, so here is my answer on SQL:

Your mistake is that you cross join a post's votes with its comments. If you have two votes and three comments on a post, your joins create all six combinations (2 x 3 = 6). Then you count, but as you count in the cross joined result, you are counting votes and comments multifold.

What you want to do instead is join vote counts and comment counts to the posts:

SELECT 
  p.id AS posts_id,
  COALESCE(v.cnt_up, 0) AS num_upvotes,
  COALESCE(v.cnt_down, 0) AS num_downvotes,
  COALESCE(c.cnt, 0) AS num_comments 
FROM posts p
LEFT OUTER JOIN
(
  SELECT
    post_id,
    COUNT(*) FILTER (WHERE upvote = true) AS cnt_up,
    COUNT(*) FILTER (WHERE upvote = false) AS cnt_down
  FROM votes
  GROUP BY post_id
) v ON v.post_id = p.id
LEFT OUTER JOIN
(
  SELECT post_id, COUNT(*) AS cnt
  FROM comments
  GROUP BY post_id
) c ON c.post_id = p.id
ORDER BY p.id;

Well, I probably shouldn't mention this, as it is considered bad practise, due to creating an unnecessarily large intermediate result and being prone to errors, when trying to add other aggregation results, but well, here it is: As long as the only aggregation function you use is COUNT, you can use your query with distinct ID counts:

SELECT
  p.id AS posts_id,
  COUNT(DISTINCT v.id) FILTER (WHERE v.upvote = true) AS num_upvotes,
  COUNT(DISTINCT v.id) FILTER (WHERE v.upvote = false) AS num_downvotes,
  COUNT(DISTINCT c.id) AS num_comments 
FROM posts p 
LEFT OUTER JOIN votes v ON voves.post_id = p.id
LEFT OUTER JOIN comments c ON c.post_id = p.id
GROUP BY p.id
ORDER BY p.id;
  • Related