Home > Net >  sum greater than in subquery
sum greater than in subquery

Time:01-03

I'm making a database in PostgreSQL that involves around democracy. All data that should be displayed are controlled by the users, and their percentage of power.

I'm struggling to write this SQL query where a collection of tags on a post should only be shown once the sum of all the percentage for each tag reaches a certain criteria.

The relations between the tables (relevant to this question) looks like this:

Image of relations between the tables

The post_tags table is used for deciding what tag stays on what post, decided by the users based on their percentage.

It may look something like this

approved_by post_id tag_id percentage
1 1 1 0.33
5 1 3 0.45
7 1 3 0.25
6 1 3 0.15
4 1 1 0.90
1 1 2 0.45
1 1 6 -0.60
6 1 2 -0.15

How do you write an SQL query that selects a post and its tags if the percentage sum is above a certain threshold?

In the case of SUM(post_tags.percentage) > 0.75, only tag with tag_id 1 and 3 should show.

So far, I have written this SQL query, but it contains duplicates in the array_agg (might be a separate issue), and the HAVING only seem to depend on the total sum of all the tags in the array_agg.

SELECT
    posts.post_id, array_agg(tags.name) AS tags
FROM
    posts, tags, post_tags
WHERE
    post_tags.post_id = posts.post_id AND
    post_tags.tag_id = tags.tag_id
GROUP BY
    posts.post_id
HAVING
    SUM(post_tags.percentage) > 0.75
LIMIT 10;

I assume I might need to do a subquery within the query, but you can't do SUM inside the WHERE clause. I'm a bit lost at this issue.

Any help is appreciated


UPDATE

Because I think there needs to be atleast 2 queries into play, I think this should be one of them

SELECT
    tags.name
FROM
    post_tags, posts, tags
WHERE
    post_tags.tag_id = tags.tag_id AND
    post_tags.post_id = posts.post_id AND
    posts.post_id = 1
GROUP BY
    tags.tag_id
HAVING
    SUM(post_tags.percentage) > 0.75

In this case, it's only for post 1, and I don't know how to continue this query for all posts

CodePudding user response:

It's easy to get confused, but start small, and then expand the SQL query as you go.

Note the inner parenthesis will execute first. Start with the inner query, and then work on the outer query when building SQL queries.

In this case, for finding the tags relevant for a single post can be written like so

SELECT
    t.name
FROM
    post_tags
INNER JOIN tags t ON t.tag_id = post_tags.tag_id
INNER JOIN posts p2 ON p2.post_id = post_tags.post_id AND p2.post_id = 1
GROUP BY
    t.tag_id
HAVING
    SUM(post_tags.percentage) > 0.75

To expand on this, and apply the query for every post, switch out the 1 and set it equal the outer scope. The complete SQL query becomes this:

SELECT p.post_id, ARRAY(
    SELECT
        t.name
    FROM
        post_tags
    INNER JOIN tags t ON t.tag_id = post_tags.tag_id
    INNER JOIN posts p2 ON p2.post_id = post_tags.post_id AND p2.post_id = p.post_id
    GROUP BY
        t.tag_id
    HAVING
        SUM(post_tags.percentage) > 0.75
) AS tags
FROM posts p

Big thanks to Tim Biegeleisen who helped change out the FROM statements to INNER JOIN (tho performance-wise, both are tested equally fast in this case).

CodePudding user response:

One idea would be to first aggregate the total percentage for each post/tag pair in a sub query, and then limit the result set by the minimal percentage in the outer select.

A basic example (without joining tags and posts nor aggregating posts into an array) might look something like this:

SELECT agg.post_id, agg.tag_id FROM 
  (
    SELECT post_id, tag_id, sum(percentage) as percentage_sum 
    FROM post_tags
    GROUP BY post_id, tag_id
  ) AS agg
WHERE agg.percentage_sum > 0.75

This basically gives you a new join table, reduced to one entry per post and tag, just including the tags that you are interested in.

To actually select the the tag names per post and group it into an array, the above query can be expanded like this:

SELECT agg.post_id, array_agg(t.name) AS tags FROM 
  (
    SELECT post_id, tag_id, sum(percentage) as percentage_sum 
    FROM post_tags
    GROUP BY post_id, tag_id
  ) AS agg
LEFT JOIN tags t ON t.id = agg.tag_id
WHERE agg.percentage_sum > 0.75
GROUP BY agg.post_id

The approach is a bit different from what you initially asked for, but it should give the same result.

  • Related