I'm making a database in PostgreSQL that involves around democracy. All data that should be displayed are controlled by the users, and their percentage of power.
I'm struggling to write this SQL query where a collection of tags
on a post
should only be shown once the sum of all the percentage for each tag reaches a certain criteria.
The relations between the tables (relevant to this question) looks like this:
The post_tags
table is used for deciding what tag stays on what post, decided by the users based on their percentage.
It may look something like this
approved_by | post_id | tag_id | percentage |
---|---|---|---|
1 | 1 | 1 | 0.33 |
5 | 1 | 3 | 0.45 |
7 | 1 | 3 | 0.25 |
6 | 1 | 3 | 0.15 |
4 | 1 | 1 | 0.90 |
1 | 1 | 2 | 0.45 |
1 | 1 | 6 | -0.60 |
6 | 1 | 2 | -0.15 |
How do you write an SQL query that selects a post and its tags if the percentage sum is above a certain threshold?
In the case of SUM(post_tags.percentage) > 0.75
, only tag with tag_id
1 and 3 should show.
So far, I have written this SQL query, but it contains duplicates in the array_agg
(might be a separate issue), and the HAVING
only seem to depend on the total sum of all the tags in the array_agg
.
SELECT
posts.post_id, array_agg(tags.name) AS tags
FROM
posts, tags, post_tags
WHERE
post_tags.post_id = posts.post_id AND
post_tags.tag_id = tags.tag_id
GROUP BY
posts.post_id
HAVING
SUM(post_tags.percentage) > 0.75
LIMIT 10;
I assume I might need to do a subquery within the query, but you can't do SUM
inside the WHERE
clause. I'm a bit lost at this issue.
Any help is appreciated
UPDATE
Because I think there needs to be atleast 2 queries into play, I think this should be one of them
SELECT
tags.name
FROM
post_tags, posts, tags
WHERE
post_tags.tag_id = tags.tag_id AND
post_tags.post_id = posts.post_id AND
posts.post_id = 1
GROUP BY
tags.tag_id
HAVING
SUM(post_tags.percentage) > 0.75
In this case, it's only for post 1, and I don't know how to continue this query for all posts
CodePudding user response:
It's easy to get confused, but start small, and then expand the SQL query as you go.
Note the inner parenthesis will execute first. Start with the inner query, and then work on the outer query when building SQL queries.
In this case, for finding the tags relevant for a single post can be written like so
SELECT
t.name
FROM
post_tags
INNER JOIN tags t ON t.tag_id = post_tags.tag_id
INNER JOIN posts p2 ON p2.post_id = post_tags.post_id AND p2.post_id = 1
GROUP BY
t.tag_id
HAVING
SUM(post_tags.percentage) > 0.75
To expand on this, and apply the query for every post, switch out the 1
and set it equal the outer scope. The complete SQL query becomes this:
SELECT p.post_id, ARRAY(
SELECT
t.name
FROM
post_tags
INNER JOIN tags t ON t.tag_id = post_tags.tag_id
INNER JOIN posts p2 ON p2.post_id = post_tags.post_id AND p2.post_id = p.post_id
GROUP BY
t.tag_id
HAVING
SUM(post_tags.percentage) > 0.75
) AS tags
FROM posts p
Big thanks to Tim Biegeleisen who helped change out the FROM
statements to INNER JOIN
(tho performance-wise, both are tested equally fast in this case).
CodePudding user response:
One idea would be to first aggregate the total percentage for each post/tag pair in a sub query, and then limit the result set by the minimal percentage in the outer select.
A basic example (without joining tags and posts nor aggregating posts into an array) might look something like this:
SELECT agg.post_id, agg.tag_id FROM
(
SELECT post_id, tag_id, sum(percentage) as percentage_sum
FROM post_tags
GROUP BY post_id, tag_id
) AS agg
WHERE agg.percentage_sum > 0.75
This basically gives you a new join table, reduced to one entry per post and tag, just including the tags that you are interested in.
To actually select the the tag names per post and group it into an array, the above query can be expanded like this:
SELECT agg.post_id, array_agg(t.name) AS tags FROM
(
SELECT post_id, tag_id, sum(percentage) as percentage_sum
FROM post_tags
GROUP BY post_id, tag_id
) AS agg
LEFT JOIN tags t ON t.id = agg.tag_id
WHERE agg.percentage_sum > 0.75
GROUP BY agg.post_id
The approach is a bit different from what you initially asked for, but it should give the same result.