I have a hash table (for tagging items) with an extra column for the strength of the relationship between items and their tags:
item_id | tag_name | relationship
1 | linux | 0.7
1 | computer | 0.9
2 | garden | 0.5
2 | shovel | 0.65
...
I want to select all items tagged with both 'linux'
and 'computer'
(more tags are allowed), ordered by the sum of the relationships between items and these two tags.
Here's my attempt at querying the database:
SELECT * FROM items
INNER JOIN (
SELECT items.id FROM items
INNER JOIN tags ON items.id = tags.item_id
GROUP BY tags.item_id
HAVING ARRAY_AGG(tags.tag_id) @> ARRAY['linux','computer']
ORDER BY SUM(tags.relationship) DESC
) tagged_items ON items.id = tagged_items.id
For items that are only tagged with 'linux'
and 'computer'
, this works perfectly. But the problem is that for items that have additional tags, the ORDER BY SUM(tags.relationship)
will sum over the other tags that an item has as well... in other words, items with the most tags will return first (and that's not what I want).
How can I make SUM()
only sum over the relationships to the tags that I'm selecting?
CodePudding user response:
You can simply use in
:
select id
from items i
where tags_id in ('linux', 'computer')
group by id
order by sum(relationship) desc;
Unless you want all columns, then you can use the query above in cte and join:
with tmp as (
select id
from items
where tags_id in ('linux', 'computer')
group by id
order by sum(relationship) desc
)
select i.*
from tmp t
inner join items i on t.id = i.id;