I have a recurring problem in SQL queries, that I haven't been able to solve elegantly, neither in raw SQL or the Django ORM, and now I'm faced with it in EntityFramework as well. It is probably common enough to have its own name, but I don't know it.
Say, I have a simple foreign key relationship between two tables, e.g.
Book 1 <- * Tag
A book has many tags and a tag has one book, i.e. the Tag table has a foreign key to the book table.
Now, I want to find all books that have "Tag1" and "Tag2".
Raw SQL
I can make multiple joins
SELECT * FROM books
JOIN tags t1 on tags.book_id = books.id
JOIN tags t2 on tags.book_id = books.id
WHERE t1.tag = 'Tag1' AND t2.tag = 'Tag2'
Cool, that works, but doesn't really seem performant
Django
In django, I could do something similar
Book.objects.filter(tags__tag="Tag1").filter(tags__tag="Tag1")
Changing filters like that will cause the extra joins, like in the raw SQL version
EntityFramework LINQ
I tried chaining .Where()
similar to changing Django's .filter()
, but that does not have the same result. It will build a query resembling the following, which will of course return nothing, because there is no row where the tag are two different strings
SELECT * FROM books
JOIN tags t1 on tags.book_id = books.id
WHERE t1.tag = 'Tag1' AND t1.tag = 'Tag2'
Wrapping it up
I suppose I could do an array aggregate to aggregate tags into and array and compare to that, but that seems expensive too, and aggregates and grouping also have impact on the ordering of things, which forces me to do subqueries to get the order I want.
I am by no means an expert in SQL, as you can plainly see, but I guess what I am hoping for is either
- A way to mimic the stupid nonsense above in LINQ
- An alternative, more elegant approach that will let me do what I need and which works well with any ORM
Extra ramblings
This case where I need to find books that have "all of" a list of tags is the tricky bit... If it was "any of" or "this particular one", then it would be simple.
EDIT: The solution using arrays and overlap
In Postgres, we can do array_agg
to aggregate all related tags into an array, like this:
SELECT * FROM books
JOIN tags t1 on tags.book_id = books.id
;
-------- ------- ------
| BookId | Name | Tag |
-------- ------- ------
| 1 | BookA | Tag1 |
| 1 | BookA | Tag2 |
| 1 | BookA | Tag3 |
| 2 | BookB | Tag1 |
| 2 | BookB | Tag3 |
-------- ------- ------
SELECT books.BookId, Name, array_agg(t1.tags) as tags
FROM books
JOIN tags t1 on tags.book_id = books.id
GROUP BY BookId
ORDER BY BookId
;
-------- ------- --------------------
| BookId | Name | tags |
-------- ------- --------------------
| 1 | BookA | [Tag1, Tag2, Tag3} |
| 2 | BookB | {Tag1, Tag3} |
-------- ------- --------------------
With that, I can then use the array "contains" operator to find the row where tag
overlaps with the expected set: WHERE tags @> ('Tag1', 'Tag2')
.
This is also a viable option. It does aggregation instead of excessive joining. Not sure what that would look like with LINQ query though
CodePudding user response:
With group by
and having
, we can do Tag 1
AND Tag 2
:
with cte_tags as (
select book_id
from tags
where tag in ('Tag 1', 'Tag 2')
group by book_id
having count(*)=2)
select b.id as book_id,
b.name
from books b
join cte_tags t
on b.id = t.book_id;
EDIT:
- If
tag
intags
for abook_id
can be duplicated, use the following:
with cte_tags as (
select book_id
from tags
where tag in ('Tag 1', 'Tag 2')
group by book_id
having count(distinct tag)=2)
select b.id as book_id,
b.name
from books b
join cte_tags t
on b.id = t.book_id;
- If looking for
tag1
ORtag2
, ortag1
ANDtag2
, use the following:
with cte_tags as (
select book_id
from tags
where tag in ('Tag 1', 'Tag 2')
group by book_id
having count(distinct tag) between 1 and 2)
select b.id as book_id,
b.name
from books b
join cte_tags t
on b.id = t.book_id;
CodePudding user response:
If I understand it correctly you want books that only have tags 'Tag1' and 'Tag2'. I.e. no other tags. I'm not aware of an official name for this problem, maybe exclusive contains.
It amounts to finding books meeting two conditions:
- have all tags in ("Tag1", "Tag2")
- have two unique tags
Since you're looking for an Entity-Framework solution, here's the way to do it in LINQ:
var tags = new[] { "Tag1", "Tag2" };
var books = context.Books
.Where(b => b.Tags.All(t => tags.Contains(t.Tag))
&& b.Tags.Select(t.Tag).Distinct().Count == tags.Count());
The second condition is necessary, otherwise books without any tags would also be selected (that's the semantics of All
).