I am trying to build a search function for my blog that searches through the two models Articles
and Spots
. The two models are connected via the pivot table ArticleSpots
.
My blog is structured so that there are multiple spots in each article.
When a query is searched I want the query to be searched within both models but only display clickable articles. I have a html page for each article but not for each spots, so all the spots that resulted from the search have to be shown as the article which contains the spot. Hope that makes sense!
This is the code that I came up with but the problem is that I get a lot of duplicates in the variable results
. There are duplicates within each articles_from_spots
and articles_from_query
, and there are also overlaps between them.
Is this the right way to accomplish this ? How can I remove the duplicates from the results? Any help would be appreciated!
views.py
def search(request):
query = request.GET.get("q")
articles_from_query = Articles.objects.filter(
Q(title__icontains=query) |
Q(summary__icontains=query)
)
spots_from_query = Spots.objects.filter(
Q(title__icontains=query) |
Q(summary__icontains=query) |
Q(content__icontains=query)
)
articles_from_spots = []
for x in spots_from_query:
article = Articles.objects.filter(articlespots__spot=x)
articles_from_spots.extend(article)
results = chain(articles_from_spots, articles_from_query)
context = {
'results': results,
}
return render(request, "Search.html", context)
models.py
class Articles(models.Model):
title = models.CharField(max_length=155)
summary = models.TextField(blank=True, null=True)
class ArticleSpots(models.Model):
article = models.ForeignKey('Articles', models.DO_NOTHING)
spot = models.ForeignKey('Spots', models.DO_NOTHING)
class Spots(models.Model):
title = models.CharField(max_length=155)
summary = models.TextField(blank=True, null=True)
content = models.TextField(blank=True, null=True)
CodePudding user response:
You should be able to do this in a single query following the relationship from article to spot
Articles.objects.filter(
Q(title__icontains=query) |
Q(summary__icontains=query) |
Q(articlespots__spot__title__icontains=query) |
Q(articlespots__spot__summary__icontains=query) |
Q(articlespots__spot__content__icontains=query)
).distinct()
If you were to add a ManyToManyField from Article to Spots it would simplify this a bit and makes sense from a design POV
class Articles(models.Model):
...
spots = models.ManyToManyField('Spots', through='ArticleSpots')
Articles.objects.filter(
Q(title__icontains=query) |
Q(summary__icontains=query) |
Q(spots__title__icontains=query) |
Q(spots__summary__icontains=query) |
Q(spots__content__icontains=query)
).distinct()
CodePudding user response:
The main issue is the inefficient for-loop, but I have to suggest something else first.
I would highly recommend a model design change:
class Articles(models.Model):
title = models.CharField(max_length=155)
summary = models.TextField(blank=True, null=True)
spots = models.ManyToManyField(Spot, blank=True, related_name='articles')
class Spots(models.Model):
title = models.CharField(max_length=155)
summary = models.TextField(blank=True, null=True)
content = models.TextField(blank=True, null=True)
The function is exactly the same (plus you can call spot.articles.all()
and article.spots.all()
). You can still access your ArticleSpots
model as Article.spots.through
if you need. In case you later need more fields per connection, you can do this (together with your original ArticleSpots class, maybe with on_delete=models.CASCADE
there instead):
spots = models.ManyToManyField(Spot, blank=True, through= 'ArticleSpots')
The for-loop is inefficient (think dozens of seconds when you get to just thousands of objects or if joins happen), because it triggers a query for every item in the result. Instead you should get the articles_from_spots
with a direct query. IE.:
article_ids = spots_from_query.values_list('articles__id', flat=True)
articles_from_spots = Article.objects.filter(id__in=article_ids)
That will guarantee only 2 db queries per run. Then you'll need to do something like to turn the querysets into lists before combining them:
results = chain(map(list, [articles_from_spots, articles_from_query]))
There might still be issues with mixing two model querysets together, but that all depends on your template. It's generally a bad practice, but there's no acute issue as far as you're aware of it.