Home > Back-end >  Django searching multiple models and removing duplicates
Django searching multiple models and removing duplicates

Time:11-09

I am trying to build a search function for my blog that searches through the two models Articles and Spots. The two models are connected via the pivot table ArticleSpots. My blog is structured so that there are multiple spots in each article.

When a query is searched I want the query to be searched within both models but only display clickable articles. I have a html page for each article but not for each spots, so all the spots that resulted from the search have to be shown as the article which contains the spot. Hope that makes sense!

This is the code that I came up with but the problem is that I get a lot of duplicates in the variable results. There are duplicates within each articles_from_spots and articles_from_query, and there are also overlaps between them.

Is this the right way to accomplish this ? How can I remove the duplicates from the results? Any help would be appreciated!

views.py

def search(request):

    query = request.GET.get("q")

    articles_from_query = Articles.objects.filter(
        Q(title__icontains=query) |
        Q(summary__icontains=query)
    )

    spots_from_query = Spots.objects.filter(
        Q(title__icontains=query) |
        Q(summary__icontains=query) |
        Q(content__icontains=query) 
    )

    articles_from_spots = []
    for x in spots_from_query:
        article = Articles.objects.filter(articlespots__spot=x)
        articles_from_spots.extend(article)

    results = chain(articles_from_spots, articles_from_query)

    context = {
        'results': results,
    }
    return render(request, "Search.html", context)

models.py

class Articles(models.Model):
    title = models.CharField(max_length=155)
    summary = models.TextField(blank=True, null=True)


class ArticleSpots(models.Model):
    article = models.ForeignKey('Articles', models.DO_NOTHING)
    spot = models.ForeignKey('Spots', models.DO_NOTHING)


class Spots(models.Model):
    title = models.CharField(max_length=155)
    summary = models.TextField(blank=True, null=True)
    content = models.TextField(blank=True, null=True)

CodePudding user response:

You should be able to do this in a single query following the relationship from article to spot

Articles.objects.filter(
    Q(title__icontains=query) |
    Q(summary__icontains=query) |
    Q(articlespots__spot__title__icontains=query) |
    Q(articlespots__spot__summary__icontains=query) |
    Q(articlespots__spot__content__icontains=query) 
).distinct()

If you were to add a ManyToManyField from Article to Spots it would simplify this a bit and makes sense from a design POV

class Articles(models.Model):
    ...
    spots = models.ManyToManyField('Spots', through='ArticleSpots')


Articles.objects.filter(
    Q(title__icontains=query) |
    Q(summary__icontains=query) |
    Q(spots__title__icontains=query) |
    Q(spots__summary__icontains=query) |
    Q(spots__content__icontains=query) 
).distinct()

CodePudding user response:

The main issue is the inefficient for-loop, but I have to suggest something else first.

I would highly recommend a model design change:

class Articles(models.Model):
    title = models.CharField(max_length=155)
    summary = models.TextField(blank=True, null=True)
    spots = models.ManyToManyField(Spot, blank=True, related_name='articles')


class Spots(models.Model):
    title = models.CharField(max_length=155)
    summary = models.TextField(blank=True, null=True)
    content = models.TextField(blank=True, null=True)

The function is exactly the same (plus you can call spot.articles.all() and article.spots.all()). You can still access your ArticleSpots model as Article.spots.through if you need. In case you later need more fields per connection, you can do this (together with your original ArticleSpots class, maybe with on_delete=models.CASCADE there instead):

    spots = models.ManyToManyField(Spot, blank=True, through= 'ArticleSpots')

The for-loop is inefficient (think dozens of seconds when you get to just thousands of objects or if joins happen), because it triggers a query for every item in the result. Instead you should get the articles_from_spots with a direct query. IE.:

article_ids = spots_from_query.values_list('articles__id', flat=True)
articles_from_spots = Article.objects.filter(id__in=article_ids)

That will guarantee only 2 db queries per run. Then you'll need to do something like to turn the querysets into lists before combining them:

results = chain(map(list, [articles_from_spots, articles_from_query]))

There might still be issues with mixing two model querysets together, but that all depends on your template. It's generally a bad practice, but there's no acute issue as far as you're aware of it.

  • Related