I'm trying to wrap my head around the distinct() method of Django's QuerySet class. What I'm having trouble understanding is when to actually use it. Note that I'm not talking about the "DISTINCT ON" feature of PostgreSQL.
I understand that each model instance has to have an id and that ids are unique, so when you query model instances it's not really possible to get duplicate rows. So is the following use of distinct() redundant?
User.objects.distinct()
I know that one correct use of distinct() is with values() when you don't select the id: the selected values may contain duplicates, and distinct() removes them.
Is there any other scenario where one might need distinct() (e.g. when using select_related or prefetch_related)?
CodePudding user response:
A common use of distinct() is to eliminate duplicates when you filter across multiple tables.
Consider the following models:
from django.db import models

class Parent(models.Model):
    pass

class Child(models.Model):
    parent = models.ForeignKey(Parent, on_delete=models.CASCADE)
    value = models.IntegerField()
Populated with the following data:
p = Parent.objects.create()
Child.objects.create(parent=p, value=1)
Child.objects.create(parent=p, value=2)
Child.objects.create(parent=p, value=3)
If you filter a Parent queryset by the related value column, you will get a duplicate Parent for every Child that matches the filter:
Parent.objects.filter(child__value__gt=0)
# <QuerySet [<Parent: Parent object (1)>, <Parent: Parent object (1)>, <Parent: Parent object (1)>]>
Parent.objects.filter(child__value__gt=1)
# <QuerySet [<Parent: Parent object (1)>, <Parent: Parent object (1)>]>
But if you use distinct(), the duplicates are eliminated:
Parent.objects.filter(child__value__gt=0).distinct()
# <QuerySet [<Parent: Parent object (1)>]>
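To see where the duplicates come from, it can help to look at the join itself. Here is a rough sketch using plain sqlite3 instead of Django. The table names parent and child and the helper parent_ids are made up for illustration, and the SQL only approximates what the ORM would generate for the filters above.

```python
import sqlite3

# Hypothetical schema mirroring the Parent/Child models above.
# Django would normally prefix the table names with the app label.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL REFERENCES parent(id),
        value INTEGER NOT NULL
    );
    INSERT INTO parent (id) VALUES (1);
    INSERT INTO child (parent_id, value) VALUES (1, 1), (1, 2), (1, 3);
""")

# Filtering parents on a child column requires a join, and the join
# yields one row per matching child, not one row per parent.
JOIN_SQL = """
    SELECT {distinct} parent.id
    FROM parent
    INNER JOIN child ON child.parent_id = parent.id
    WHERE child.value > ?
"""


def parent_ids(threshold, distinct=False):
    """Return the parent ids selected by the join (illustrative helper)."""
    sql = JOIN_SQL.format(distinct="DISTINCT" if distinct else "")
    return [row[0] for row in conn.execute(sql, (threshold,))]


print(parent_ids(0))                 # one entry per matching child
print(parent_ids(1))                 # fewer matching children, fewer entries
print(parent_ids(0, distinct=True))  # duplicates collapsed by DISTINCT
```

So the duplicates are not duplicate rows in the parent table. They are the same parent row repeated once per joined child row, which is why unique ids alone don't rule them out.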