Home > database >  Django Queryset filtering based on each entry
Django Queryset filtering based on each entry

Time:04-27

Given the Django model below of a car traveling along a certain road with the start and end times:

class Travel(models.Model):
  car = models.CharField()
  road = models.CharField()
  start = models.DateTimeField()
  end = models.DateTimeField()

I want to identify the set of cars X that had been in the same road as the target car x for at least m minutes.

How should I obtain the required set of cars X?

My attempt:

So let's say I use filtering to obtain the set of travels T that x had been in.

T <-- Travel.objects.filter(car=x)

I then brute force with:

for t in T:
  possible_travels <-- filter Travel.objects with car=/=x, road=t.road, start < t.end, end > t.start 
  
  confirmed_travels <-- further filter possible_travels with the overlapping region being at least m minutes long

  confirmed_cars <-- confirmed_travels.values('cars').distinct()

However, the problems are:

  1. It may involve many DB hits by querying in a loop.
  2. Also, confirmed_cars gives a QuerySet object. So it seems I need to somehow append these QuerySet objects together. I saw other posts doing things like converting to list then appending and finally converting back to QuerySet but some people say it is not a good way, should I be doing something like this?

Are there some better approaches to this? Is a for loop really necessary and can I avoid it entirely?

Edit: for 2), I guess a way is to extract the car attribute and append into a list, then do .filter(car__in=the_list), it is not a super big issue compared to the rest.

CodePudding user response:

If you don't write a reusable library and are stuck for sure with some specific database backend, you can use raw SQL as well. It's not the best choice, but if performance of this query is a bottleneck - single query can be faster. Something like this (untested, but should work in postgres, needs to be slightly modified to work with other backends):

from django.db import connection
with connection.cursor() as cur:
    cur.execute('''
        SELECT others.car
        FROM {table} base
            INNER JOIN {table} as others
                on base.car <> others.car
                AND ((LEAST(base.end, others.end) - GREATEST(base.start, others.start)) > interval '%s minutes')
        WHERE base.car = %s
    '''.format(table=Travel._meta.db_table), (5, 'car1'))
    cur.fetchall()  # should return list of 1-tuples with proper cars

CodePudding user response:

We can have two approaches:

  1. Here is my solution with querysets and without any loops. In order to find intersection between start and end times, I partitioned querysets into 4 categories:
  • start1 -> start2 -> end1 -> end2 (q_slides_before) (intersection = end1 - start2)
  • start1 -> start2 -> end2 -> end1 (q_contains) (intersection = end2 - start2)
  • start2 -> start1 -> end1 -> end2 (q_is_contained) (intersection = end1 - start1)
  • start2 -> start1 -> end2 -> end1 (q_slides_after) (intersection = end2 - start1)
from django.db.models import OuterRef, ExpressionWrapper,\
                             F, Q, functions, DurationField, FloatField

# Use this function instead of "annotate_delta" if your db supports DurationField (postgres supports this but sqlite does not)
def annotate_delta_with_duration_support(qs, start, end):
    duration_exp = ExpressionWrapper(end - start, output_field=DurationField())
    return qs.annotate(delta=functions.ExtractMinute(duration_exp))


def annotate_delta(qs, start, end):
    duration_exp = ExpressionWrapper((end - start) / (60 * 10**6),
                                     output_field=FloatField())
    return qs.annotate(delta=duration_exp)


x = 'mycar'
m = 20

q_is_contained = Q(start__gte=OuterRef('start')) & Q(end__lte=OuterRef('end'))
qs = annotate_delta(Travel.objects, F('start'), F('end'))
qs = qs.filter(q_is_contained, delta__gte=m, car=x, road=OuterRef('road'))
res1 = Travel.objects.exclude(car=x).annotate(ex=Exists(qs)).filter(ex=True)


q_contains = Q(start__lte=OuterRef('start')) & Q(end__gte=OuterRef('end'))
qs = annotate_delta(Travel.objects, OuterRef('start'), OuterRef('end'))
qs = qs.filter(q_contains, delta__gte=m, car=x, road=OuterRef('road'))
res2 = Travel.objects.exclude(car=x).annotate(ex=Exists(qs)).filter(ex=True)


q_slides_before = Q(start__lte=OuterRef('start')) & \
                  Q(end__lte=OuterRef('end')) & \
                  Q(end__gte=OuterRef('start'))
qs = annotate_delta(Travel.objects, OuterRef('start'), F('end'))
qs = qs.filter(q_slides_before, delta__gte=m, car=x, road=OuterRef('road'))
res3 = Travel.objects.exclude(car=x).annotate(ex=Exists(qs)).filter(ex=True)


q_slides_after = Q(start__gte=OuterRef('start')) & \
                 Q(end__gte=OuterRef('end')) & \
                 Q(start__lte=OuterRef('end'))
qs = annotate_delta(Travel.objects, F('start'), OuterRef('end'))
qs = qs.filter(q_slides_after, delta__gte=m, car=x, road=OuterRef('road'))
res4 = Travel.objects.exclude(car=x).annotate(ex=Exists(qs)).filter(ex=True)

res = res1 | res2 | res3 | res4

  1. Thanks to @SUTerliakov, I found out that there is also a more concise solution:
from django.db.models import OuterRef, ExpressionWrapper,\
                             F, functions, DurationField, FloatField


# Use this function instead of "annotate_delta" if your db supports DurationField (postgres supports this but sqlite does not)
def annotate_delta_with_duration_support(qs, start, end):
    duration_exp = ExpressionWrapper(end - start, output_field=DurationField())
    return qs.annotate(delta=functions.ExtractMinute(duration_exp))


def annotate_delta(qs, start, end):
    duration_exp = ExpressionWrapper((end - start) / (60 * 10**6),
                                     output_field=FloatField())
    return qs.annotate(delta=duration_exp)


x = 'mycar'
m = 20

qs = annotate_delta(Travel.objects, functions.Greatest(F('start'), OuterRef('start')),
                    functions.Least(F('end'), OuterRef('end')))
qs = qs.filter(q_slides_after, delta__gte=m, car=x, road=OuterRef('road'))
res = Travel.objects.exclude(car=x).annotate(ex=Exists(qs)).filter(ex=True)

  • Related