Home > Net >  Nesting Django QuerySets
Nesting Django QuerySets

Time:01-27

Is there a way to create a queryset that operates on a nested queryset?

The simplest example I can think of to explain what I'm trying to accomplish is by demonstration.

I would like to write code something like

SensorReading.objects.filter(reading=1).objects.filter(meter=1)

resulting in SQL looking like

SELECT * FROM (
    SELECT * FROM SensorReading WHERE reading=1
) WHERE sensor=1;

More specifically I have a model representing readings from sensors

class SensorReading(models.Model):
    sensor=models.PositiveIntegerField()
    timestamp=models.DatetimeField()
    reading=models.IntegerField()

With this I am creating a queryset that annotates every sensor with the elapsed time since the previous reading in seconds

readings = (
    SensorReading.objects.filter(**filters)
    .annotate(
        previous_read=Window(
            expression=window.Lead("timestamp"),
            partition_by=[F("sensor"),],
            order_by=["timestamp",],
            frame=RowRange(start=-1, end=0),
        )
    )
    .annotate(delta=Abs(Extract(F("timestamp") - F("previous_read"), "epoch")))
)

I now want to aggregate those per sensor to see the minimum and maximum elapsed time between readings from every sensor. I initially tried

readings.values("sensor").annotate(max=Max('delta'),min=Min('delta'))[0]

however, this fails because window values cannot be used inside the aggregate.

Are there any methods or libraries to solve this without needing to resort to raw SQL? Or have I just overlooked a simpler solution to the problem?

CodePudding user response:

The short answer is Yes you can, using the id__in lookup and a subquery in the filter method. function from the django.db.models module.

The long answer is how? :

you can create a subquery that retrieves the filtered SensorReading objects, and then use that subquery in the main queryset (For example):

from django.db.models import Subquery

subquery = SensorReading.objects.filter(reading=1).values('id')
readings = SensorReading.objects.filter(id__in=Subquery(subquery), meter=1)

The above code will generate SQL that is similar to what you described in your example:

SELECT * FROM SensorReading
WHERE id IN (SELECT id FROM SensorReading WHERE reading=1)
AND meter=1;

Another way is to chain the filter() on the queryset that you have created, and add the second filter on top of it

readings = (
    SensorReading.objects.filter(**filters)
    .annotate(
        previous_read=Window(
            expression=window.Lead("timestamp"),
            partition_by=[F("sensor"),],
            order_by=["timestamp",],
            frame=RowRange(start=-1, end=0),
        )
    )
    .annotate(delta=Abs(Extract(F("timestamp") - F("previous_read"), "epoch")))
    .filter(sensor=1)
)

UPDATE: As you commented below, you can use the RawSQL function from the django.db.models module to to aggregate the window function values without running the subquery multiple times. This allows you to include raw SQL in the queryset, and use the results of that SQL in further querysets or aggregations.

For example, you can create a raw SQL query that retrieves the filtered SensorReading objects, the previous_read and delta fields with the window function applied, and then use that SQL in the main queryset:

from django.db.models import RawSQL

raw_sql = '''
    SELECT id, sensor, timestamp, reading,
        LAG(timestamp) OVER (PARTITION BY sensor ORDER BY timestamp) as previous_read,
        ABS(EXTRACT(EPOCH FROM timestamp - LAG(timestamp) OVER (PARTITION BY sensor ORDER BY timestamp))) as delta
    FROM myapp_sensorreading
    WHERE reading = 1
'''
readings = SensorReading.objects.raw(raw_sql)

You can then use the readings queryset to aggregate the data as you need, for example:

aggregated_data = readings.values("sensor").annotate(max=Max('delta'),min=Min('delta'))

Just be aware of the security implications of using raw SQL, as it allows you to include user input directly in the query, which could lead to SQL injection attacks. Be sure to properly validate and sanitize any user input that you use in a raw SQL query.

CodePudding user response:

Ended up rolling my own solution, basically introspecting the queryset to create a fake table to use in the creation of a new query set and setting the alias to a node that knows to render the SQL for the inner query

allows me to do something like

readings = (
    NestedQuery(
        SensorReading.objects.filter(**filters)
        .annotate(
            previous_read=Window(
                expression=window.Lead("timestamp"),
                partition_by=[F("sensor"),],
                order_by=[
                    "timestamp",
                ],
                frame=RowRange(start=-1, end=0),
            )
        )
        .annotate(delta=Abs(Extract(F("timestamp") - F("previous_read"), "epoch")))
    )
    .values("sensor")
    .annotate(min=Min("delta"), max=Max("delta"))
)

code is available on github, and I've published it on pypi

https://github.com/Simage/django-nestedquery

I have no doubt that I'm leaking the tables or some such nonsense still and this should be considered proof of concept, not any sort of production code.

  • Related