Home > Back-end >  Efficiently get first and last model instances in Django Model with timestamp, by day
Efficiently get first and last model instances in Django Model with timestamp, by day

Time:09-09

Suppose you have this model:

from django import models
from django.contrib.postgres.indexes import BrinIndex


class MyModel(model.Models):
  device_id = models.IntegerField()
  timestamp = models.DateTimeField(auto_now_add=True)
  my_value = models.FloatField()

  class Meta:
    indexes = (BrinIndex(fields=['timestamp']),)

There is a periodic process that creates an instance of this model every 2 minutes or so. This process is supposed to run for years, with multiple devices, so this table will contain a great number of records.

My goal is, for each day when there are records, to get the first and last records in that day.

So far, what I could come up with is this:

from django.db.models import Min, Max


results = []
device_id = 1 # Could be other device id, of course, but 1 for illustration's sake

# This will get me a list of dictionaries that have first and last fields 
# with the desired timestamps, but not the field my_value for them.

first_last = MyModel.objects.filter(device_id=device_id).values('timestamp__date')\
.annotate(first=Min('timestamp__date'),last=Max('timestamp__date'))

# So now I have to iterate over that list to get the instances/values
  
for f in first_last:

    first = f['first']
    last = f['last']

    first_value = MyModel.objects.get(device=device, timestmap=first).my_value
    last_value = MyModel.objects.get(device=device, timestamp=last).my_value

    results.append({
      'first': first,
      'last': last,
      'first_value': first_value,
      'last_value': last_value,
    })

# Do something with results[]

This works, but takes a long time (about 50 seconds on my machine, retrieving first and last values for about 450 days).

I have tried other combinations of annotate(), values(), values_list(), extra() etc, but this is the best I could come up with so far.

Any help or insight is appreciated!

CodePudding user response:

You can take advantage of .distinct() if you are using PostgreSQL as DBMS.

first_models = MyModel.objects.order_by('timestamp__date', 'timestamp').distinct('timestamp__date')
last_models = MyModel.objects.order_by('timestamp__date', '-timestamp').distinct('timestamp__date')
first_last = first_models.union(last_models)

# do something with first_last

One more things need to be mentioned: first_last might eliminate duplicate when there is only one record for a date. It should not be a problem for you, but if it does, you can iterate first_models and last_models separately.

  • Related