Home > Blockchain >  Django: Calculate and Group inside Django ORM with multiple columns
Django: Calculate and Group inside Django ORM with multiple columns

Time:04-01

Good day,

right now I'm trying to improve my knowledge about Django ORM but struggling with the task below:

But first, the database looks like this:

class DeathsWorldwide(models.Model):
    causes_name = models.CharField(max_length=50, null=False, blank=False)
    death_numbers = models.PositiveIntegerField(default=0)
    country = models.CharField(max_length=50, null=True, blank=True)
    year = models.PositiveIntegerField(null=False, blank=False, validators=[MinValueValidator(1990), MaxValueValidator(2019)])


causes_name    |    death_numbers    |    country    |    year
Alcohol dis.   |    25430            |    Germany    |    1998
Poisoning      |    4038             |    Germany    |    1998
...
Maternal dis.  |    9452             |    Germany    |    1998
Alcohol dis.   |    21980            |    Germany    |    1999
Poisoning      |    5117             |    Germany    |    1999
...
Maternal dis.  |    8339             |    Germany    |    1999

Always a block of all diseases for each year, every country and so on...The range of years goes from 1990 to 2019.

What I - or rather let's say the task - want to achieve, is a list of all countries with calculated numbers of deaths, like this...

country    |    death_numbers
France     |    78012
Germany    |    70510
Austria    |    38025

...but with one additional feature: the number of deaths for each country between 1990-1999 must be subtracted from those of 2000-2019. So a full list would actually look something like this:

country    |    death_numbers    |    19xx    |    2xxx
France     |    78012            |    36913   |    114925
Germany    |    70510            |    ...     |    ...
Austria    |    38025            |    ...     |    ...

Is it possible to achieve such a result with only one query?

Thanks for your help and have a great day!

CodePudding user response:

Something like the following should do the trick.

from django.db.models import Sum, Q

t19 = Sum('death_numbers', filter=Q(year__lt=2000))
t20 = Sum('death_numbers', filter=Q(year__gte=2000))
DeathsWorldwide.objects.values('country').annotate(t19=t19, t20=t20, total=t20 - t19)

It produces the following SQL:

SELECT 
    "base_deathsworldwide"."country", 
    SUM("base_deathsworldwide"."death_numbers") 
        FILTER (WHERE "base_deathsworldwide"."year" < 2000)
        AS "t19", 
    SUM("base_deathsworldwide"."death_numbers") 
        FILTER (WHERE "base_deathsworldwide"."year" >= 2000)
        AS "t20", 
    (
        SUM("base_deathsworldwide"."death_numbers") 
            FILTER (WHERE "base_deathsworldwide"."year" >= 2000) 
        - SUM("base_deathsworldwide"."death_numbers") 
            FILTER (WHERE "base_deathsworldwide"."year" < 2000)
    ) AS "total" 
FROM "base_deathsworldwide" 
GROUP BY "base_deathsworldwide"."country"

It is one query, but repeats calculation. It looks like ORM doesn't support it, but we can try to build raw SQL with minimum efforts:

from django.db import connection  # replace it with connections if using multiple databases
from django.db.models import Sum, Q

t19 = Sum('death_numbers', filter=Q(year__lt=2000))
t20 = Sum('death_numbers', filter=Q(year__gte=2000))
base_query = DeathsWorldwide.objects.values('country').annotate(t19=t19, t20=t20).query
sql, params = base_query.sql_with_params()
template = 'SELECT country, t19, t20, (t20 - t19) AS total FROM ({}) "temp"'

with connection.cursor() as cursor:
    data = cursor.execute(template.format(sql), params).fetchall()

print(data)

I suggest measuring performance of both options and comparing. If difference is small (or the latter is smaller, which is also possible because plan optimization can take more time) or the former is fast enough for your use case, it's much better to stick with ORM solution.

  • Related