Home > Software engineering >  How to compute cumulative sum of a count field in Django
How to compute cumulative sum of a count field in Django

Time:09-01

I have a model that register some kind of event and the date in which it occurs. I need to calculate: 1) the count of events for each date, and 2) the cumulative count of events over time.

My model looks something like this:

class Event(models.Model):
  date = models.DateField()
  ...

Calculating 1) is pretty straightforward, but I'm having trouble calculating the cumulative sum. I tried something like this:

query_set = Event.objects.values("date") \
                         .annotate(count=Count("date")) \
                         .annotate(cumcount=Window(Sum("count"), order_by="date"))              

But I'm getting this error:

Cannot compute Sum('count'): 'count' is an aggregate

Edit: Ideally, I'd like to have a query set equivalent to this SQL query, using Django's ORM:

SELECT date,
       COUNT(date) as count,
       SUM(COUNT(date)) OVER(ORDER BY date) acc_count 
FROM event_event
GROUP BY date

CodePudding user response:

It's surprisingly common to see developers wanting to convert from SQL query to Django QuerySet.

In this case, as OP already knows SQL, OP might be better off just performing raw SQL query.

There's different ways one can go about doing it, like executing custom SQL directly.

from django.db import connection

def my_custom_sql(self):
    with connection.cursor() as cursor:
        cursor.execute("SELECT date, COUNT(date) as count, SUM(COUNT(date)) OVER(ORDER BY date) acc_count 
                        FROM event_event
                        GROUP BY date")

Then, call cursor.fetchone() or cursor.fetchall() to return the resulting rows.

CodePudding user response:

In some cases perform an aggregate of an aggregate are not valid in SQL, whether you're using the ORM or not, for example: MAX(SUM(...)). In your case you can do it by a raw query (as already mentioned in the other answer(s) and in your query). Or using the ORM as following:

subquery = (
    Event.objects.filter(date=OuterRef("date"))  # we need this for the join
    .values("date")  # this to create the group by
    .annotate(subcount=Count("date"))  # the aggregate function
)
Event.objects.values("date").annotate(count=Count("date")).annotate(
    sumcount=Window(Sum(subquery.values("subcount")), order_by="date")
    # above we can use the Sum with the subquery
    # we can also replace it for any aggregation functions that we want
).values("date", "count", "cumcount")

It will generate the following SQL:

SELECT
    "app_event"."date",
    COUNT("app_event"."date") AS "count",
    SUM((SELECT
            COUNT(U0."date") AS "subcount" 
        FROM
            "app_event" U0 
        WHERE
            U0."date" = ("app_event"."date") 
        GROUP BY
            U0."date"
    )) OVER ( ORDER BY "app_event"."date")
    AS "cumcount" 
FROM
    "app_event" 
GROUP BY
    "app_event"."date"
  • Related