Calculate month-wise data in Django/Python


I am getting start_date and end_date as input, and I want to fetch data for each month between start_date and end_date. How can that be done in Python/Django?

I am getting the dates in this format:

start_date = '2021-5-5'   #YYYY-MM-DD format
end_date = '2021-6-5'   

Required result:

result = [
    {
        'month' : 'may',  
        'data' : data_for_may  # from date 5th of may to 31st of may
    },
    {
        'month' : 'june',
        'data' : data_for_june  # from date 1st of june to 5th of june
    }
]

CodePudding user response:

First, you need to convert your strings into dates:

from datetime import datetime

start_date = datetime.strptime(start_date, '%Y-%m-%d')
end_date = datetime.strptime(end_date, '%Y-%m-%d')

Then I'm thinking of something like this, but I didn't test it:

all_data = User.objects.filter(last_login__date__range=[start_date, end_date])
results = []
year = start_date.year
month = start_date.month
# Walk month by month; comparing (year, month) tuples handles ranges that cross a year boundary
while (year, month) <= (end_date.year, end_date.month):
    results.append({
        'month': month,  # month number; see the sketch below for converting it to a name
        'data': all_data.filter(last_login__date__month=month, last_login__date__year=year)
    })
    month += 1
    if month == 13:
        year += 1
        month = 1
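
If you also want the month names shown in the question's expected output (e.g. 'may', 'june'), a minimal sketch of that conversion, assuming the results list built above:

import calendar

# Replace each month number with its lowercase English name, e.g. 5 -> 'may'
for entry in results:
    entry['month'] = calendar.month_name[entry['month']].lower()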

CodePudding user response:

I think you're better off doing:

from datetime import datetime
from django.db.models import Count
from django.contrib.auth.models import User

start_date = datetime.strptime('2021-5-5', '%Y-%m-%d')
month_end_date = datetime.strptime('2021-6-5', '%Y-%m-%d')

# SELECT year(last_login), month(last_login), count(*) 
# FROM auth_user 
# GROUP BY year(last_login), month(last_login)
# ORDER BY year(last_login), month(last_login)
qs = (User.objects.values('last_login__month', 'last_login__year')
                  .annotate(data=Count('*'))
                  .order_by('last_login__year', 'last_login__month'))
# WHERE last_login ...
qs = qs.filter(last_login__range=[start_date, month_end_date])

result = []
for item in qs:
    result.append({
        # get pretty name, i.e. "January"
        'month': datetime(1900, item['last_login__month'], 1).strftime('%B'),
        'data': item['data']
    })

result  # [{'month': 'May', 'data': 81}, {'month': 'June', 'data': 15}]
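
If you want to sanity-check that the ORM generates something like the SQL in the comments above, you can print the compiled query:

print(qs.query)  # rough SQL Django will run, with parameters inlined approximately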

Why do I think this is better? (over the other answers provided)

You will only get one record per month per year, which is easy to quantify/predict and better for performance, since the counting happens in the database rather than in Python.


I wrote tests for you by the way ;)

https://gist.github.com/kingbuzzman/0197da03c52ae9a798c99d0cf58c758c#file-month_data-py-L82-L133

In a comment inside the gist, I provide examples of how to test it using Docker.

CodePudding user response:

Depending on how much data you have, I would fetch all data in a single query, ordered by datetime, and then group them in Python. The following snippet illustrates that idea.

from itertools import groupby

from django.contrib.auth.models import User

data = User.objects.all().order_by('last_login')

result = []
# year and month are attributes, not methods, on a datetime
for (year, month), data_per_month in groupby(data, key=lambda x: (x.last_login.year, x.last_login.month)):
    result.append({
        'year': year,
        'month': month,
        # materialize the group; groupby's sub-iterators are exhausted once the outer loop advances
        'data': list(data_per_month)
    })

This will probably be fast and fit easily in memory with tens of thousands of objects. When fetching millions of records, though, you might need to reconsider.
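
Note that the snippet above groups every user. To restrict it to the months between start_date and end_date as in the question, a minimal sketch, assuming the parsed dates and the last_login field used in the other answers:

# Only fetch rows inside the requested window, still ordered for groupby
data = (User.objects
            .filter(last_login__date__range=[start_date, end_date])
            .order_by('last_login'))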
