I am getting input as start_date and end_date, now I want to fetch data based on months between start_date and end_date. How that can be done in python/django?
I am getting date in format as -
start_date = '2021-5-5' #YYYY-MM-DD format
end_date = '2021-6-5'
Required results -
result = [
{
'month' : 'may',
'data' : data_for_may # from date 5th of may to 31st of may
},
{
'month' : 'june',
'data' : data_for_june # from date 1st of june to 5th of june
}
]
CodePudding user response:
First, you need to convert your strings into dates :
start_date = datetime.strptime(start_date , '%Y-%m-%d')
end_date = datetime.strptime(end_date , '%Y-%m-%d')
Then I'm thinking of something like this but I didn't test it.
all_data = User.objects.filter(last_login__date__range=[start_date, end_date])
results = []
year = start_date.year
month = start_date.month
while year <= end_date.year and month <= end_date.month:
results.append({
'month': month, # You could make the conversion from the number of the month to the name of the month
'data': all_data.filter(last_login__date__month=month, last_login__date__year=year)
})
month = 1
if month == 13:
year = 1
month = 0
CodePudding user response:
I think you're better off doing:
from datetime import datetime
from django.db.models import Count
from django.contrib.auth.models import User
start_date = datetime.strptime('2021-5-5' , '%Y-%m-%d')
month_end_date = datetime.strptime('2021-6-5' , '%Y-%m-%d')
# SELECT year(last_login), month(last_login), count(*)
# FROM auth_user
# GROUP BY year(last_login), month(last_login)
# ORDER BY year(last_login), month(last_login)
qs = (User.objects.values('last_login__month', 'last_login__year')
.annotate(data=Count('*'))
.order_by('last_login__year', 'last_login__month'))
# WHERE last_login ...
qs = qs.filter(last_login__range=[start_date, month_end_date])
result = []
for item in qs:
result.append({
# get pretty name i.e "January"
'month': datetime(1900, item['last_login__month'] , 1).strftime('%B'),
'data': item['data']
})
result # [{'month': 'May', 'data': 81}, {'month': 'June', 'data': 15}])
Why do I think this is better? (over the other answers provided)
You will only have 1 record PER month PER year, easy to quantify/predict, better on performance.
I wrote tests for you by the way ;)
https://gist.github.com/kingbuzzman/0197da03c52ae9a798c99d0cf58c758c#file-month_data-py-L82-L133
As a comment inside the gist, I provide examples on how to test it using docker
CodePudding user response:
Depending on how much data you have, I would fetch all data in a single query, ordered by datetime, and then group them in Python. The following snippet illustrates that idea.
from itertools import groupby
data = User.objects.all().order_by('last_login')
result = []
for (year, month), data_per_month in groupby(data, key=lambda x: (x.last_login.year(), x.last_login.month())):
result.append({
'year': year,
'month': month,
'data': data_per_month
})
This will probably be fast and easily fitting in memory with 10,000s of objects. When fetching millions of records though, you might need to reconsider.