Background:
I am using pandas to append to an already-existing table in a SQL database. The issue is that the sql database I am connecting to is larger than what pandas can handle. To get around this, I am going to access the table in chunks. The SQL query will contain a WHERE clause, and I want to iterate through it using a list of date ranges.
This is my example SQL query (simplified):
SELECT * FROM (SELECT col1, col2, col3 ... FROM table)
WHERE DATE(server_time) >= min_date AND DATE(server_time) <= max_date
I want to iterate through a list of date ranges to call this query This is how the server_time field is formatted:
'2022-03-29 20:08:57'
So in order to filter the dates, I need to create this list of date ranges, and I am struggling with this.
This is what I tried:
num_months = 3
today = datetime.datetime.today()
date_list = [today - datetime.timedelta(months=x) for x in range(num_months)]
I tried creating a list of date ranges varying by days instead of months, and it worked. I did this by changing (months=x)
to (days=x)
.
When I run the code above, however, I get 'months' is an invalid keyword argument for __new__()
. Any ideas? Thanks so much in advance!!
CodePudding user response:
With pandas, you could use:
import numpy as np
N = 5 # number of values
pd.Timestamp('today')-pd.DateOffset(months=1)*np.arange(N)
output:
array([Timestamp('2022-07-25 01:23:45'),
Timestamp('2022-06-25 01:23:45'),
Timestamp('2022-05-25 01:23:45'),
Timestamp('2022-04-25 01:23:45'),
Timestamp('2022-03-25 01:23:45')], dtype=object)
As Series:
N = 5
pd.Series(pd.Timestamp('2022-07-25 01:23:45')-pd.DateOffset(months=1)*np.arange(5))
output:
0 2022-07-25 01:23:45
1 2022-06-25 01:23:45
2 2022-05-25 01:23:45
3 2022-04-25 01:23:45
4 2022-03-25 01:23:45
dtype: datetime64[ns]
programatically determine the number of periods:
going up to the END date (but not before):
START = pd.Timestamp('today')
END = pd.Timestamp('2022-01-01')
N = (START.to_period('M')-END.to_period('M')).n
pd.Series(START-pd.DateOffset(months=1)*np.arange(N 1))
output:
0 2022-07-25 01:23:45
1 2022-06-25 01:23:45
2 2022-05-25 01:23:45
3 2022-04-25 01:23:45
4 2022-03-25 01:23:45
5 2022-02-25 01:23:45
6 2022-01-25 01:23:45
dtype: datetime64[ns]
CodePudding user response:
During creation of datetime.timedelta
there are following arguments and default values thereof
(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0)
As you might observe there is not months
. Instances of datetime.datetime
have .replace
method which you might find useful, consider following example
import datetime
d0 = datetime.datetime(2022,3,28,20,8,57)
d1 = d0.replace(month=2)
d2 = d0.replace(month=1)
print(d0) # 2022-03-28 20:08:57
print(d1) # 2022-02-28 20:08:57
print(d2) # 2022-01-28 20:08:57
Beware that not all months have equal number of days.