I have a large dataframe with a timestamp index, which I converted using .to_pydatetime(). I am trying to iterate over this index in intervals of 3 minutes. However, although the dataframe has over 2,000 rows, my iteration stops at 53. Code below:
from datetime import timedelta

# create Time column out of index for comparison purposes
historydf['Time'] = historydf.index
starttime = historydf['Time'].iloc[0].to_pydatetime()
endtime = historydf['Time'].iloc[2261].to_pydatetime()
example_list = []
increment = 0
for i in historydf.index:
    if i <= endtime:
        # keep the row when its timestamp equals the next expected 3-minute mark
        if historydf['Time'][i] == starttime + timedelta(minutes=increment):
            example_list.append(i)
            increment = increment + 3
However, this code stops after a mere 53 values, far fewer than the roughly 750 (2260 / 3) I expected. I have worked on this for hours and can't figure it out. Any help appreciated!
Below is a snippet of the dataframe I am using (for copy/paste purposes if needed). Please keep in mind that the real dataframe is much longer.
Datetime
2022-08-04 09:30:00-04:00 90.949997
2022-08-04 09:32:00-04:00 90.790001
2022-08-04 09:33:00-04:00 90.730003
2022-08-04 09:34:00-04:00 90.839996
2022-08-04 09:35:00-04:00 90.775002
2022-08-04 09:36:00-04:00 90.769997
2022-08-04 09:37:00-04:00 90.775002
2022-08-04 09:38:00-04:00 90.610001
2022-08-04 09:39:00-04:00 90.860001
2022-08-04 09:40:00-04:00 90.900002
2022-08-04 09:41:00-04:00 91.074997
2022-08-04 09:42:00-04:00 91.120003
2022-08-04 09:43:00-04:00 91.139999
2022-08-04 09:44:00-04:00 91.099998
2022-08-04 09:45:00-04:00 91.205002
2022-08-04 09:46:00-04:00 91.120003
2022-08-04 09:47:00-04:00 91.199997
2022-08-04 09:48:00-04:00 91.114998
2022-08-04 09:49:00-04:00 91.114998
2022-08-04 09:50:00-04:00 91.074997
2022-08-04 09:51:00-04:00 90.970100
2022-08-04 09:52:00-04:00 90.949997
2022-08-04 09:53:00-04:00 91.110001
2022-08-04 09:54:00-04:00 91.224998
2022-08-04 09:55:00-04:00 91.250000
2022-08-04 09:56:00-04:00 91.190002
2022-08-04 09:57:00-04:00 91.074997
2022-08-04 09:58:00-04:00 91.089996
2022-08-04 09:59:00-04:00 91.184998
2022-08-04 10:00:00-04:00 91.070000
2022-08-04 10:01:00-04:00 91.070000
2022-08-04 10:02:00-04:00 91.010002
2022-08-04 10:03:00-04:00 91.010002
2022-08-04 10:04:00-04:00 91.004997
2022-08-04 10:05:00-04:00 91.010002
2022-08-04 10:06:00-04:00 91.139999
2022-08-04 10:07:00-04:00 91.209999
2022-08-04 10:08:00-04:00 91.239998
2022-08-04 10:09:00-04:00 91.209999
2022-08-04 10:11:00-04:00 91.250000
2022-08-04 10:12:00-04:00 91.309998
2022-08-04 10:14:00-04:00 91.279999
2022-08-04 10:15:00-04:00 91.300003
2022-08-04 10:16:00-04:00 91.235001
2022-08-04 10:17:00-04:00 91.320000
2022-08-04 10:18:00-04:00 91.224998
2022-08-04 10:20:00-04:00 91.235001
2022-08-04 10:21:00-04:00 91.214996
2022-08-04 10:22:00-04:00 91.209999
2022-08-04 10:23:00-04:00 91.129997
2022-08-04 10:24:00-04:00 91.139999
2022-08-04 10:25:00-04:00 91.160004
2022-08-04 10:26:00-04:00 91.175003
2022-08-04 10:27:00-04:00 91.154999
2022-08-04 10:28:00-04:00 91.220001
2022-08-04 10:29:00-04:00 91.339996
2022-08-04 10:30:00-04:00 91.239998
2022-08-04 10:31:00-04:00 91.264999
2022-08-04 10:32:00-04:00 91.290001
2022-08-04 10:33:00-04:00 91.239998
CodePudding user response:
For anyone wondering, I figured out a much simpler solution.
Instead of incrementing a timedelta and comparing each index value against the incremented datetime, it is much simpler to check whether the .minute (or whatever unit you are stepping by) is divisible by your required interval.
I accomplished this in my code with the following:
historydf['Time'] = historydf.index
endtime = historydf['Time'].iloc[2261].to_pydatetime()
example_list = []
for i in historydf.index:
    if i <= endtime:
        # keep timestamps whose minute is a multiple of 3
        if historydf['Time'][i].minute % 3 == 0:
            example_list.append(i)
This works just fine. Thanks to everyone who responded!
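The same check can probably be written without a loop. The sketch below is only an illustration: the small historydf built here is a hypothetical stand-in for the real dataframe (one row per minute, with 09:31 deliberately missing), and the names mask and every_third are made up for the example. It also hints at why the original loop presumably stalled: that loop only advanced when it found an exact match, so one missing timestamp at an expected 3-minute mark would stop it from ever matching again, whereas a per-row minute test does not depend on the previous match.

```python
import pandas as pd

# Hypothetical stand-in for historydf: one row per minute, with 09:31 missing
idx = pd.date_range("2022-08-04 09:30", periods=12, freq="min", tz="America/New_York")
historydf = pd.DataFrame({"value": range(12)}, index=idx).drop(idx[1])

# Boolean mask: True where the minute of the timestamp is a multiple of 3
mask = historydf.index.minute % 3 == 0

# Select the matching rows in one step instead of appending inside a loop
every_third = historydf[mask]
print(every_third.index.strftime("%H:%M").tolist())
```

Note that a minute-based test like this only lines up with the first row when the starting minute is itself a multiple of the interval, and it only gives evenly spaced results for intervals that divide 60.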
CodePudding user response:
You can try using pandas resample. First convert the datetime column with pd.to_datetime(), then set it as the index, and finally call resample with the desired rule.
I have used the data you provided:
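import pandas as pd

# columns in the copied data are separated by two or more spaces
df = pd.read_clipboard(sep=r'\s{2,}', names=['time', 'value'])
df['time'] = pd.to_datetime(df['time'])
df.set_index('time', drop=True, inplace=True)
# '3min' is the current alias; '3T' is deprecated in recent pandas releases
df.resample('3min').first()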
Out:
time value
2022-08-04 09:30:00-04:00 90.949997
2022-08-04 09:33:00-04:00 90.730003
2022-08-04 09:36:00-04:00 90.769997
2022-08-04 09:39:00-04:00 90.860001
2022-08-04 09:42:00-04:00 91.120003
2022-08-04 09:45:00-04:00 91.205002
2022-08-04 09:48:00-04:00 91.114998
2022-08-04 09:51:00-04:00 90.970100
2022-08-04 09:54:00-04:00 91.224998
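One thing worth knowing when the data has gaps: resample groups rows into 3-minute bins rather than picking exact timestamps. The sketch below uses a small made-up series (the timestamps and values are assumptions for illustration, not the real data) to show how that differs from filtering on the minute.

```python
import pandas as pd

# Made-up data with gaps: no rows at 09:33, 09:36, 09:37 or 09:38
idx = pd.to_datetime([
    "2022-08-04 09:30", "2022-08-04 09:32", "2022-08-04 09:34",
    "2022-08-04 09:35", "2022-08-04 09:39",
])
df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)

# resample: first row found in each 3-minute bin, labelled by the bin start
resampled = df["value"].resample("3min").first()

# exact filter: only rows whose own minute is a multiple of 3
exact = df[df.index.minute % 3 == 0]

print(resampled)
print(exact)
```

Here the 09:33 bin reports the value recorded at 09:34, and the empty 09:36 bin comes back as NaN, while the exact filter simply skips both. Which behaviour is preferable depends on whether you want one row per interval or only rows that actually exist at those timestamps.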