Iterate over custom date time index in pandas?

I have a large dataframe with a timestamp index, which I converted using .to_pydatetime(). I am trying to iterate over this index in intervals of 3 minutes. However, although the dataframe has over 2,000 rows, my iteration stops at 53. Code below:

from datetime import timedelta

# create Time column out of index for comparison purposes
historydf['Time'] = historydf.index
starttime = historydf['Time'].iloc[0].to_pydatetime()
endtime = historydf['Time'].iloc[2261].to_pydatetime()

example_list = []
increment = 0
for i in historydf.index:
    if i <= endtime:
        # collect the row only if it sits exactly `increment` minutes after the start
        if historydf['Time'][i] == starttime + timedelta(minutes=increment):
            example_list.append(i)
            increment = increment + 3

However, this code stops at a mere 53 values, which is obviously far fewer than 2260 / 3 (~750). I have worked on this for hours and can't figure it out. Any help appreciated!

Below is a snippet of the dataframe I am using (for copy/paste purposes if needed). Please keep in mind that the real dataframe is much longer.

Datetime
2022-08-04 09:30:00-04:00    90.949997
2022-08-04 09:32:00-04:00    90.790001
2022-08-04 09:33:00-04:00    90.730003
2022-08-04 09:34:00-04:00    90.839996
2022-08-04 09:35:00-04:00    90.775002
2022-08-04 09:36:00-04:00    90.769997
2022-08-04 09:37:00-04:00    90.775002
2022-08-04 09:38:00-04:00    90.610001
2022-08-04 09:39:00-04:00    90.860001
2022-08-04 09:40:00-04:00    90.900002
2022-08-04 09:41:00-04:00    91.074997
2022-08-04 09:42:00-04:00    91.120003
2022-08-04 09:43:00-04:00    91.139999
2022-08-04 09:44:00-04:00    91.099998
2022-08-04 09:45:00-04:00    91.205002
2022-08-04 09:46:00-04:00    91.120003
2022-08-04 09:47:00-04:00    91.199997
2022-08-04 09:48:00-04:00    91.114998
2022-08-04 09:49:00-04:00    91.114998
2022-08-04 09:50:00-04:00    91.074997
2022-08-04 09:51:00-04:00    90.970100
2022-08-04 09:52:00-04:00    90.949997
2022-08-04 09:53:00-04:00    91.110001
2022-08-04 09:54:00-04:00    91.224998
2022-08-04 09:55:00-04:00    91.250000
2022-08-04 09:56:00-04:00    91.190002
2022-08-04 09:57:00-04:00    91.074997
2022-08-04 09:58:00-04:00    91.089996
2022-08-04 09:59:00-04:00    91.184998
2022-08-04 10:00:00-04:00    91.070000
2022-08-04 10:01:00-04:00    91.070000
2022-08-04 10:02:00-04:00    91.010002
2022-08-04 10:03:00-04:00    91.010002
2022-08-04 10:04:00-04:00    91.004997
2022-08-04 10:05:00-04:00    91.010002
2022-08-04 10:06:00-04:00    91.139999
2022-08-04 10:07:00-04:00    91.209999
2022-08-04 10:08:00-04:00    91.239998
2022-08-04 10:09:00-04:00    91.209999
2022-08-04 10:11:00-04:00    91.250000
2022-08-04 10:12:00-04:00    91.309998
2022-08-04 10:14:00-04:00    91.279999
2022-08-04 10:15:00-04:00    91.300003
2022-08-04 10:16:00-04:00    91.235001
2022-08-04 10:17:00-04:00    91.320000
2022-08-04 10:18:00-04:00    91.224998
2022-08-04 10:20:00-04:00    91.235001
2022-08-04 10:21:00-04:00    91.214996
2022-08-04 10:22:00-04:00    91.209999
2022-08-04 10:23:00-04:00    91.129997
2022-08-04 10:24:00-04:00    91.139999
2022-08-04 10:25:00-04:00    91.160004
2022-08-04 10:26:00-04:00    91.175003
2022-08-04 10:27:00-04:00    91.154999
2022-08-04 10:28:00-04:00    91.220001
2022-08-04 10:29:00-04:00    91.339996
2022-08-04 10:30:00-04:00    91.239998
2022-08-04 10:31:00-04:00    91.264999
2022-08-04 10:32:00-04:00    91.290001
2022-08-04 10:33:00-04:00    91.239998
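
A plausible reason the loop stalls, sketched on made-up data: the increment only advances on an exact match, so if the expected timestamp is missing from the index (the snippet above already has gaps such as 09:31 and 10:10), no later row can ever match. The index below is hypothetical and deliberately drops the 09:36 row; it is not the real dataframe.

```python
from datetime import timedelta

import pandas as pd

# Hypothetical one-minute index (09:30 to 09:41) with the 09:36 row removed.
idx = pd.date_range("2022-08-04 09:30", periods=12, freq="min", tz="America/New_York")
idx = idx.delete(6)  # position 6 is 09:36

start = idx[0]
matches = []
increment = 0
for ts in idx:
    # Same logic as in the question: advance only on an exact match.
    if ts == start + timedelta(minutes=increment):
        matches.append(ts)
        increment += 3

# 09:30 and 09:33 match; after that the loop keeps waiting for 09:36,
# which never appears, so 09:39 is never collected.
print([ts.strftime("%H:%M") for ts in matches])
```

If the real data is missing the timestamp that would have been match number 54, that would explain a result of exactly 53 values.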

CodePudding user response：

For anyone wondering, I figured out a much simpler solution.

Instead of incrementing a timedelta and checking each index value against the resulting datetime, it is much simpler to check whether the .minute (or whatever unit you are incrementing) is divisible by your required interval.

I accomplished this in my code with the following:

historydf['Time'] = historydf.index
endtime = historydf['Time'].iloc[2261].to_pydatetime()

example_list = []
for i in historydf.index:
    if i <= endtime:
        # keep timestamps whose minute falls on a 3-minute boundary
        if historydf['Time'][i].minute % 3 == 0:
            example_list.append(i)

This works just fine. Thanks to everyone who responded!
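
The same minute check can probably be done without a Python loop by building a boolean mask over the whole index. This is only a sketch: the small `historydf` below (with a made-up `Close` column) is a hypothetical stand-in for the real dataframe, and the `elapsed` variant is an assumption about what you might want when the interval does not divide 60 or the data does not start on a multiple of the interval.

```python
import pandas as pd

# Hypothetical stand-in for historydf: one-minute rows, 09:31 and 09:36 missing.
idx = pd.date_range("2022-08-04 09:30", periods=10, freq="min", tz="America/New_York")
historydf = pd.DataFrame({"Close": range(10)}, index=idx).drop(idx[[1, 6]])

# Boolean mask over the whole index instead of a row-by-row loop.
mask = historydf.index.minute % 3 == 0
example_list = historydf.index[mask].tolist()
every_third = historydf[mask]

# Variant: count whole minutes elapsed since the first row, which does not
# depend on the interval dividing 60 or on the start being aligned.
elapsed = (historydf.index - historydf.index[0]) // pd.Timedelta(minutes=1)
mask_elapsed = elapsed % 3 == 0

print(every_third)
```

Either mask skips a slot entirely when its timestamp is missing (09:36 here), rather than stalling on it.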

CodePudding user response：

You can try pandas resample. First convert the datetime column with pd.to_datetime(), then set it as the index, and then call resample with the rule you need.

I have used the data you provided:

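import pandas as pd

# raw string so the regex separator is not read as an escape sequence
df = pd.read_clipboard(sep=r'\s{2,}', names=['time', 'value'])
df.time = pd.to_datetime(df.time)
df.set_index('time', drop=True, inplace=True)

# '3min' is the 3-minute rule; the older alias '3T' is deprecated in recent pandas
df.resample('3min').first()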

Out:

time                       value        
2022-08-04 09:30:00-04:00  90.949997
2022-08-04 09:33:00-04:00  90.730003
2022-08-04 09:36:00-04:00  90.769997
2022-08-04 09:39:00-04:00  90.860001
2022-08-04 09:42:00-04:00  91.120003
2022-08-04 09:45:00-04:00  91.205002
2022-08-04 09:48:00-04:00  91.114998
2022-08-04 09:51:00-04:00  90.970100
2022-08-04 09:54:00-04:00  91.224998
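
One thing worth knowing before picking resample: it returns one row per 3-minute bin, taken from the first row inside that bin, so a bin still gets a value even when the exact boundary timestamp is missing. A minimal sketch of the difference, on a made-up series where the 09:33 row is absent (the values and the `grid` helper are illustrative assumptions, not the asker's data):

```python
import pandas as pd

# Hypothetical one-minute series (09:30 to 09:36) with the 09:33 row missing.
idx = pd.date_range("2022-08-04 09:30", periods=7, freq="min", tz="America/New_York")
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], index=idx).drop(idx[3])

# resample: one row per 3-minute bin, using the first row found in the bin.
# The 09:33 bin is labelled 09:33 but carries the value recorded at 09:34.
binned = s.resample("3min").first()

# Exact match: keep only rows whose timestamp sits on the 3-minute grid.
grid = pd.date_range(s.index[0], s.index[-1], freq="3min")
exact = s[s.index.isin(grid)]

print(binned)
print(exact)
```

If you need the values recorded exactly on the 3-minute marks, the exact-match form is closer to the original loop; if any value from within each window is acceptable, resample is shorter.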