I am following this article: "Pandas output date, start and end time and event status given datetime continuity".
The post includes an example that tests for consecutive hours. I need to test for consecutive minutes, so I changed the divisor in that line of code from 3600 to 60 (hours to minutes):
#test consecutive minutes
df['g'] = df['Date'].diff().dt.total_seconds().div(60).ne(1)
The result is True for every row, including the rows that are consecutive minutes:
Date meter g
2009-02-13 13:23:00 53.49 True
2009-02-13 13:24:00 64.91 True
2009-02-13 13:25:00 32.04 True
2009-02-13 13:26:00 45.94 True
2009-02-13 15:45:00 45.94 True
whereas the result should be:
Date meter g
2009-02-13 13:23:00 53.49 True
2009-02-13 13:24:00 64.91 False
2009-02-13 13:25:00 32.04 False
2009-02-13 13:26:00 45.94 False
2009-02-13 15:45:00 45.94 True
What is wrong here?
Answer:
The issue is likely caused by floating-point approximation: the computed difference in minutes may not be exactly 1. Rounding the values before comparing would solve this:
pd.to_datetime(df['Date']).diff().dt.total_seconds().div(60).round().ne(1)
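To see how a difference that is not exactly one minute produces the symptom in the question, here is a minimal sketch. The data is hypothetical: it assumes one timestamp carries a 500 ms offset, which may or may not be what happens in your real data. Printing the intermediate `minutes` series on your own frame is a quick way to check.

```python
import pandas as pd

# Hypothetical data (an assumption, not the asker's real frame):
# the second timestamp is offset by 500 ms.
df = pd.DataFrame({
    'Date': pd.to_datetime([
        '2009-02-13 13:23:00.000',
        '2009-02-13 13:24:00.500',
        '2009-02-13 13:25:00.000',
    ]),
    'meter': [53.49, 64.91, 32.04],
})

# Difference between consecutive rows, expressed in minutes
minutes = df['Date'].diff().dt.total_seconds().div(60)

exact = minutes.ne(1)            # strict comparison: no value equals 1 exactly
rounded = minutes.round().ne(1)  # comparison after rounding to whole minutes

print(minutes)
print(exact.tolist())
print(rounded.tolist())
```

The first row is always flagged because `diff()` has no previous row to compare against and yields a missing value there.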
However, there is a much better way: compare the Timedelta values directly, without converting to seconds:
df['g'] = pd.to_datetime(df['Date']).diff().ne('1min')
output:
Date meter g
0 2009-02-13 13:23:00 53.49 True
1 2009-02-13 13:24:00 64.91 False
2 2009-02-13 13:25:00 32.04 False
3 2009-02-13 13:26:00 45.94 False
4 2009-02-13 15:45:00 45.94 True
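For reference, a self-contained sketch that rebuilds the sample data from the question and applies the Timedelta comparison. It uses an explicit `pd.Timedelta(minutes=1)` instead of the `'1min'` string, purely as a stylistic choice; the frame construction is an assumption about how your data is loaded.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    'Date': ['2009-02-13 13:23:00', '2009-02-13 13:24:00',
             '2009-02-13 13:25:00', '2009-02-13 13:26:00',
             '2009-02-13 15:45:00'],
    'meter': [53.49, 64.91, 32.04, 45.94, 45.94],
})
df['Date'] = pd.to_datetime(df['Date'])

# True where the gap to the previous row is anything other than one minute
# (the first row has no previous row, so it is also True)
df['g'] = df['Date'].diff().ne(pd.Timedelta(minutes=1))

print(df)
```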
For your initial question (flagging the first and last row of each group of consecutive minutes), combine a forward and a reverse comparison:
forward = pd.to_datetime(df['Date']).diff().ne('1min')
reverse = (-pd.to_datetime(df['Date']).diff(-1)).ne('1min')
df['g'] = forward|reverse
output:
Date meter g
0 2009-02-13 13:23:00 53.49 True
1 2009-02-13 13:24:00 64.91 False
2 2009-02-13 13:25:00 32.04 False
3 2009-02-13 13:26:00 45.94 True
4 2009-02-13 15:45:00 45.94 True
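If the end goal is the start and end time of each run, as in the linked article, one possible sketch is to turn the forward flag into a run identifier with `cumsum()` and aggregate per run. The names `new_run`, `run_id` and `runs` are illustrative, not taken from the article.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    'Date': pd.to_datetime([
        '2009-02-13 13:23:00', '2009-02-13 13:24:00',
        '2009-02-13 13:25:00', '2009-02-13 13:26:00',
        '2009-02-13 15:45:00',
    ]),
    'meter': [53.49, 64.91, 32.04, 45.94, 45.94],
})

# True on every row that starts a new run of consecutive minutes
new_run = df['Date'].diff().ne(pd.Timedelta(minutes=1))

# Cumulative sum of the flags gives each run its own integer label
run_id = new_run.cumsum().rename('run')

# First and last timestamp of each run
runs = df.groupby(run_id)['Date'].agg(start='min', end='max')

print(runs)
```

With the sample data this yields two runs: one from 13:23 to 13:26, and a single-row run at 15:45.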