I have the following DatetimeIndex values:
DatetimeIndex(['2021-01-18 01:32:00', '2021-01-18 01:33:00',
'2021-01-18 01:34:00', '2021-01-18 01:35:00',
'2021-01-18 01:36:00', '2021-01-18 01:37:00',
'2021-12-16 12:07:00', '2021-12-16 12:08:00',
'2021-12-16 12:09:00', '2021-12-16 12:10:00'],
dtype='datetime64[ns]', length=10, freq=None)
I need to group them by datetime ranges and calculate the min and max values per range.
This is the expected result:
range range_min range_max
1 2021-01-18 01:32:00 2021-01-18 01:37:00
2 2021-12-16 12:07:00 2021-12-16 12:10:00
How can I do it?
I can get min and max across the complete set of values of timestamps, but I don't know how to group timestamps into ranges.
import numpy as np
import pandas as pd
pd.DataFrame(my_timestamps,columns=["timestamp"]).agg({"timestamp" : [np.min, np.max]})
CodePudding user response:
You can use a custom groupby.agg
using the date as grouper with DatetimeIndex.normalize
:
idx = pd.DatetimeIndex(['2021-01-18 01:32:00', '2021-01-18 01:33:00',
'2021-01-18 01:34:00', '2021-01-18 01:35:00',
'2021-01-18 01:36:00', '2021-01-18 01:37:00',
'2021-12-16 12:07:00', '2021-12-16 12:08:00',
'2021-12-16 12:09:00', '2021-12-16 12:10:00'],)
out = (idx.to_series().groupby(pd.factorize(idx.normalize())[0] 1)
.agg(**{'range_min': 'min', 'range_max': 'max'})
.rename_axis('range').reset_index()
)
print(out)
Output:
range range_min range_max
0 1 2021-01-18 01:32:00 2021-01-18 01:37:00
1 2 2021-12-16 12:07:00 2021-12-16 12:10:00
CodePudding user response:
Here's a way to do it without pandas
:
from pprint import pprint
L = ['2021-01-18 01:32:00', '2021-01-18 01:33:00',
'2021-01-18 01:34:00', '2021-01-18 01:35:00',
'2021-01-18 01:36:00', '2021-01-18 01:37:00',
'2021-12-16 12:07:00', '2021-12-16 12:08:00',
'2021-12-16 12:09:00', '2021-12-16 12:10:00']
ranges = []
r_min = L[0]
d_min = r_min.split(' ', 1)[0]
r_max = None
for dt in L[1:]:
date = dt.split(' ', 1)[0]
if date != d_min:
ranges.append((r_min, r_max))
r_min = dt
d_min = r_min.split(' ', 1)[0]
r_max = dt
ranges.append((r_min, r_max))
pprint(ranges)
Output:
[('2021-01-18 01:32:00', '2021-01-18 01:37:00'),
('2021-12-16 12:07:00', '2021-12-16 12:10:00')]