Home > Net >  Group datetime values by datetime ranges and calculate the min and max values per range
Group datetime values by datetime ranges and calculate the min and max values per range

Time:01-13

I have the following DatetimeIndex values:

DatetimeIndex(['2021-01-18 01:32:00', '2021-01-18 01:33:00',
               '2021-01-18 01:34:00', '2021-01-18 01:35:00',
               '2021-01-18 01:36:00', '2021-01-18 01:37:00',
               '2021-12-16 12:07:00', '2021-12-16 12:08:00',
               '2021-12-16 12:09:00', '2021-12-16 12:10:00'],
              dtype='datetime64[ns]', length=10, freq=None)

I need to group them by datetime ranges and calculate the min and max values per range.

This is the expected result:

range  range_min               range_max
1      2021-01-18 01:32:00     2021-01-18 01:37:00
2      2021-12-16 12:07:00     2021-12-16 12:10:00

How can I do it?

I can get min and max across the complete set of values of timestamps, but I don't know how to group timestamps into ranges.

import numpy as np
import pandas as pd

pd.DataFrame(my_timestamps,columns=["timestamp"]).agg({"timestamp" : [np.min, np.max]})

CodePudding user response:

You can use a custom groupby.agg using the date as grouper with DatetimeIndex.normalize:

idx = pd.DatetimeIndex(['2021-01-18 01:32:00', '2021-01-18 01:33:00',
                        '2021-01-18 01:34:00', '2021-01-18 01:35:00',
                        '2021-01-18 01:36:00', '2021-01-18 01:37:00',
                        '2021-12-16 12:07:00', '2021-12-16 12:08:00',
                        '2021-12-16 12:09:00', '2021-12-16 12:10:00'],)


out = (idx.to_series().groupby(pd.factorize(idx.normalize())[0] 1)
          .agg(**{'range_min': 'min', 'range_max': 'max'})
          .rename_axis('range').reset_index()
      )

print(out)

Output:

   range           range_min           range_max
0      1 2021-01-18 01:32:00 2021-01-18 01:37:00
1      2 2021-12-16 12:07:00 2021-12-16 12:10:00

CodePudding user response:

Here's a way to do it without pandas:

from pprint import pprint

L = ['2021-01-18 01:32:00', '2021-01-18 01:33:00',
     '2021-01-18 01:34:00', '2021-01-18 01:35:00',
     '2021-01-18 01:36:00', '2021-01-18 01:37:00',
     '2021-12-16 12:07:00', '2021-12-16 12:08:00',
     '2021-12-16 12:09:00', '2021-12-16 12:10:00']

ranges = []

r_min = L[0]
d_min = r_min.split(' ', 1)[0]
r_max = None

for dt in L[1:]:
    date = dt.split(' ', 1)[0]
    if date != d_min:
        ranges.append((r_min, r_max))
        r_min = dt
        d_min = r_min.split(' ', 1)[0]
    r_max = dt
ranges.append((r_min, r_max))

pprint(ranges)

Output:

[('2021-01-18 01:32:00', '2021-01-18 01:37:00'),
 ('2021-12-16 12:07:00', '2021-12-16 12:10:00')]
  • Related