Min and max of labels repeating over a Pandas dataframe

So... I have a dataframe shaped like this:

time	label
01:01	A
01:02	A
01:03	A
01:04	C
01:05	C
01:06	A
01:07	A
01:08	A

Is there a way to separate it into an object like this?

{"A" : {min:01:01, max:01:03}, "C" : {min:01:04, max:01:05},"A" : {min:01:06, max:01:08}}
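One caveat about that target shape: a Python dict (like a JSON object) cannot hold the key "A" twice. When the literal is evaluated, the later "A" entry silently overwrites the earlier one, so a list of per-run entries is probably the closer fit. A quick check (the name `desired` is only illustrative):

```python
# The desired object written as a dict literal; the second "A" replaces the first
desired = {
    "A": {"min": "01:01", "max": "01:03"},
    "C": {"min": "01:04", "max": "01:05"},
    "A": {"min": "01:06", "max": "01:08"},
}
print(desired)
# {'A': {'min': '01:06', 'max': '01:08'}, 'C': {'min': '01:04', 'max': '01:05'}}
```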

I want to group the consecutive repetitions of each label and get each run's min and max. For instance, label "A" appears in two separate runs; I'd like the min and max of each run separately, transformed into an object like the one above.

What I've tried so far is using groupby on the label, but the result is something like this:

{"A" : {min:01:01, max:01:08}, "C" : {min:01:04, max:01:05}}

which is different, because it doesn't keep the two intervals of "A" separate.

Answer:

Using pandas groupby:

In [10]: df.groupby('label').agg({'time': ['min', 'max']})
Out[10]:
       time
       min   max
label
A      1:01  1:08
C      1:04  1:05
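As the output shows, grouping on `label` alone merges both runs of "A". A minimal sketch of one common fix, assuming the same `time`/`label` columns holding zero-padded `HH:MM` strings: number the consecutive runs first, then group on that run number together with the label. The name `run` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["01:01", "01:02", "01:03", "01:04", "01:05", "01:06", "01:07", "01:08"],
    "label": ["A", "A", "A", "C", "C", "A", "A", "A"],
})

# A new run starts wherever the label differs from the row above;
# cumsum numbers the runs: 1, 1, 1, 2, 2, 3, 3, 3
run = (df["label"] != df["label"].shift()).cumsum().rename("run")

# Grouping on (run, label) keeps the two runs of "A" apart
out = df.groupby([run, "label"])["time"].agg(["min", "max"])
print(out)
```

Calling `out.reset_index().to_dict("records")` afterwards would give a list with one dict per run (including the `run` number).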

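Updated answer using itertools.groupby, which only groups consecutive items with equal keys, so each run of a label becomes its own group: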

In [9]: from itertools import groupby
In [10]: res = []
In [11]: data = list(zip(df['time'].to_list(), df['label'].to_list()))
In [12]: for k, g in groupby(data, key=lambda x: x[1]):
    ...:     group = list(g)
    ...:     res.append([min(group, key=lambda x: x[0]), max(group, key=lambda x: x[0])])

In [13]: res
Out[13]:
[[('1:01', 'A'), ('1:03', 'A')],
[('1:04', 'C'), ('1:05', 'C')],
[('1:06', 'A'), ('1:08', 'A')]]
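To get from `res` to something close to the object in the question, each `[min, max]` pair can be turned into a single-key dict. Because the same label can appear more than once, the entries go into a list rather than one dict. A sketch reusing the `res` value printed above (rebuilt here so it runs on its own; `intervals` is an illustrative name):

```python
# res as printed above: one [min_tuple, max_tuple] pair per consecutive run
res = [
    [("1:01", "A"), ("1:03", "A")],
    [("1:04", "C"), ("1:05", "C")],
    [("1:06", "A"), ("1:08", "A")],
]

# Each tuple is (time, label); take the label from the first tuple of the pair
intervals = [{lo[1]: {"min": lo[0], "max": hi[0]}} for lo, hi in res]
print(intervals)
# [{'A': {'min': '1:01', 'max': '1:03'}}, {'C': {'min': '1:04', 'max': '1:05'}}, {'A': {'min': '1:06', 'max': '1:08'}}]
```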

Answer:

This is one possible solution, assuming that the time column has the proper datetime type (if not, please share your DataFrame or the code you have used to generate it).
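If `time` is still stored as `HH:MM` strings, one way to get a datetime column is `pd.to_datetime` with an explicit format. String min/max only matches chronological order when every value has the same zero-padded width, so parsing is the safer choice. A sketch, assuming the column holds hour:minute strings like the question's; pandas fills in a default date of 1900-01-01, which does not affect min/max within a single day.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["01:01", "01:02", "01:03", "01:04", "01:05", "01:06", "01:07", "01:08"],
    "label": ["A", "A", "A", "C", "C", "A", "A", "A"],
})

# Parse "HH:MM" strings into datetime64 values (date part defaults to 1900-01-01)
df["time"] = pd.to_datetime(df["time"], format="%H:%M")

# min/max now compare real timestamps; format them back to HH:MM for display
print(df["time"].min().strftime("%H:%M"), df["time"].max().strftime("%H:%M"))
# 01:01 01:08
```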

# One entry per distinct label: min and max time over all rows with that label
r = {}
for label, group in df.groupby('label'):
    r[label] = {'min': group['time'].min(), 'max': group['time'].max()}
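Grouping on `label` alone still merges the two runs of "A", and `r[label] = ...` would overwrite one with the other anyway. A sketch of the same loop adapted to consecutive runs, assuming the question's columns: it groups on a run number plus the label and appends to a list instead of filling a dict (the run-numbering series `run` and the list form of `r` are illustrative choices, not part of the answer above).

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["01:01", "01:02", "01:03", "01:04", "01:05", "01:06", "01:07", "01:08"],
    "label": ["A", "A", "A", "C", "C", "A", "A", "A"],
})

# Number consecutive runs: the count increases whenever the label changes
run = (df["label"] != df["label"].shift()).cumsum().rename("run")

r = []
# sort=False keeps groups in order of first appearance, i.e. run order
for (_, label), group in df.groupby([run, "label"], sort=False):
    r.append({label: {"min": group["time"].min(), "max": group["time"].max()}})

print(r)
# [{'A': {'min': '01:01', 'max': '01:03'}}, {'C': {'min': '01:04', 'max': '01:05'}}, {'A': {'min': '01:06', 'max': '01:08'}}]
```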