I have a DataFrame shaped like this:
time | label |
---|---|
01:01 | A |
01:02 | A |
01:03 | A |
01:04 | C |
01:05 | C |
01:06 | A |
01:07 | A |
01:08 | A |
Is there a way to separate it into an object like this?
{"A" : {min:01:01, max:01:03}, "C" : {min:01:04, max:01:05},"A" : {min:01:06, max:01:08}}
That is, I want to group each run of consecutive repetitions and get its min and max. For instance, the label "A" appears in two separate runs. I'd like to get the min and max of each of those intervals separately and turn them into an object like the one above.
What I've tried so far is to use groupby on the label, but the end result is something like this:
{"A" : {min:01:01, max:01:08}, "C" : {min:01:04, max:01:05}}
which is different because it doesn't keep the two intervals of "A" separate.
CodePudding user response:
Using pandas groupby:
In [10]: df.groupby('label').agg({'time': [min, max]})
Out[10]:
       time
        min   max
label
A      1:01  1:08
C      1:04  1:05
Updated answer using itertools.groupby:
In [9]: from itertools import groupby
In [10]: res = []
In [11]: data = list(zip(df['time'].to_list(), df['label'].to_list()))
In [12]: for k, g in groupby(data, key=lambda x: x[1]):
    ...:     group = list(g)
    ...:     res.append([min(group, key=lambda x: x[0]), max(group, key=lambda x: x[0])])
In [13]: res
Out[13]:
[[('1:01', 'A'), ('1:03', 'A')],
 [('1:04', 'C'), ('1:05', 'C')],
 [('1:06', 'A'), ('1:08', 'A')]]
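A Python dict cannot hold the key "A" twice, so the exact object shown in the question is not representable. One possible workaround is a list of records, one per run. The sketch below reshapes the itertools.groupby idea above into that form. The sample `data` list and the name `intervals` are illustrative assumptions, and the times are assumed to be zero-padded strings so that they compare in chronological order.

```python
from itertools import groupby

# Sample rows as (time, label) pairs, mirroring the table in the question.
# Assumption: times are zero-padded "HH:MM" strings, so string comparison
# matches chronological order.
data = [
    ("01:01", "A"), ("01:02", "A"), ("01:03", "A"),
    ("01:04", "C"), ("01:05", "C"),
    ("01:06", "A"), ("01:07", "A"), ("01:08", "A"),
]

intervals = []
# itertools.groupby only merges adjacent items with equal keys,
# so the two separate runs of "A" stay separate.
for label, rows in groupby(data, key=lambda row: row[1]):
    times = [t for t, _ in rows]
    intervals.append({"label": label, "min": min(times), "max": max(times)})

print(intervals)
```

With a real DataFrame, `data` could be built as in the answer above, with `zip(df['time'], df['label'])`.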
CodePudding user response:
This is one possible solution, assuming that the time column has the proper datetime type (if not, please share your DataFrame or the code you have used to generate it).
r = {}
for label, group in df.groupby('label'):
    r[label] = {'min': group['time'].min(), 'max': group['time'].max()}
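Note that grouping on the label alone, as in the loop above, merges both runs of "A" into a single 01:01 to 01:08 interval, which is the result the question wants to avoid. A common pandas idiom for keeping the runs apart is to build a run identifier with `shift` and `cumsum` and group on that instead. The following is a minimal sketch under a few assumptions: the sample DataFrame is reconstructed from the question, the names `run_id`, `runs`, and `result` are made up for illustration, and the times are zero-padded strings (a proper datetime column should work the same way).

```python
import pandas as pd

# Sample data reconstructed from the question (assumption: zero-padded strings).
df = pd.DataFrame({
    "time": ["01:01", "01:02", "01:03", "01:04", "01:05", "01:06", "01:07", "01:08"],
    "label": ["A", "A", "A", "C", "C", "A", "A", "A"],
})

# A new run starts whenever the label differs from the previous row;
# the cumulative sum of those change flags numbers the runs 1, 2, 3, ...
run_id = (df["label"] != df["label"].shift()).cumsum().rename("run")

# Aggregate per run instead of per label.
runs = (
    df.groupby(run_id)
      .agg(label=("label", "first"), min=("time", "min"), max=("time", "max"))
      .reset_index(drop=True)
)

# A list of records, because a dict cannot hold the key "A" twice.
result = runs.to_dict("records")
print(result)
```

The rows are assumed to be already sorted by time. If they are not, sorting with `df.sort_values("time")` first would be needed for the runs to be meaningful.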