The dataset contains 3 columns: Time (in epoch), Counter, and CounterDifference.
I need to find groups of consecutive values in the CounterDifference column (the groups are separated by 0's), but only if the sequence is larger than 6 values. When a group of consecutive values is identified, I need to find the minimum and maximum values of Time and Counter and append them with a group number (starting from 0). So the new dataframe contains GroupNo, StartTime, EndTime, StartCounter, EndCounter.
As in the picture, the first group contains only 4 consecutive values, so it cannot be assigned as a group. But the second group has more than 6 consecutive values, so it can be classified as a group.
Data:
{'Time': {6412: 1635515680,
6413: 1635515681,
6414: 1635515681,
6415: 1635515681,
6416: 1635515682,
6418: 1635515690,
6419: 1635515700,
6422: 1635515720,
6423: 1635515724,
6424: 1635515726,
6425: 1635515726,
6426: 1635515730,
6427: 1635515740,
6428: 1635515751,
6429: 1635515756,
6430: 1635515757,
6431: 1635515757,
6432: 1635515760,
6435: 1635515774,
6436: 1635515775,
6437: 1635515775,
6438: 1635515775,
6439: 1635515779,
6440: 1635515780,
6441: 1635515780,
6442: 1635515790,
6443: 1635515794,
6444: 1635515795,
6445: 1635515795,
6446: 1635515797,
6447: 1635515801,
6448: 1635515802,
6449: 1635515807,
6451: 1635515820,
6454: 1635515840,
6455: 1635515850,
6456: 1635515860,
6457: 1635515871,
6458: 1635515881,
6461: 1635515901,
6462: 1635515911,
6463: 1635515921,
6464: 1635515930,
6465: 1635515940,
6468: 1635515960,
6469: 1635515972,
6470: 1635515980,
6471: 1635515991,
6472: 1635515997,
6473: 1635515998},
'Counter': {6412: 46219.0,
6413: 46219.0,
6414: 46219.0,
6415: 46219.0,
6416: 46219.0,
6418: 46222.0,
6419: 46226.0,
6422: 46234.0,
6423: 46236.0,
6424: 46236.0,
6425: 46236.0,
6426: 46236.0,
6427: 46236.0,
6428: 46236.0,
6429: 46236.0,
6430: 46236.0,
6431: 46236.0,
6432: 46236.0,
6435: 46236.0,
6436: 46236.0,
6437: 46236.0,
6438: 46236.0,
6439: 46236.0,
6440: 46236.0,
6441: 46236.0,
6442: 46236.0,
6443: 46236.0,
6444: 46236.0,
6445: 46236.0,
6446: 46236.0,
6447: 46236.0,
6448: 46236.0,
6449: 46236.0,
6451: 46241.0,
6454: 46249.0,
6455: 46253.0,
6456: 46257.0,
6457: 46261.0,
6458: 46265.0,
6461: 46273.0,
6462: 46277.0,
6463: 46281.0,
6464: 46285.0,
6465: 46289.0,
6468: 46297.0,
6469: 46301.0,
6470: 46305.0,
6471: 46309.0,
6472: 46311.0,
6473: 46311.0},
'CounterDifference': {6412: 0.0,
6413: 0.0,
6414: 0.0,
6415: 0.0,
6416: 0.0,
6418: 2.0,
6419: 4.0,
6422: 4.0,
6423: 2.0,
6424: 0.0,
6425: 0.0,
6426: 0.0,
6427: 0.0,
6428: 0.0,
6429: 0.0,
6430: 0.0,
6431: 0.0,
6432: 0.0,
6435: 0.0,
6436: 0.0,
6437: 0.0,
6438: 0.0,
6439: 0.0,
6440: 0.0,
6441: 0.0,
6442: 0.0,
6443: 0.0,
6444: 0.0,
6445: 0.0,
6446: 0.0,
6447: 0.0,
6448: 0.0,
6449: 0.0,
6451: 4.0,
6454: 4.0,
6455: 4.0,
6456: 4.0,
6457: 4.0,
6458: 4.0,
6461: 4.0,
6462: 4.0,
6463: 4.0,
6464: 4.0,
6465: 4.0,
6468: 4.0,
6469: 4.0,
6470: 4.0,
6471: 4.0,
6472: 2.0,
6473: 0.0}}
CodePudding user response:
This can be done by using diff and cumsum to create appropriate groups, and then using groupby with filter and agg. We begin by creating the groups:
# eq(0) marks zero rows, diff flags run boundaries, cumsum numbers the runs
df['group'] = df['CounterDifference'].eq(0).diff().cumsum()
This gives:
Time Counter CounterDifference group
6412 1635515680 46219.0 0.0 NaN
6413 1635515681 46219.0 0.0 0.0
6414 1635515681 46219.0 0.0 0.0
6415 1635515681 46219.0 0.0 0.0
6416 1635515682 46219.0 0.0 0.0
6418 1635515690 46222.0 2.0 1.0
6419 1635515700 46226.0 4.0 1.0
6422 1635515720 46234.0 4.0 1.0
6423 1635515724 46236.0 2.0 1.0
6424 1635515726 46236.0 0.0 2.0
6425 1635515726 46236.0 0.0 2.0
...
6449 1635515807 46236.0 0.0 2.0
6451 1635515820 46241.0 4.0 3.0
6454 1635515840 46249.0 4.0 3.0
...
6471 1635515991 46309.0 4.0 3.0
6472 1635515997 46311.0 2.0 3.0
6473 1635515998 46311.0 0.0 4.0
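To see why this creates the groups (a small illustrative sketch, not part of the original answer): eq(0) marks the zero rows, diff is True exactly where a run of zeros starts or ends, and cumsum turns those boundaries into increasing group ids:
import pandas as pd
s = pd.Series([0, 0, 2, 4, 0, 0, 4, 4])
print(s.eq(0).diff().cumsum())
# NaN, 0, 1, 1, 2, 2, 3, 3 -- the first value is NaN because diff has no row before it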
Now, we can filter the groups and then aggregate:
# filter away groups that are too small, as well as the runs of zeros
df = df.groupby('group').filter(lambda x: x['CounterDifference'].iloc[0] != 0 and len(x) > 6)
# group by the run id and aggregate the desired metrics from each group
# (StartCounter/EndCounter come from Counter, not CounterDifference)
df = df.groupby('group').agg({'Time': ['min', 'max'], 'Counter': ['min', 'max']})
# postprocessing
df = df.reset_index(drop=True).reset_index()
df.columns = ['GroupNo', 'StartTime', 'EndTime', 'StartCounter', 'EndCounter']
End result:
GroupNo StartTime EndTime StartCounter EndCounter
0 0 1635515820 1635515997 46241.0 46311.0
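If you want to avoid reassigning the columns by hand, named aggregation can replace the agg and postprocessing steps (a sketch of an equivalent alternative, not the original answer's code; it assumes the filtered df from above):
df = (df.groupby('group')
        .agg(StartTime=('Time', 'min'), EndTime=('Time', 'max'),
             StartCounter=('Counter', 'min'), EndCounter=('Counter', 'max'))
        .reset_index(drop=True)
        .rename_axis('GroupNo')
        .reset_index())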
CodePudding user response:
To assign groups, compute an id for each run of consecutive identical CounterDifference values, find each run's size, and assign conditionally. In the code below, I only label values that are greater than 0 and that sit in a consecutive run of more than 4 identical values.
import numpy as np

# id runs of consecutive identical values, then label nonzero runs longer than 4
g = df['CounterDifference'].ne(df['CounterDifference'].shift()).cumsum()
df['group'] = np.where(df.groupby(g)['Counter'].transform('size').gt(4) & df['CounterDifference'].ne(0), 'group', 'not_valid_group')
Time Counter CounterDifference group
6412 1635515680 46219.0 0.0 not_valid_group
6413 1635515681 46219.0 0.0 not_valid_group
6414 1635515681 46219.0 0.0 not_valid_group
6415 1635515681 46219.0 0.0 not_valid_group
6416 1635515682 46219.0 0.0 not_valid_group
6418 1635515690 46222.0 2.0 not_valid_group
6419 1635515700 46226.0 4.0 not_valid_group
6422 1635515720 46234.0 4.0 not_valid_group
6423 1635515724 46236.0 2.0 not_valid_group
6424 1635515726 46236.0 0.0 not_valid_group
6425 1635515726 46236.0 0.0 not_valid_group
6426 1635515730 46236.0 0.0 not_valid_group
6427 1635515740 46236.0 0.0 not_valid_group
6428 1635515751 46236.0 0.0 not_valid_group
6429 1635515756 46236.0 0.0 not_valid_group
6430 1635515757 46236.0 0.0 not_valid_group
6431 1635515757 46236.0 0.0 not_valid_group
6432 1635515760 46236.0 0.0 not_valid_group
6435 1635515774 46236.0 0.0 not_valid_group
6436 1635515775 46236.0 0.0 not_valid_group
6437 1635515775 46236.0 0.0 not_valid_group
6438 1635515775 46236.0 0.0 not_valid_group
6439 1635515779 46236.0 0.0 not_valid_group
6440 1635515780 46236.0 0.0 not_valid_group
6441 1635515780 46236.0 0.0 not_valid_group
6442 1635515790 46236.0 0.0 not_valid_group
6443 1635515794 46236.0 0.0 not_valid_group
6444 1635515795 46236.0 0.0 not_valid_group
6445 1635515795 46236.0 0.0 not_valid_group
6446 1635515797 46236.0 0.0 not_valid_group
6447 1635515801 46236.0 0.0 not_valid_group
6448 1635515802 46236.0 0.0 not_valid_group
6449 1635515807 46236.0 0.0 not_valid_group
6451 1635515820 46241.0 4.0 group
6454 1635515840 46249.0 4.0 group
6455 1635515850 46253.0 4.0 group
6456 1635515860 46257.0 4.0 group
6457 1635515871 46261.0 4.0 group
6458 1635515881 46265.0 4.0 group
6461 1635515901 46273.0 4.0 group
6462 1635515911 46277.0 4.0 group
6463 1635515921 46281.0 4.0 group
6464 1635515930 46285.0 4.0 group
6465 1635515940 46289.0 4.0 group
6468 1635515960 46297.0 4.0 group
6469 1635515972 46301.0 4.0 group
6470 1635515980 46305.0 4.0 group
6471 1635515991 46309.0 4.0 group
6472 1635515997 46311.0 2.0 not_valid_group
6473 1635515998 46311.0 0.0 not_valid_group
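This only labels the rows. To build the summary frame the question asks for, one can filter on the label and aggregate per run (a sketch continuing from the code above, reusing the run ids g; note that this labeling excludes the trailing 2.0 row, so the end values differ slightly from the first answer):
valid = df[df['group'] == 'group']
out = (valid.groupby(g.loc[valid.index])
            .agg(StartTime=('Time', 'min'), EndTime=('Time', 'max'),
                 StartCounter=('Counter', 'min'), EndCounter=('Counter', 'max'))
            .reset_index(drop=True)
            .rename_axis('GroupNo')
            .reset_index())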