Home > OS >  group consective values from pandas daataframe
group consective values from pandas daataframe

Time:11-09

The dataset contains 3 columns . Time(in epoch),Counter, and counterDifference.

Need to find consecutive value groups(groups separately identified using 0's) from CounterDiffrence column only if values sequence is larger than 6. If consecative values group identified, then need to find the minimum and maximum values for Time and Counter then append it with group number(starting from 0).So new dataframe contains GroupNo, StartTime,EndTime,StartCounter,EndCounter.

As in picture First group only conatins 4 consecative values . So it cant be assigned as group.But Second group has 8 conseactive values so it cane be classified as group.

enter image description here

Data:

{'Time': {6412: 1635515680,
  6413: 1635515681,
  6414: 1635515681,
  6415: 1635515681,
  6416: 1635515682,
  6418: 1635515690,
  6419: 1635515700,
  6422: 1635515720,
  6423: 1635515724,
  6424: 1635515726,
  6425: 1635515726,
  6426: 1635515730,
  6427: 1635515740,
  6428: 1635515751,
  6429: 1635515756,
  6430: 1635515757,
  6431: 1635515757,
  6432: 1635515760,
  6435: 1635515774,
  6436: 1635515775,
  6437: 1635515775,
  6438: 1635515775,
  6439: 1635515779,
  6440: 1635515780,
  6441: 1635515780,
  6442: 1635515790,
  6443: 1635515794,
  6444: 1635515795,
  6445: 1635515795,
  6446: 1635515797,
  6447: 1635515801,
  6448: 1635515802,
  6449: 1635515807,
  6451: 1635515820,
  6454: 1635515840,
  6455: 1635515850,
  6456: 1635515860,
  6457: 1635515871,
  6458: 1635515881,
  6461: 1635515901,
  6462: 1635515911,
  6463: 1635515921,
  6464: 1635515930,
  6465: 1635515940,
  6468: 1635515960,
  6469: 1635515972,
  6470: 1635515980,
  6471: 1635515991,
  6472: 1635515997,
  6473: 1635515998},
 'Counter': {6412: 46219.0,
  6413: 46219.0,
  6414: 46219.0,
  6415: 46219.0,
  6416: 46219.0,
  6418: 46222.0,
  6419: 46226.0,
  6422: 46234.0,
  6423: 46236.0,
  6424: 46236.0,
  6425: 46236.0,
  6426: 46236.0,
  6427: 46236.0,
  6428: 46236.0,
  6429: 46236.0,
  6430: 46236.0,
  6431: 46236.0,
  6432: 46236.0,
  6435: 46236.0,
  6436: 46236.0,
  6437: 46236.0,
  6438: 46236.0,
  6439: 46236.0,
  6440: 46236.0,
  6441: 46236.0,
  6442: 46236.0,
  6443: 46236.0,
  6444: 46236.0,
  6445: 46236.0,
  6446: 46236.0,
  6447: 46236.0,
  6448: 46236.0,
  6449: 46236.0,
  6451: 46241.0,
  6454: 46249.0,
  6455: 46253.0,
  6456: 46257.0,
  6457: 46261.0,
  6458: 46265.0,
  6461: 46273.0,
  6462: 46277.0,
  6463: 46281.0,
  6464: 46285.0,
  6465: 46289.0,
  6468: 46297.0,
  6469: 46301.0,
  6470: 46305.0,
  6471: 46309.0,
  6472: 46311.0,
  6473: 46311.0},
 'CounterDifference': {6412: 0.0,
  6413: 0.0,
  6414: 0.0,
  6415: 0.0,
  6416: 0.0,
  6418: 2.0,
  6419: 4.0,
  6422: 4.0,
  6423: 2.0,
  6424: 0.0,
  6425: 0.0,
  6426: 0.0,
  6427: 0.0,
  6428: 0.0,
  6429: 0.0,
  6430: 0.0,
  6431: 0.0,
  6432: 0.0,
  6435: 0.0,
  6436: 0.0,
  6437: 0.0,
  6438: 0.0,
  6439: 0.0,
  6440: 0.0,
  6441: 0.0,
  6442: 0.0,
  6443: 0.0,
  6444: 0.0,
  6445: 0.0,
  6446: 0.0,
  6447: 0.0,
  6448: 0.0,
  6449: 0.0,
  6451: 4.0,
  6454: 4.0,
  6455: 4.0,
  6456: 4.0,
  6457: 4.0,
  6458: 4.0,
  6461: 4.0,
  6462: 4.0,
  6463: 4.0,
  6464: 4.0,
  6465: 4.0,
  6468: 4.0,
  6469: 4.0,
  6470: 4.0,
  6471: 4.0,
  6472: 2.0,
  6473: 0.0}}

CodePudding user response:

This can be done by using diff and cumsum to create appropriate groups and then using groupby with filter and agg. We begin by creating the groups:

df['group'] = df['CounterDifference'].eq(0).diff().cumsum()

This gives:

            Time  Counter  CounterDifference group
6412  1635515680  46219.0                0.0   NaN
6413  1635515681  46219.0                0.0   0.0
6414  1635515681  46219.0                0.0   0.0
6415  1635515681  46219.0                0.0   0.0
6416  1635515682  46219.0                0.0   0.0
6418  1635515690  46222.0                2.0   1.0
6419  1635515700  46226.0                4.0   1.0
6422  1635515720  46234.0                4.0   1.0
6423  1635515724  46236.0                2.0   1.0
6424  1635515726  46236.0                0.0   2.0
6425  1635515726  46236.0                0.0   2.0
...
6449  1635515807  46236.0                0.0   2.0
6451  1635515820  46241.0                4.0   3.0
6454  1635515840  46249.0                4.0   3.0
...
6471  1635515991  46309.0                4.0   3.0
6472  1635515997  46311.0                2.0   3.0
6473  1635515998  46311.0                0.0   4.0

Now, we can filter the groups and then aggregate:

# filter away groups that are too small or the ones that contains zeros.
df = df.groupby('group').filter(lambda x: x['CounterDifference'].iloc[0] != 0 and len(x) > 6)

# groupby and aggregate the wanted metrics from each group
df = df.groupby('group').agg({'Time': [min, max], 'CounterDifference': [min, max]})

# postprocessing
df = df.reset_index(drop=True).reset_index()
df.columns = ['GroupNo', 'StartTime', 'EndTime', 'StartCounter', 'EndCounter']

End result:

   GroupNo   StartTime     EndTime  StartCounter  EndCounter
0        0  1635515820  1635515997           2.0         4.0

CodePudding user response:

To assign group, groupby CounterDifference, find size and conditionally assign. In code below, I only assign group to values greater than 0 and which exists in consecutive groups of 4 and above.

  df['group']=np.where(df.groupby('CounterDifference')['Counter'].transform('size').gt(4)&df['CounterDifference'].ne(0),'group','not_valid_group')


           Time  Counter  CounterDifference            group
6412  1635515680  46219.0                0.0  not_valid_group
6413  1635515681  46219.0                0.0  not_valid_group
6414  1635515681  46219.0                0.0  not_valid_group
6415  1635515681  46219.0                0.0  not_valid_group
6416  1635515682  46219.0                0.0  not_valid_group
6418  1635515690  46222.0                2.0  not_valid_group
6419  1635515700  46226.0                4.0  not_valid_group
6422  1635515720  46234.0                4.0  not_valid_group
6423  1635515724  46236.0                2.0  not_valid_group
6424  1635515726  46236.0                0.0  not_valid_group
6425  1635515726  46236.0                0.0  not_valid_group
6426  1635515730  46236.0                0.0  not_valid_group
6427  1635515740  46236.0                0.0  not_valid_group
6428  1635515751  46236.0                0.0  not_valid_group
6429  1635515756  46236.0                0.0  not_valid_group
6430  1635515757  46236.0                0.0  not_valid_group
6431  1635515757  46236.0                0.0  not_valid_group
6432  1635515760  46236.0                0.0  not_valid_group
6435  1635515774  46236.0                0.0  not_valid_group
6436  1635515775  46236.0                0.0  not_valid_group
6437  1635515775  46236.0                0.0  not_valid_group
6438  1635515775  46236.0                0.0  not_valid_group
6439  1635515779  46236.0                0.0  not_valid_group
6440  1635515780  46236.0                0.0  not_valid_group
6441  1635515780  46236.0                0.0  not_valid_group
6442  1635515790  46236.0                0.0  not_valid_group
6443  1635515794  46236.0                0.0  not_valid_group
6444  1635515795  46236.0                0.0  not_valid_group
6445  1635515795  46236.0                0.0  not_valid_group
6446  1635515797  46236.0                0.0  not_valid_group
6447  1635515801  46236.0                0.0  not_valid_group
6448  1635515802  46236.0                0.0  not_valid_group
6449  1635515807  46236.0                0.0  not_valid_group
6451  1635515820  46241.0                4.0            group
6454  1635515840  46249.0                4.0            group
6455  1635515850  46253.0                4.0            group
6456  1635515860  46257.0                4.0            group
6457  1635515871  46261.0                4.0            group
6458  1635515881  46265.0                4.0            group
6461  1635515901  46273.0                4.0            group
6462  1635515911  46277.0                4.0            group
6463  1635515921  46281.0                4.0            group
6464  1635515930  46285.0                4.0            group
6465  1635515940  46289.0                4.0            group
6468  1635515960  46297.0                4.0            group
6469  1635515972  46301.0                4.0            group
6470  1635515980  46305.0                4.0            group
6471  1635515991  46309.0                4.0            group
6472  1635515997  46311.0                2.0  not_valid_group
6473  1635515998  46311.0                0.0  not_valid_group
  • Related