pandas grouping on difference between rows

Time:10-25

I would like to group rows in a pandas DataFrame based on the difference between consecutive rows. Given the following DataFrame:

import pandas as pd
zz = pd.DataFrame([[1,0], [1.1, 2], [2,3], [2.19,4], [5,7], [6,0], [7,2], [8,3], [8.05, 0], [8.12,4]], columns = ['a', 'b'])

I would like to put consecutive rows in the same group when the difference between their values in column 'a' is less than 0.2. So, the following groups (as a DataFrameGroupBy object) would emerge (indices of the rows in brackets) for this DataFrame:

  • group1: [0, 1]
  • group2: [2, 3]
  • group3: [4]
  • group4: [5]
  • group5: [6]
  • group6: [7, 8, 9]

I looked around but could not find an easy solution.

CodePudding user response:

Start a new group whenever the difference from the previous row in column 'a' is greater than or equal to 0.2, then use cumsum to turn those group starts into group labels:

zz.groupby(zz['a'].diff().ge(0.2).cumsum())

Intermediate:

zz['group'] = zz['a'].diff().ge(0.2).cumsum()  # append .add(1) to start numbering at 1

      a  b  group
0  1.00  0      0
1  1.10  2      0
2  2.00  3      1
3  2.19  4      1
4  5.00  7      2
5  6.00  0      3
6  7.00  2      4
7  8.00  3      5
8  8.05  0      5
9  8.12  4      5
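
To check that this labelling reproduces the groups listed in the question, the row indices of each group can be collected from the groupby object. This is only a minimal sketch; the names starts_new_group, group_ids and index_groups are illustrative and not part of the original answer.

```python
import pandas as pd

zz = pd.DataFrame(
    [[1, 0], [1.1, 2], [2, 3], [2.19, 4], [5, 7],
     [6, 0], [7, 2], [8, 3], [8.05, 0], [8.12, 4]],
    columns=["a", "b"],
)

# True where the gap to the previous row is >= 0.2.
# The first diff is NaN, and NaN >= 0.2 evaluates to False.
starts_new_group = zz["a"].diff().ge(0.2)

# Running count of group starts gives one integer label per row.
group_ids = starts_new_group.cumsum()

# Collect the row indices that fall into each group.
index_groups = [g.index.tolist() for _, g in zz.groupby(group_ids)]
print(index_groups)
# [[0, 1], [2, 3], [4], [5], [6], [7, 8, 9]]
```

Passing the label Series directly to groupby avoids adding a helper column to zz.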

CodePudding user response:

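Alternatively, negate the "less than 0.2" test. The first row's difference is NaN, which is not less than 0.2, so the first row also counts as a group start and the labels begin at 1: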

zz['groups'] = (~zz.a.diff().lt(0.2)).cumsum()
print(zz)
      a  b  groups
0  1.00  0       1
1  1.10  2       1
2  2.00  3       2
3  2.19  4       2
4  5.00  7       3
5  6.00  0       4
6  7.00  2       5
7  8.00  3       6
8  8.05  0       6
9  8.12  4       6
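
The two answers appear to differ only in how the NaN produced by diff() for the first row is treated. The sketch below, with illustrative variable names that are not from either answer, compares the two labellings on the example data.

```python
import pandas as pd

zz = pd.DataFrame(
    [[1, 0], [1.1, 2], [2, 3], [2.19, 4], [5, 7],
     [6, 0], [7, 2], [8, 3], [8.05, 0], [8.12, 4]],
    columns=["a", "b"],
)

gaps = zz["a"].diff()  # first entry is NaN

# First answer: NaN >= 0.2 is False, so labels start at 0.
labels_ge = gaps.ge(0.2).cumsum()

# Second answer: NaN < 0.2 is False, negated to True, so labels start at 1.
labels_not_lt = (~gaps.lt(0.2)).cumsum()

# On this data the labels differ by a constant 1, so the grouping is the same.
same_partition = bool((labels_not_lt - labels_ge).eq(1).all())
print(same_partition)
```

If column 'a' itself contained missing values, the two expressions would no longer agree, because the negated form starts a new group at every NaN difference while the ge form does not.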