I would like to group rows in a pandas DataFrame based on the difference between consecutive rows. Given the following DataFrame:
zz = pd.DataFrame([[1,0], [1.1, 2], [2,3], [2.19,4], [5,7], [6,0], [7,2], [8,3], [8.05, 0], [8.12,4]], columns = ['a', 'b'])
I would like to put consecutive rows in the same group when the difference between their values in column 'a' is less than 0.2. For this DataFrame, the following groups (as a DataFrameGroupBy object) would emerge (row indices in brackets):
- group1: [0, 1]
- group2: [2, 3]
- group3: [4]
- group4: [5]
- group5: [6]
- group6: [7, 8, 9]
I looked around but could not find an easy solution.
CodePudding user response:
Start a new group whenever the difference from the previous row is greater than or equal to 0.2, then use cumsum to propagate the group label to the following rows:
zz.groupby(zz['a'].diff().ge(0.2).cumsum())
Intermediate:
zz['group'] = zz['a'].diff().ge(0.2).cumsum()  # append .add(1) if you want the labels to start at 1
      a  b  group
0  1.00  0      0
1  1.10  2      0
2  2.00  3      1
3  2.19  4      1
4  5.00  7      2
5  6.00  0      3
6  7.00  2      4
7  8.00  3      5
8  8.05  0      5
9  8.12  4      5
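To check that this key reproduces the groups listed in the question, here is a minimal sketch that rebuilds the example frame and collects the row indices of each group. The names `key`, `grouped` and `index_groups` are only illustrative and do not appear in the answer above.

```python
import pandas as pd

zz = pd.DataFrame(
    [[1, 0], [1.1, 2], [2, 3], [2.19, 4], [5, 7],
     [6, 0], [7, 2], [8, 3], [8.05, 0], [8.12, 4]],
    columns=["a", "b"],
)

# A jump of 0.2 or more from the previous row starts a new group.
# The first diff is NaN, which compares as False, so the first label is 0.
key = zz["a"].diff().ge(0.2).cumsum()
grouped = zz.groupby(key)

# Iterating a groupby yields (label, sub-frame) pairs in label order.
index_groups = [sub.index.tolist() for _, sub in grouped]
print(index_groups)
# [[0, 1], [2, 3], [4], [5], [6], [7, 8, 9]]
```

Note that only neighbouring rows are compared, so a run of small steps stays in one group even if its first and last values differ by more than 0.2.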
CodePudding user response:
Use:
zz['groups'] = (~zz.a.diff().lt(0.2)).cumsum()
print(zz)
      a  b  groups
0  1.00  0       1
1  1.10  2       1
2  2.00  3       2
3  2.19  4       2
4  5.00  7       3
5  6.00  0       4
6  7.00  2       5
7  8.00  3       6
8  8.05  0       6
9  8.12  4       6
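The two answers produce the same grouping and differ only in where the labels start. The sketch below compares the two keys on the question's data; `diff`, `key_ge` and `key_lt` are illustrative names, not part of either answer.

```python
import pandas as pd

zz = pd.DataFrame({"a": [1, 1.1, 2, 2.19, 5, 6, 7, 8, 8.05, 8.12]})

diff = zz["a"].diff()  # the first entry is NaN

# NaN >= 0.2 is False, so the first row does not open a group: labels start at 0.
key_ge = diff.ge(0.2).cumsum()

# NaN < 0.2 is also False, but the negation turns it into True: labels start at 1.
key_lt = (~diff.lt(0.2)).cumsum()

# On this data the two keys differ by a constant offset of 1.
same_grouping = bool((key_lt == key_ge + 1).all())
print(same_grouping)
# True
```

The same NaN behaviour matters if column 'a' has missing values in the middle: the `~lt` form would start a new group at each NaN difference, while the `ge` form would not.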