For example, I have this DataFrame:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7, 8, 9], 'b': [2, 2, 4, 3, 1000, 2000, 1, 500, 3]})
I need to split the rows wherever column b switches between normal values and outliers, and get these intervals of a: 1-4, 5-6, 7, 8, 9.
Neither pd.cut nor pd.qcut gives this result.
Answer:
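You can group consecutive rows by whether b is above or below a threshold (100 here). Build a boolean mask, then start a new group each time the mask changes value from one row to the next: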
m = df['b'].gt(100)
df['group'] = m.ne(m.shift()).cumsum()
output:
   a     b  group
0  1     2      1
1  2     2      1
2  3     4      1
3  4     3      1
4  5  1000      2
5  6  2000      2
6  7     1      3
7  8   500      4
8  9     3      5
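
To turn the group column into the intervals asked for in the question, one option is to take the first and last value of a within each group. This is only a sketch: the names intervals and labels and the outlier column are made up for illustration, and the threshold of 100 is simply carried over from the mask above, so it would need adjusting for other data.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'b': [2, 2, 4, 3, 1000, 2000, 1, 500, 3]})

# Same mask and grouping as above; 100 is an assumed cutoff for "outlier".
m = df['b'].gt(100)
df['group'] = m.ne(m.shift()).cumsum()

# Smallest and largest 'a' in each run give the interval bounds.
intervals = df.groupby('group')['a'].agg(start='min', end='max')

# Record whether each run consists of outliers (the mask is constant within a run).
intervals['outlier'] = m.groupby(df['group']).first()

# Format as "start-end", or a single number for one-row runs.
labels = [f"{s}-{e}" if s != e else str(s)
          for s, e in zip(intervals['start'], intervals['end'])]
print(labels)  # ['1-4', '5-6', '7', '8', '9']
```

This works because m.ne(m.shift()) is True exactly on the rows where the mask differs from the previous row (including the first row, where the shifted value is missing), so the cumulative sum increases by one at the start of every new run.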