Cut continuous data by outliers

Time:07-07

For example, I have this DataFrame:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7, 8, 9], 'b': [2, 2, 4, 3, 1000, 2000, 1, 500, 3]})


I need to split the rows wherever column b jumps to or from an outlier, and get these intervals of column a: 1-4, 5-6, 7, 8, 9.

Neither pd.cut nor pd.qcut gives these results, because both assign rows to bins by the value of b and ignore row order.
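To see why value-based binning cannot produce these intervals, here is a small sketch that runs pd.cut on column b with two equal-width bins (the bin count of 2 is an arbitrary choice for illustration). Since pd.cut only looks at each individual value, row 0 (b=2) and row 6 (b=1) end up in the same bin even though the outlier rows 4-5 sit between them, so the bins do not map to contiguous ranges of a.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'b': [2, 2, 4, 3, 1000, 2000, 1, 500, 3]})

# pd.cut assigns each row to a bin purely by its value of b;
# bins=2 is an arbitrary choice made only for this illustration.
bins = pd.cut(df['b'], bins=2)
codes = bins.cat.codes.tolist()

# Rows 0 and 6 share a bin although they are not adjacent,
# so bin membership says nothing about runs of consecutive rows.
print(codes)
```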

Answer:

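Treat any value of b above a threshold (100 here) as an outlier, then number the runs of consecutive rows that fall on the same side of that threshold. A new group starts every time the above/below mask changes: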

m = df['b'].gt(100)  # True where b is treated as an outlier
df['group'] = m.ne(m.shift()).cumsum()  # increment the group id each time the mask flips

Output:

   a     b  group
0  1     2      1
1  2     2      1
2  3     4      1
3  4     3      1
4  5  1000      2
5  6  2000      2
6  7     1      3
7  8   500      4
8  9     3      5
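The group column still has to be turned into the intervals the question asks for. A minimal sketch, assuming the same threshold of 100 as above: group column a by the run labels, take the first and last value of each run, and print single-row runs as one number. The names bounds and intervals are illustrative, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'b': [2, 2, 4, 3, 1000, 2000, 1, 500, 3]})

# Same run-labelling trick as the answer; the threshold of 100 is an assumption.
m = df['b'].gt(100)
group = m.ne(m.shift()).cumsum()

# First and last value of 'a' within each run of consecutive rows.
bounds = df.groupby(group)['a'].agg(['first', 'last'])

# Format each run as "start-end", or as a single value for one-row runs.
intervals = [
    str(f) if f == l else f"{f}-{l}"
    for f, l in zip(bounds['first'], bounds['last'])
]
print(intervals)  # ['1-4', '5-6', '7', '8', '9']
```

Using first/last rather than min/max keeps each interval tied to the row order of its run, which is what the run labels are based on.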