Binning a sorted column based on another column in Python

I have a DataFrame df with two columns, age and y_hat. The DataFrame is sorted by age.

Example:

age   y_hat
 1      0.2
 1      11.5
 2      11.5
 3      11.5
 3      8
 3      8
 3      0.2

I want to number each row according to changes in the y_hat value (like bins): the number should increase by one every time y_hat differs from the previous row. The y_hat values are not sorted and can repeat.

Desired output:

age   y_hat     bin
 1      0.2       1
 1      11.5      2
 2      11.5      2
 3      11.5      2
 3      8         3
 3      8         3
 3      0.2       4

CodePudding user response:

import pandas as pd
df = pd.DataFrame({'age': [1,1,2,3,3,3,3], 'y_hat': [0.2, 11.5,11.5,11.5,8,8,0.2]})

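df['bin'] = (
        # True wherever y_hat differs from the previous row
        # (the first row is compared against NaN, so it is always True)
        df['y_hat'] != df['y_hat'].shift()
    ).cumsum() # running count of changes, which becomes the bin number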

Which gives you:


   age  y_hat  bin
0    1    0.2    1
1    1   11.5    2
2    2   11.5    2
3    3   11.5    2
4    3    8.0    3
5    3    8.0    3
6    3    0.2    4
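
To see why this works, it may help to split the one-liner into its intermediate steps. The sketch below uses the same example data; the names prev, changed, bins and steps are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [1, 1, 2, 3, 3, 3, 3],
    "y_hat": [0.2, 11.5, 11.5, 11.5, 8, 8, 0.2],
})

# Step 1: y_hat of the previous row (NaN for the first row).
prev = df["y_hat"].shift()

# Step 2: True wherever a new run of equal values starts.
# The first row is compared against NaN, so it is always True.
changed = df["y_hat"].ne(prev)

# Step 3: cumulative sum of the booleans (True counts as 1),
# so the counter goes up by one at every change.
bins = changed.cumsum()

# Put the intermediate columns side by side for inspection.
steps = pd.DataFrame({
    "y_hat": df["y_hat"],
    "prev": prev,
    "changed": changed,
    "bin": bins,
})
print(steps)
```

Because the first comparison is always True, the numbering starts at 1. One caveat: NaN never compares equal to NaN, so if y_hat contains consecutive missing values, each of them would start a new bin with this approach.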