I have a DataFrame df with two columns, age and y_hat, sorted by age.
Example:
age y_hat
1 0.2
1 11.5
2 11.5
3 11.5
3 8
3 8
3 0.2
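I want to give each row a bin number that increases by one every time y_hat changes from the previous row. The y_hat values are not sorted and can repeat, so the same value can end up in more than one bin (0.2 is in bin 1 and in bin 4 below).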
Desired output:
age y_hat bin
1 0.2 1
1 11.5 2
2 11.5 2
3 11.5 2
3 8 3
3 8 3
3 0.2 4
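Because y_hat values repeat, grouping by the value itself does not produce these bins. As a quick check, pandas' GroupBy.ngroup() numbers rows by distinct key (sorted by default), so both 0.2 rows land in the same group. This is a minimal sketch using the example data above:

```python
import pandas as pd

df = pd.DataFrame({'age': [1, 1, 2, 3, 3, 3, 3],
                   'y_hat': [0.2, 11.5, 11.5, 11.5, 8, 8, 0.2]})

# ngroup() labels each row by its distinct y_hat value, not by consecutive run;
# +1 makes the labels start at 1 like the desired bins
by_value = df.groupby('y_hat').ngroup() + 1
print(by_value.tolist())  # [1, 3, 3, 3, 2, 2, 1]
```

The first and last rows should be in bins 1 and 4, so the numbering has to follow consecutive runs of y_hat rather than its distinct values.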
Answer:
import pandas as pd

df = pd.DataFrame({'age': [1, 1, 2, 3, 3, 3, 3],
                   'y_hat': [0.2, 11.5, 11.5, 11.5, 8, 8, 0.2]})

df['bin'] = (
    # True wherever y_hat differs from the previous row, i.e. a new run starts
    df['y_hat'] != df['y_hat'].shift()
).cumsum()  # running count of run starts = bin number
Which gives you:
age y_hat bin
0 1 0.2 1
1 1 11.5 2
2 2 11.5 2
3 3 11.5 2
4 3 8.0 3
5 3 8.0 3
6 3 0.2 4
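To see how the one-liner works, it can help to keep the intermediate results as columns. shift() moves y_hat down by one row, so each row is compared with its predecessor; the first row is compared with NaN, which never compares equal, so it starts bin 1. cumsum() on a boolean column counts True as 1 and False as 0, so the counter only goes up when the value changes. The column names prev and changed are hypothetical, chosen just for this illustration:

```python
import pandas as pd

df = pd.DataFrame({'age': [1, 1, 2, 3, 3, 3, 3],
                   'y_hat': [0.2, 11.5, 11.5, 11.5, 8, 8, 0.2]})

df['prev'] = df['y_hat'].shift()            # previous row's y_hat (NaN for the first row)
df['changed'] = df['y_hat'] != df['prev']   # True where a new run of y_hat starts
df['bin'] = df['changed'].cumsum()          # running count of run starts = bin number

print(df)
```

One caveat of the same mechanism: since NaN != NaN, any NaN rows in y_hat would each start a new bin, and floats that differ only by rounding noise are treated as different values.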