I have a dataframe df that looks like this:
ID | Months | Borrow_Rank |
---|---|---|
1 | 0 | 1 |
1 | 1 | 1 |
1 | 2 | 1 |
2 | 0 | 1 |
2 | 1 | 1 |
2 | 2 | 1 |
3 | 0 | 2 |
3 | 1 | 2 |
4 | 0 | 1 |
4 | 1 | 1 |
I want to create a new variable Months_Adjusted that starts counting from 0 for as long as Borrow_Rank remains the same.
ID | Months | Borrow_Rank | Months_Adjusted |
---|---|---|---|
1 | 0 | 1 | 0 |
1 | 1 | 1 | 1 |
1 | 2 | 1 | 2 |
2 | 0 | 1 | 3 |
2 | 1 | 1 | 4 |
2 | 2 | 1 | 5 |
3 | 0 | 2 | 0 |
3 | 1 | 2 | 1 |
4 | 0 | 1 | 0 |
4 | 1 | 1 | 1 |
Thank you all and I apologise if I could have written the question better. This is my first post.
CodePudding user response:
import pandas as pd
df = pd.DataFrame({'Borrow_Rank':[1,1,1,1,1,1,1,2,2,2,2,2,3,3,3,1,1,1]})
selector = (df['Borrow_Rank'] != df['Borrow_Rank'].shift()).cumsum()
df['Months_Adjusted'] = df.groupby(selector).cumcount()
CodePudding user response:
Here is a solution using itertools.groupby
-
from itertools import groupby
df['Months_Adjusted'] = pd.concat(
[pd.Series(range(len(list(g))))
for k, g in groupby(df['Borrow_Rank'])],
ignore_index=True)
Output
ID Months Borrow_Rank Months_Adjusted
0 1 0 1 0
1 1 1 1 1
2 1 2 1 2
3 2 0 1 3
4 2 1 1 4
5 2 2 1 5
6 3 0 2 0
7 3 1 2 1
8 4 0 1 0
9 4 1 1 1