I have the following similar dataframe. I want to create a new variable time since eligibility that calculates the day diffence since the asset becomes eligible (the dummy eligible turns from 0 to 1). I added the variable TSE by hand that I would like to create.
import pandas as pd
df1 = pd.DataFrame(columns=['asset', 'eligible', "date"])
df1['asset'] = ['003', '003', '003', '003', '003', '003', '002', '002', '002', '002', '002', '002', '002', '003', '003', '003']
df1['eligible'] = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
df1['date'] = ['1.1.2016', '2.1.2016', '3.1.2016', '4.1.2016', '5.1.2016', '6.1.2016', '1.1.2016', '2.1.2016', '3.1.2016',
'4.1.2016', '5.1.2016', '6.1.2016', '7.1.2016', '1.1.2016', '2.1.2016', '3.1.2016']
df1["TSE"] = [0, 0, 1, 2, 3, 4, 0, 0, 0, 0, 1, 2, 3, 1, 2,3 ]
asset eligible date TSE
0 003 0 1.1.2016 0
1 003 0 2.1.2016 0
2 003 1 3.1.2016 1
3 003 1 4.1.2016 2
4 003 1 5.1.2016 3
5 003 1 6.1.2016 4
6 002 0 1.1.2016 0
7 002 0 2.1.2016 0
8 002 0 3.1.2016 0
9 002 0 4.1.2016 0
10 002 1 5.1.2016 1
11 002 1 6.1.2016 2
12 002 1 7.1.2016 3
13 003 1 1.1.2016 1
14 003 1 2.1.2016 2
15 003 1 3.1.2016 3
How would you tackle the problem? I tried the following loop but it does not work.
for i in range(len(df1)-1):
for j in range(len(df1)-1):
row1, row2 = df1.iloc[i], df1.iloc[i j]
if row1["eligible"] - row2["eligible"] != 0:
df1.iloc[i] = df1['date'].diff()
CodePudding user response:
Use cumsum
with groupby.transform
to get the desired result:
grp = (df1.eligible.diff()<0).cumsum()
df1["TSE"] = df1.groupby([grp, 'asset'], as_index=False)['eligible'].transform('cumsum')