I'm trying to calculate the negative difference and count the number of days between the “date” column for each id. The count is reset for each id, when the column value is "Y".
However, I want the count to start from the reset date.
Currently I am doing this and getting the following output:
df = pd.DataFrame({'reset':['N','Y','N','N','N','Y','N'],
'date':['2019-09-04','2020-12-06','2020-12-06','2019-09-07','2019-11-08','2021-05-21','2021-06-23'],
'id':[16860,16860,16860,16860,16860,16860,16860]
})
df['date'] = pd.to_datetime(df['date'], format='%Y/%m/%d')
df = df.sort_values(['id','date'])
#create extra grouping column based on reset day
df['group'] = df['reset'].replace({'N':False,'Y':True})
df['group'] = df.groupby('id')['group'].cumsum()
df = df.sort_values(by='date', ascending=False)
df['negative difference'] = df.groupby(['id','group'])['date'].diff() / np.timedelta64(1, 'D')
df['negative difference'] = df.groupby(['id','group'])['negative difference'].cumsum().fillna(0)
df = df.sort_values(['id','date'])
print(df)
reset date id group negative difference
0 N 2019-09-04 16860 0.0 -65.0
3 N 2019-09-07 16860 0.0 -62.0
4 N 2019-11-08 16860 0.0 0.0
1 Y 2020-12-06 16860 1.0 0.0
2 N 2020-12-06 16860 1.0 0.0
5 Y 2021-05-21 16860 2.0 -33.0
6 N 2021-06-23 16860 2.0 0.0
However, I want the output to look something like this:
reset date id group negative difference
0 N 2019-09-04 16860 0.0 -459.0
3 N 2019-09-07 16860 0.0 -456.0
4 N 2019-11-08 16860 0.0 -394.0
1 Y 2020-12-06 16860 1.0 0.0
2 N 2020-12-06 16860 1.0 -166.0
5 Y 2021-05-21 16860 2.0 0.0
6 N 2021-06-23 16860 2.0 0.0
CodePudding user response:
You are on the right path , just need to get the reversed order get the group
df['group'] = df['reset'].replace({'N':False,'Y':True})
df['group'] = df.iloc[::-1,:].groupby('id')['group'].cumsum()
df['negative difference'] = df.date.sub(df.groupby(['id','group'])['date'].transform('last')) / np.timedelta64(1, 'D')
df
reset date id group negative difference
0 N 2019-09-04 16860 2 -459.0
3 N 2019-09-07 16860 2 -456.0
4 N 2019-11-08 16860 2 -394.0
1 Y 2020-12-06 16860 2 0.0
2 N 2020-12-06 16860 1 -166.0
5 Y 2021-05-21 16860 1 0.0
6 N 2021-06-23 16860 0 0.0