Home > Blockchain >  How can I count the number of days in reverse from a specific point via python pandas?
How can I count the number of days in reverse from a specific point via python pandas?

Time:09-21

I'm trying to calculate the negative difference and count the number of days between the “date” column for each id. The count is reset for each id, when the column value is "Y".

However, I want the count to start from the reset date.

Currently I am doing this and getting the following output:

df = pd.DataFrame({'reset':['N','Y','N','N','N','Y','N'],
                   'date':['2019-09-04','2020-12-06','2020-12-06','2019-09-07','2019-11-08','2021-05-21','2021-06-23'],
                   'id':[16860,16860,16860,16860,16860,16860,16860]
                   })


df['date'] = pd.to_datetime(df['date'], format='%Y/%m/%d')
df = df.sort_values(['id','date'])

#create extra grouping column based on reset day
df['group'] = df['reset'].replace({'N':False,'Y':True})
df['group'] = df.groupby('id')['group'].cumsum()

df = df.sort_values(by='date', ascending=False)
df['negative difference'] =  df.groupby(['id','group'])['date'].diff() / np.timedelta64(1, 'D')
df['negative difference'] = df.groupby(['id','group'])['negative difference'].cumsum().fillna(0)
df = df.sort_values(['id','date'])


print(df)

  reset       date     id  group  negative difference
0     N 2019-09-04  16860    0.0                -65.0
3     N 2019-09-07  16860    0.0                -62.0
4     N 2019-11-08  16860    0.0                  0.0
1     Y 2020-12-06  16860    1.0                  0.0
2     N 2020-12-06  16860    1.0                  0.0
5     Y 2021-05-21  16860    2.0                -33.0
6     N 2021-06-23  16860    2.0                  0.0

However, I want the output to look something like this:

  reset       date     id  group  negative difference
0     N 2019-09-04  16860    0.0               -459.0
3     N 2019-09-07  16860    0.0               -456.0
4     N 2019-11-08  16860    0.0               -394.0
1     Y 2020-12-06  16860    1.0                  0.0
2     N 2020-12-06  16860    1.0               -166.0
5     Y 2021-05-21  16860    2.0                  0.0
6     N 2021-06-23  16860    2.0                  0.0

CodePudding user response:

You are on the right path , just need to get the reversed order get the group

df['group'] = df['reset'].replace({'N':False,'Y':True})
df['group'] = df.iloc[::-1,:].groupby('id')['group'].cumsum()
df['negative difference'] =  df.date.sub(df.groupby(['id','group'])['date'].transform('last')) / np.timedelta64(1, 'D')
df
  reset       date     id  group  negative difference
0     N 2019-09-04  16860      2               -459.0
3     N 2019-09-07  16860      2               -456.0
4     N 2019-11-08  16860      2               -394.0
1     Y 2020-12-06  16860      2                  0.0
2     N 2020-12-06  16860      1               -166.0
5     Y 2021-05-21  16860      1                  0.0
6     N 2021-06-23  16860      0                  0.0
  • Related