Count days by ID - Pandas-CodePudding

Given the following table, how can I count the days elapsed since the first date for each ID? I want to avoid a for loop or any other explicit loop because the dataset is large.

ID  Date 
a   01/01/2020
a   05/01/2020
a   08/01/2020
a   10/01/2020
b   05/05/2020
b   08/05/2020
b   12/05/2020
c   08/08/2020
c   22/08/2020

The expected result is:

ID  Date        Days Evolved Since Initial Date
a   01/01/2020  1
a   05/01/2020  4
a   08/01/2020  7
a   10/01/2020  9
b   05/05/2020  1
b   08/05/2020  3
b   12/05/2020  7
c   08/08/2020  1
c   22/08/2020  14

CodePudding user response：

You could do something like this (where df is your dataframe and Date is already a datetime column):

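import pandas as pd

def days_evolved(sdf):
    # Time elapsed since the first date of this group
    days = sdf["Date"] - sdf["Date"].iat[0]
    # The first row counts as day 1 instead of day 0
    days.iat[0] = pd.Timedelta(days=1)
    return sdf.assign(Days_evolved=days)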

df = df.groupby("ID", as_index=False, sort=False).apply(days_evolved)

Result for the sample:

  ID       Date Days_evolved
0  a 2020-01-01       1 days
1  a 2020-01-05       4 days
2  a 2020-01-08       7 days
3  a 2020-01-10       9 days
4  b 2020-05-05       1 days
5  b 2020-05-08       3 days
6  b 2020-05-12       7 days
7  c 2020-08-08       1 days
8  c 2020-08-22      14 days

If you want integers instead of pd.Timedelta values, add this at the end:

df["Days_evolved"] = df["Days_evolved"].dt.days
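
The snippet above assumes that Date is already a datetime column. If it still holds strings as in the question, it needs to be parsed first. A minimal sketch of one way to build the sample frame, assuming the dates are written day-first (the explicit format is an assumption inferred from values such as 22/08/2020):

```python
import pandas as pd

# Sample data copied from the question; dates are written as DD/MM/YYYY
df = pd.DataFrame({
    "ID": ["a", "a", "a", "a", "b", "b", "b", "c", "c"],
    "Date": ["01/01/2020", "05/01/2020", "08/01/2020", "10/01/2020",
             "05/05/2020", "08/05/2020", "12/05/2020",
             "08/08/2020", "22/08/2020"],
})

# An explicit format avoids 05/01/2020 being read as May 1st
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
```

Without the format (or dayfirst=True), an ambiguous string such as 05/01/2020 may be parsed month-first, which would change the day counts.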

CodePudding user response：

Use GroupBy.transform with 'first' to broadcast each group's first date to every row of that group, then subtract it from Date. If no group repeats its first date, you can then replace the resulting 0 with 1:

df['new']=df['Date'].sub(df.groupby("ID")['Date'].transform('first')).dt.days.replace(0, 1)

print (df)
  ID       Date  new
0  a 2020-01-01    1
1  a 2020-01-05    4
2  a 2020-01-08    7
3  a 2020-01-10    9
4  b 2020-05-05    1
5  b 2020-05-08    3
6  b 2020-05-12    7
7  c 2020-08-08    1
8  c 2020-08-22   14

Alternatively, set 1 only on the first row of each group with Series.where and Series.duplicated, which leaves later rows untouched even if they repeat the first date:

df['new'] = (df['Date'].sub(df.groupby("ID")['Date'].transform('first'))
                       .dt.days.where(df['ID'].duplicated(), 1))

print (df)
  ID       Date  new
0  a 2020-01-01    1
1  a 2020-01-05    4
2  a 2020-01-08    7
3  a 2020-01-10    9
4  b 2020-05-05    1
5  b 2020-05-08    3
6  b 2020-05-12    7
7  c 2020-08-08    1
8  c 2020-08-22   14
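
The two variants only differ when a group repeats its first date. A small sketch of that case, using hypothetical data that is not part of the question (the column names via_replace and via_where are made up for illustration):

```python
import pandas as pd

# Hypothetical data: ID "a" repeats its first date on the second row
df = pd.DataFrame({
    "ID": ["a", "a", "a", "b"],
    "Date": pd.to_datetime(
        ["2020-01-01", "2020-01-01", "2020-01-05", "2020-05-05"]
    ),
})

# Days since the first date of each ID (0 on the first row of each group)
days = df["Date"].sub(df.groupby("ID")["Date"].transform("first")).dt.days

# replace(0, 1) rewrites every 0, including the repeated date on row 1
df["via_replace"] = days.replace(0, 1)

# where(...) rewrites only the first row of each ID
df["via_where"] = days.where(df["ID"].duplicated(), 1)

print(df)
```

Here via_replace gives 1, 1, 4, 1 while via_where gives 1, 0, 4, 1, so which one is right depends on whether a repeated first date should count as day 1 or day 0.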