Create a new column based on timestamp values to count the days by steps-CodePudding

I would like to add a column to my dataset which corresponds to the time stamp, and counts the day by steps. That is, for one year there should be 365 "steps" and I would like for all grouped payments for each account on day 1 to be labeled 1 in this column and all payments on day 2 are then labeled 2 and so on up to day 365. I would like it to look something like this:

  account        time  steps
0       A  2022.01.01    1 
1       A  2022.01.02    2
2       A  2022.01.02    2
3       B  2022.01.01    1
4       B  2022.01.03    3
5       B  2022.01.05    5

I have tried this:

def day_step(x):
x['steps'] = x.time.dt.day.shift()
return x


df = df.groupby('account').apply(day_step)

however, it only counts for each month, once a new month begins it starts again from 1.

How can I fix this to make it provide the step count for the entire year?

CodePudding user response：

Use GroupBy.transform with first or min Series, subtract column time, convert timedeltas to days and add 1:

df['time'] = pd.to_datetime(df['time'])

df['steps1'] = (df['time'].sub(df.groupby('account')['time'].transform('first'))
                          .dt.days
                          .add(1)
print (df)
  account       time  steps  steps1
0       A 2022-01-01      1       1
1       A 2022-01-02      2       2
2       A 2022-01-02      2       2
3       B 2022-01-01      1       1
4       B 2022-01-03      3       3
5       B 2022-01-05      5       5

First idea, working only if first row is January 1:

df['steps'] = df['time'].dt.dayofyear