Home > Mobile >  Create a new column based on timestamp values to count the days by steps
Create a new column based on timestamp values to count the days by steps

Time:02-26

I would like to add a column to my dataset which corresponds to the time stamp, and counts the day by steps. That is, for one year there should be 365 "steps" and I would like for all grouped payments for each account on day 1 to be labeled 1 in this column and all payments on day 2 are then labeled 2 and so on up to day 365. I would like it to look something like this:

  account        time  steps
0       A  2022.01.01    1 
1       A  2022.01.02    2
2       A  2022.01.02    2
3       B  2022.01.01    1
4       B  2022.01.03    3
5       B  2022.01.05    5

I have tried this:

def day_step(x):
x['steps'] = x.time.dt.day.shift()
return x


df = df.groupby('account').apply(day_step)

however, it only counts for each month, once a new month begins it starts again from 1.

How can I fix this to make it provide the step count for the entire year?

CodePudding user response:

Use GroupBy.transform with first or min Series, subtract column time, convert timedeltas to days and add 1:

df['time'] = pd.to_datetime(df['time'])

df['steps1'] = (df['time'].sub(df.groupby('account')['time'].transform('first'))
                          .dt.days
                          .add(1)
print (df)
  account       time  steps  steps1
0       A 2022-01-01      1       1
1       A 2022-01-02      2       2
2       A 2022-01-02      2       2
3       B 2022-01-01      1       1
4       B 2022-01-03      3       3
5       B 2022-01-05      5       5

First idea, working only if first row is January 1:

df['steps'] = df['time'].dt.dayofyear
  • Related