I am trying to perform operations on the time column for each unique merchant (calculate time between transactions). How do I access the individual merchants in an iteration? is there a way to do that in python?
Thank you.
CodePudding user response:
Assuming, time
is already a datetime64
. Use groupby_diff
:
df['delta'] = df.groupby('merchant')['time'].diff()
print(df)
# Output
merchant time delta
0 A 2022-01-01 16:00:00 NaT
1 A 2022-01-01 16:30:00 0 days 00:30:00
2 A 2022-01-01 17:00:00 0 days 00:30:00
3 B 2022-01-01 10:00:00 NaT
4 B 2022-01-01 11:00:00 0 days 01:00:00
5 B 2022-01-01 12:00:00 0 days 01:00:00
If you want to compute the mean between transactions per merchant, use:
out = df.groupby('merchant', as_index=False)['time'].apply(lambda x: x.diff().mean())
print(out)
# Output
merchant time
0 A 0 days 00:30:00
1 B 0 days 01:00:00
Setup:
data = {'merchant': ['A', 'A', 'A', 'B', 'B', 'B'],
'time': [pd.Timestamp('2022-01-01 16:00:00'),
pd.Timestamp('2022-01-01 16:30:00'),
pd.Timestamp('2022-01-01 17:00:00'),
pd.Timestamp('2022-01-01 10:00:00'),
pd.Timestamp('2022-01-01 11:00:00'),
pd.Timestamp('2022-01-01 12:00:00')]}
df = pd.DataFrame(data)