Home > Software engineering >  Pandas elapsed time between 2 date order by user
Pandas elapsed time between 2 date order by user

Time:02-17

I have a fake bank dataset who look like this :

Card Number                  Date                 Amount 
536518******2108      2015-05-01 00:01:54          55.0   
536518******2191      2015-05-01 00:01:14          37.5  
536518******2108      2015-05-01 00:02:34          105.2 
536518******2191      2015-05-01 00:03:14          75.0   

I want to calculate the elapsed time between 2 transactions

So my code look like this :

import numpy as np
import pandas as pd
import scipy.stats as st

df = pd.read_csv('df.csv')

position = df.columns.get_loc('Date')
df.groupby(df['Card Number'])
df['elapsed'] =  df.iloc[1:, position] - df.iat[0, position]
df["elapsed"] = df['elapsed'].dt.components['seconds']

But this calculates the elapsed time between 2 'Date' without caring about the "Card Number". I don't know how I can reset the last time save when I switch to another user. I guess that I need to modify the value in my iloc[] and iat[] but I can't figure out how to do that. My output should look like this :

Card Number                  Date                 Amount       elapsed
536518******2108      2015-05-01 00:01:54          55.0           0         -> first transaction (...2108)
536518******2191      2015-05-01 00:01:14          37.5           0         -> first transaction (...2191)
536518******2108      2015-05-01 00:02:34          105.2          40        -> second transaction from ...2108
536518******2191      2015-05-01 00:03:14          75.0           120       -> second transaction from ...2191

I also have to calculate how many transactions a Card do in a 30 minutes gap of time. In SQL, it should look like this :

df["transactions_30"] = df.query("SELECT COUNT(*) FROM df OVER (PARTITION BY 'Card Number' ORDER BY date RANGE BETWEEN '30 minutes' PRECEDING AND '0 minutes' FOLLOWING)")

This code return invalid syntax, how can I avoid using SQL query to execute the following request ?

CodePudding user response:

You can convert "Date" to pandas datetime object; then groupby "Card Number" and find diff; use the .dt accessor to get elapsed time in the seconds:

df['Date'] = pd.to_datetime(df['Date'])
df['elapsed'] = df.groupby('Card Number')['Date'].diff().dt.total_seconds().fillna(0)

Output:

        Card Number                Date  Amount  elapsed
0  536518******2108 2015-05-01 00:01:54    55.0      0.0
1  536518******2191 2015-05-01 00:01:14    37.5      0.0
2  536518******2108 2015-05-01 00:02:34   105.2     40.0
3  536518******2191 2015-05-01 00:03:14    75.0    120.0
  • Related