I have a fake bank dataset that looks like this:
Card Number Date Amount
536518******2108 2015-05-01 00:01:54 55.0
536518******2191 2015-05-01 00:01:14 37.5
536518******2108 2015-05-01 00:02:34 105.2
536518******2191 2015-05-01 00:03:14 75.0
I want to calculate the elapsed time between two consecutive transactions for each card.
So my code looks like this:
import numpy as np
import pandas as pd
import scipy.stats as st
df = pd.read_csv('df.csv')
position = df.columns.get_loc('Date')
df.groupby(df['Card Number'])
df['elapsed'] = df.iloc[1:, position] - df.iat[0, position]
df["elapsed"] = df['elapsed'].dt.components['seconds']
But this calculates the elapsed time between two consecutive 'Date' values without taking the "Card Number" into account. I don't know how to reset the last saved time when I switch to another card. I guess I need to modify the values in my iloc[] and iat[] calls, but I can't figure out how.
My expected output looks like this:
Card Number Date Amount elapsed
536518******2108 2015-05-01 00:01:54 55.0 0 -> first transaction (...2108)
536518******2191 2015-05-01 00:01:14 37.5 0 -> first transaction (...2191)
536518******2108 2015-05-01 00:02:34 105.2 40 -> second transaction from ...2108
536518******2191 2015-05-01 00:03:14 75.0 120 -> second transaction from ...2191
I also have to calculate how many transactions a card makes within a 30-minute window. In SQL, it would look something like this:
df["transactions_30"] = df.query("SELECT COUNT(*) FROM df OVER (PARTITION BY 'Card Number' ORDER BY date RANGE BETWEEN '30 minutes' PRECEDING AND '0 minutes' FOLLOWING)")
This code raises a syntax error (df.query takes a boolean expression, not SQL). How can I get the same result without using an SQL query?
CodePudding user response:
You can convert "Date" to a pandas datetime object, then groupby "Card Number" and take the diff; use the .dt accessor to get the elapsed time in seconds:
df['Date'] = pd.to_datetime(df['Date'])
# per-card difference between consecutive timestamps;
# the first row of each card is NaT, which fillna turns into 0
df['elapsed'] = df.groupby('Card Number')['Date'].diff().dt.total_seconds().fillna(0)
Output:
Card Number Date Amount elapsed
0 536518******2108 2015-05-01 00:01:54 55.0 0.0
1 536518******2191 2015-05-01 00:01:14 37.5 0.0
2 536518******2108 2015-05-01 00:02:34 105.2 40.0
3 536518******2191 2015-05-01 00:03:14 75.0 120.0
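The 30-minute count from the second part of the question can be handled the same way, with a time-based rolling window instead of the SQL RANGE window. A minimal sketch, using the sample data from the question and assuming rows can be sorted per card by date (a time-based rolling window requires a monotonic datetime index within each group):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Card Number': ['536518******2108', '536518******2191',
                    '536518******2108', '536518******2191'],
    'Date': ['2015-05-01 00:01:54', '2015-05-01 00:01:14',
             '2015-05-01 00:02:34', '2015-05-01 00:03:14'],
    'Amount': [55.0, 37.5, 105.2, 75.0],
})
df['Date'] = pd.to_datetime(df['Date'])

# Sort so each card's rows are in time order
df = df.sort_values(['Card Number', 'Date']).reset_index(drop=True)

# For each row, count that card's transactions in the 30 minutes up to
# and including the current one -- the pandas analogue of the SQL
# "RANGE BETWEEN '30 minutes' PRECEDING AND CURRENT ROW" window
df['transactions_30'] = (
    df.set_index('Date')
      .groupby('Card Number')['Amount']
      .rolling('30min')
      .count()
      .values          # row order matches df after the sort above
)
print(df)
```

With the sample data, every card's second transaction falls within 30 minutes of its first, so transactions_30 is 1 for each card's first row and 2 for its second. The `.values` trick relies on the sort above: groupby returns groups in sorted key order with rows in original order, which matches the sorted frame exactly.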