Home > database >  How to get cumulative sum between two different dates under conditions
How to get cumulative sum between two different dates under conditions

Time:02-06

I would like to get a cumulative sum of tran_amt for each Cust ID within 24 hours of first transaction. Please see my below example for illustration.

Original Data

DateTime Tran_amt Cust_ID
1/1/2021 2:00:00 PM 1000 c103102
1/1/2021 3:00:00 PM 2000 c103102
1/2/2021 10:00:00 AM 2000 c103102
1/2/2021 11:00:00 AM 1000 c211203
1/2/2021 12:00:00 PM 1000 c103102
1/2/2021 5:00:00 PM 2000 c103102
1/3/2021 3:00:00 AM 1000 c211203

Expected Output Data

DateTime Tran_amt Cust_ID First Transaction DateTime Cumulative_amt Remark
1/1/2021 2:00:00 PM 1000 c103102 1/1/2021 2:00:00 PM 1000
1/1/2021 3:00:00 PM 2000 c103102 1/1/2021 2:00:00 PM 3000
1/2/2021 10:00:00 AM 2000 c103102 1/1/2021 2:00:00 PM 5000
1/2/2021 11:00:00 AM 1000 c211203 1/2/2021 1:00:00 PM 1000
1/2/2021 12:00:00 PM 1000 c103102 1/1/2021 2:00:00 PM 6000
1/2/2021 5:00:00 PM 2000 c103102 1/2/2021 5:00:00 PM 2000 The tran datetime is exceeding 24 hours of previous first transaction Datetime, and thus the cumulative_amt is reset
1/3/2021 3:00:00 AM 1000 c211203 1/2/2021 1:00:00 PM 2000

Hope someone can help me the above question. Thanks a lot.


CodePudding user response:

For example :

import pandas as pd

df = pd.DataFrame({'DateTime': ['1/1/2021 2:00:00 PM', '1/1/2021 3:00:00 PM', '1/2/2021 10:00:00 AM', '1/2/2021 11:00:00 AM', '1/2/2021 12:00:00 PM', '1/2/2021 5:00:00 PM', '1/3/2021 3:00:00 AM'], 
                   'Tran_amt': [1000, 2000, 2000, 1000, 1000, 2000, 1000], 
                   'Cust_ID': ['c103102', 'c103102', 'c103102', 'c211203', 'c103102', 'c103102', 'c211203']})

df['DateTime'] = pd.to_datetime(df['DateTime'])
df = df.sort_values(by=['Cust_ID', 'DateTime'])
df['First Transaction DateTime'] = df.groupby('Cust_ID')['DateTime'].transform('first')
df['time_diff'] = (df['DateTime'] - df['First Transaction DateTime']).dt.total_seconds() / 3600
df['Cumulative_amt'] = df.groupby('Cust_ID').apply(lambda x: x['Tran_amt'].cumsum().where(x['time_diff'] <= 24, 0)).reset_index(drop=True)
df['Remark'] = df['Cumulative_amt'].where(df['Cumulative_amt'] != 0, '')

CodePudding user response:

You need to use the datetime module and other things to make it work.

  • Related