Home > Back-end >  Pandas recursive cumulative sum?
Pandas recursive cumulative sum?

Time:04-16

I am having a bit of trouble getting some pandas code to work. My basic problem is given a set of transactions and a set of balances, I need to come up with "balancing transactions"; i.e. fake transactions (which will be tagged as such) that will make it so that the sum of transactions are equal to balances (ignore the fact that in most cases this isn't a good idea; it makes sense in the context I am working in, I promise!).

Sample data:

import pandas as pd
from io import StringIO
txn_data = StringIO(
    """Contract,Txndate,TxnAmount
"wer42134423",1/1/2014, 50
"wer42134423",1/2/2014, -10
"wer42134423",1/3/2014, 100
"wer42134423",1/4/2014, -50
"wer42134423",1/5/2014, -10
"wer42134423",1/6/2014, 20
"wer42134423",1/7/2014, 50
"wer42134423",1/8/2014, -70
"wer42134423",1/10/2014,    21
"wer42134423",1/11/2014,    -3
"""
)
txns=pd.read_csv(txn_data,parse_dates=["Txndate"])
txns.head()
balance_data = StringIO(
    """Contract,Baldate,Amount
"wer42134423",  1/1/2014,   50
"wer42134423",  1/4/2014,   100
"wer42134423",  1/9/2014,   96
"wer42134423",  1/11/2014,  105
"""
)
balances=pd.read_csv(balance_data,parse_dates=["Baldate"])

txns["CumulativeSumofTxns"]=txns.groupby("Contract")["TxnAmount"].cumsum()
balances_merged=pd.merge_asof(balances,txns,by="Contract",left_on=["Baldate"],right_on=["Txndate"])
balances_merged.head()

I can do this fairly easily in Excel; I merge the cumulative sum of transactions onto my balance data, then just apply a fairly simple sum formula, and then everything can balance out. Excel Formula

However, I cannot for the life of me figure out how to do the same in Pandas (without manually iterating through each "cell", which would be horrendous for performance). After doing a lot of digging it almost seem like the expanding window function would do the trick, but I couldn't get that to work after multiple attempts with shifting and such. I think the problem is that every entry in my new column is dependent on entries for the same row (namely, the current balance and cumulative sum of transactions) and all the prior entries in the column (namely, all the prior balancing transactions). Any help appreciated!

CodePudding user response:

IIUC, do you want?

balances_merged['Cumulative Sum Balancing Transactions'] = balances_merged['Amount'] - balances_merged['CumulativeSumofTxns']
balances_merged['Balancing Transaction'] = balances_merged['Cumulative Sum Balancing Transactions'].diff()
balances_merged

Output:

      Contract    Baldate  Amount    Txndate  TxnAmount  CumulativeSumofTxns  Cumulative Sum Balancing Transactions  Balancing Transaction
0  wer42134423 2014-01-01      50 2014-01-01         50                   50                                      0                    NaN
1  wer42134423 2014-01-04     100 2014-01-04        -50                   90                                     10                   10.0
2  wer42134423 2014-01-09      96 2014-01-08        -70                   80                                     16                    6.0
3  wer42134423 2014-01-11     105 2014-01-11         -3                   98                                      7                   -9.0
  • Related