How to iterate through a dataframe, to create an additional colum?-CodePudding

I have a dataframe with account information:

Date | Transaction type | Amount

I've sorted the data into deposit and withdrawals like:

deposit = accountHistory.loc[(accountHistory['Transaction type']=="Cash In")]
withdrawal = accountHistory.loc[(accountHistory['Transaction type']=="Withdrawal")]

So, that returns all of the locations of deposits and withdrawals.

The current account balance seems easy to get:

sum(deposit["Amount"])-sum(withdrawal["Amount"])

Now, I want to graph the account balance over time.

Should I iterate through the dataframe and create a new col called Account balance or is there some better way to do it using pd functions I'm not aware of?

For each row, account balance should show the deposit - withdrawal (to that date).

To add some more information: I have date like:

Date | Transaction type | Amount 
X/Y/Z     In                50
X/Y/Z     Interest          1
X/Y/Z     Withdrawal        20
X/Y/Z     Investment        30

And I want:

Date | Transaction type | Amount | Balance
X/Y/Z     In                50       50
X/Y/Z     Fees              1        49
X/Y/Z     Withdrawal        20       29
X/Y/Z     Investment        10       19
X/Y/Z     In                20       39
X/Y/Z     In                10       49
X/Y/Z     Withdrawal        10       39

Is there any other way to do this than using a for loop, iterating over the entire df?

CodePudding user response：

I would use pivot_table() and then create the Balance column as the difference between Cash in (as C) and Withdraw (W). Alternatively, you can use assign() to make it one line:

Let's suppose this is our dataframe:

     Date Transaction type  Amount
0  2021-01-01                C     100
1  2021-01-01                W      80
2  2021-02-02                C     120
3  2021-02-02                W      40
4  2021-02-03                C     200
5  2021-02-03                W     150

Using the proposed answer:

df = df.pivot_table(index='Date',columns='Transaction type',values='Amount',fillna=0)
df['Balance'] = df['C'] - df['W']

Outputs:

Transaction type    C    W  Balance
Date                               
2021-01-01        100   80       20
2021-02-02        120   40       80
2021-02-03        200  150       50

If you want the cummulative value then simply add .cumsum(). So:

df['Balance'] = (df['C'] - df['W']).cumsum()

CodePudding user response：

I noticed that Transaction type == "In" is the only case when the balance increases. All other transaction types decrease the balance.

So the first step is to check the Transaction type. "In" means increase (the value should stay positive (as is)), otherwise it should actually be negative.

Then the only step to do is to compute the cumulative sum.

So the whole code to compute the new column can be:

df['Balance'] = df.Amount.where(df['Transaction type'] == 'In',
   -df.Amount).cumsum()