Vectorizing "balance sheet"-like data-CodePudding

The problem is the following: how to vectorize situations where a value on the next line of a dataframe depends on a previous one? I want to avoid the for loop.

The row logic I need for the desired column is, given an "open balance" different from 0 or NaN in row[0]:

(1) row_t.open_balance = row_t-1.close_balance

(2) row_t.close_balance = row_t.open_balance row_t.additions - row_t.subtractions

Besides being slow, the infamous for loop leads me to all kinds of SettingWithCopyWarning situations.

Can anyone help here? I've tried "helper columns" to no avail because of equation #2 above.

Thanks in advance!

CodePudding user response：

First, had to create reproducible example so I can try to answer this question. Remember to do this next time you ask a question.

Here I simulate additions and subtractions.

import pandas as pd
import numpy as np

additions = np.random.randint(0, 100, 10)
subtractions = np.random.randint(0, 50, 10)
net_change = pd.DataFrame({'balance': list(additions - subtractions)})

net_change

    balance
0   44
1   13
2   70
3   -36
4   73
5   -6
6   54
7   30
8   39
9   80

Create DataFrame with starting balance of 0.

df = pd.DataFrame({'balance': [0]})
df

balance
0   0

Concatenate the DataFrame with net_change.

df = pd.concat([df, net_change], ignore_index=True)
df

    balance
0   0
1   44
2   13
3   70
4   -36
5   73
6   -6
7   54
8   30
9   39
10  80

Then use pandas.DataFrame.cumsum to get the cumulative close balance which is what I assume is what you're are to compute.

df.cumsum()

    balance
0   0
1   44
2   57
3   127
4   91
5   164
6   158
7   212
8   242
9   281
10  361

CodePudding user response：

When dealing with neighbor rows and vectorization, a good tool is shift.

Create a data frame with the mentioned column names and some values:

df = pd.DataFrame([[0, 10, 2], [0, 40, 5], [0, 70, 8]], columns=['open_balance', 'additions', 'subtractions'])

'open_balance' relies pretty heavy on additions and subtractions. We use shift() to compute it based on the previous day:

df['open_balance'] = (df['additions'] - df['subtractions']).shift(1)

We clean the data frame from Nans:

df.fillna(0, inplace=True)

Now it is easy to compute 'close_balance':

df['close_balance'] = df['open_balance']   df['additions'] - df['subtractions']