I have a DataFrame with cash, inflows and outflows.
I need to create a feature survival that is the maximum number of periods for which the cash is enough to pay the projected outflows (the inflows are excluded from the computation).
Let's take an example from the table below.
(Again, the inflows do not count in this exercise.)
In t=1, starting from cash = 100, I can add the outflows -20, -50, -10, -10 and still have positive cash (100-20-50-10-10 = 10 > 0), while with the outflow in t=5 the cash would become negative. So, since I can "survive" 4 periods from t=1, the survival = 4.
In t=2 the survival = 3, and so on.
As it is a big DataFrame, how can I do it efficiently with Pandas?
t | cash | outflow | inflow | survival |
---|---|---|---|---|
1 | 100 | -20 | 10 | 4 |
2 | 90 | -50 | 10 | 3 |
3 | 50 | -10 | 80 | 2 |
4 | 120 | -10 | 70 | ... |
5 | 40 | -50 | 60 | ... |
CodePudding user response:
I would do it like this:
df['survival'] = [(cash + df.iloc[i:].outflow.cumsum() > 0).sum() for i, cash in enumerate(df.cash)]
Output:
   t  cash  outflow  survival
0  1   100      -20         4
1  2    90      -50         3
2  3    50      -10         2
3  4   120      -10         2
4  5    40      -50         0
Explanation: I loop over the cash values, keeping track of the row number with enumerate. I use the row number to select only the part of the DataFrame from the current row downwards. On this part I take the cumulative sum of the outflows and add it to the cash. This yields a series that turns negative once the cash is smaller than the accumulated outflows. I then compare it with >0, which gives True where the balance is still positive and False where it is not. Because the outflows are never positive, the balance cannot recover once it has dropped, so the True values form one unbroken run at the start. Finally I sum the whole series; each True counts as 1, so the result is the survival number you are looking for. Hope it makes sense.
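The list comprehension above recomputes a cumulative sum for every row, so its cost grows roughly with the square of the number of rows. As a possible alternative for a big DataFrame, here is a minimal sketch that computes one prefix sum and then uses a binary search per row. It assumes every outflow is zero or negative, as in the sample data; the function name survival_searchsorted is made up for this illustration.

```python
import numpy as np
import pandas as pd


def survival_searchsorted(cash, outflow):
    """Count how many upcoming outflows each row's cash can cover.

    Assumption (not stated in the question): all outflows are <= 0,
    so the cumulative cost never decreases and can be binary-searched.
    """
    cash = np.asarray(cash)
    cost = -np.asarray(outflow)  # outflows as non-negative costs
    if (cost < 0).any():
        raise ValueError("this sketch assumes outflow <= 0 everywhere")

    # prefix[m] = total cost of the first m periods, with prefix[0] = 0
    prefix = np.concatenate(([0], np.cumsum(cost)))

    # Row i survives through period j when prefix[j + 1] - prefix[i] < cash[i]
    target = cash + prefix[:-1]

    # Number of prefix entries strictly below each target
    idx = np.searchsorted(prefix, target, side="left")

    # Discard the i + 1 entries that belong to the past; never go below 0
    return np.maximum(idx - np.arange(1, len(cash) + 1), 0)


df = pd.DataFrame({
    't': [1, 2, 3, 4, 5],
    'cash': [100, 90, 50, 120, 40],
    'outflow': [-20, -50, -10, -10, -50],
})
df['survival'] = survival_searchsorted(df['cash'], df['outflow'])
print(df)
```

On the sample data this gives the same survival column as the list comprehension. If some outflows can be positive, the prefix sums are no longer sorted and the binary search is not valid.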
CodePudding user response:
With your sample data:
import pandas as pd

df = pd.DataFrame({
    't': [1, 2, 3, 4, 5],
    'cash': [100, 90, 50, 120, 40],
    'outflow': [-20, -50, -10, -10, -50]
})
I chose to use the pandas apply() function with the function below, where x is the row being evaluated and df is the complete DataFrame:
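def survival(x, df):
    cash = x['cash']
    i = 0
    while cash > 0:
        try:
            # add the (negative) outflow of the i-th row after the current one
            cash = cash + df.loc[x.name + i]['outflow']
            i += 1
        except KeyError:
            print('End of dataframe')
            i += 1
            cash = -1 # To make sure we leave the loop
    # the last step either exhausted the cash or ran past the end, so drop it
    return i - 1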
Then apply it to every row:
df['survival'] = df.apply(survival, args=(df,), axis=1)
# Output
   t  cash  outflow  survival
0  1   100      -20         4
1  2    90      -50         3
2  3    50      -10         2
3  4   120      -10         2
4  5    40      -50         0
CodePudding user response:
Creating the test dataframe
import pandas as pd
import numpy as np
N = 50
cash = 50 # the initial cash
# I will not type your dataframe
df = pd.DataFrame({'inflow': np.random.randint(1, 10, N),
                   'outflow': np.random.randint(1, 20, N)})
Then the solution could be achieved with
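# running cash balance after each period (outflow is stored as a positive number here)
ccash = cash + (df['inflow'] - df['outflow']).cumsum()
# for each period, count the periods from there to the end with a non-negative balance
survival = (ccash[::-1] >= 0).cumsum()[::-1]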
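To see what the reversed cumulative sum in this last answer counts, here is a small deterministic check. The numbers are made up for illustration and, as in the random test data of this answer, the outflow is stored as a positive value and the inflows are included.

```python
import pandas as pd

cash = 20  # initial cash (illustrative value)
df = pd.DataFrame({
    'inflow':  [5, 0, 20, 0],
    'outflow': [10, 20, 5, 5],  # positive numbers, as in the random test data
})

# Running balance after each period: 15, -5, 10, 5
ccash = cash + (df['inflow'] - df['outflow']).cumsum()

# Reverse, count non-negative balances cumulatively, reverse back
survival = (ccash[::-1] >= 0).cumsum()[::-1]

print(ccash.tolist())     # [15, -5, 10, 5]
print(survival.tolist())  # [3, 2, 2, 1]
```

In this example the first row gets 3 even though the balance is already negative in the second period, so the expression counts every later period with a non-negative balance, not only the unbroken run from the current period. Whether that matches the survival asked for in the question depends on the data, so it is worth comparing it against one of the other answers before relying on it.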