Home > OS >  How to do a cumulative sum in a DataFrame analyzing all possible combination columns instead of anal
How to do a cumulative sum in a DataFrame analyzing all possible combination columns instead of anal

Time:12-15

If we want to know if the cumulative sums are profitable in the ['Col 1','Col 2','Col 3'] columns for the long term, we do it this way:

import pandas as pd
import io

ex_csv = """
Col 1,Col 2,Col 3,return
a,b,c,1
d,e,f,1
a,e,c,-1
a,e,c,-1
d,b,c,-1
a,b,c,1
d,e,f,1
"""

df = pd.read_csv(io.StringIO(ex_csv), sep=",")
df['invest'] = df.groupby(['Col 1','Col 2','Col 3'])['return'].cumsum().gt(df['return'])
true_backs = df[(df['invest'] == True)]['return']
print(true_backs.sum())

But what if I want it to be TRUE for the cumulative as well when not only the combination of the 3 columns is positive, but if one or two are positive too?

Example:

Perhaps the value a of Col 1 the cumulative sum of it will be positive, but together with the values of Col 2 and Col 3 they will no longer be profitable, so in my current code it would appear as FALSE.

And I want it to be TRUE.

CodePudding user response:

If I understood the question correctly, this should solve your problem:

import pandas as pd
import io
import itertools

ex_csv = """
Col 1,Col 2,Col 3,return
a,b,c,1
d,e,f,1
a,e,c,-1
a,e,c,-1
d,b,c,-1
a,b,c,1
d,e,f,1
"""
df = pd.read_csv(io.StringIO(ex_csv), sep=",")

cols = df.columns.values.tolist()
cols.remove('return')
cols_comb = []
for i in range(1, len(cols)   1):
    cols_comb  = (  a for a in itertools.combinations(cols, i) )
cols_comb  = [[i for i in a] for a in cols_comb]

for i,a in enumerate(cols_comb):
    df[f'invest_{i}'] = df.groupby(a)['return'].cumsum().gt(df['return']) 

df["invest"] = df.loc[:,(f'invest_{i}' for i in range(len(cols_comb)))].apply(lambda x: any(x), axis=1)

true_backs = df[(df['invest'] == True)]['return']
print(true_backs.sum())

So, I'm creating a combination of columns in cols_comb, and it looks like this:

[['Col 1'],
 ['Col 2'],
 ['Col 3'],
 ['Col 1', 'Col 2'],
 ['Col 1', 'Col 3'],
 ['Col 2', 'Col 3'],
 ['Col 1', 'Col 2', 'Col 3']]

Combination code from: https://stackoverflow.com/a/73620523/7323333

Then, for each element in this list, I calculate the cumulative sum and assign the results to a new column in Dataframe. Then I check if any of the elements in rows of the newly generated columns is True and assign the result to the "invest" column.

  • Related