If we want to know if the cumulative sums are profitable in the ['Col 1','Col 2','Col 3']
columns for the long term, we do it this way:
import pandas as pd
import io
ex_csv = """
Col 1,Col 2,Col 3,return
a,b,c,1
d,e,f,1
a,e,c,-1
a,e,c,-1
d,b,c,-1
a,b,c,1
d,e,f,1
"""
df = pd.read_csv(io.StringIO(ex_csv), sep=",")
df['invest'] = df.groupby(['Col 1','Col 2','Col 3'])['return'].cumsum().gt(df['return'])
true_backs = df[(df['invest'] == True)]['return']
print(true_backs.sum())
But what if I want it to be TRUE
for the cumulative as well when not only the combination of the 3 columns is positive, but if one or two are positive too?
Example:
Perhaps the value a
of Col 1
the cumulative sum of it will be positive, but together with the values of Col 2
and Col 3
they will no longer be profitable, so in my current code it would appear as FALSE.
And I want it to be TRUE.
CodePudding user response:
If I understood the question correctly, this should solve your problem:
import pandas as pd
import io
import itertools
ex_csv = """
Col 1,Col 2,Col 3,return
a,b,c,1
d,e,f,1
a,e,c,-1
a,e,c,-1
d,b,c,-1
a,b,c,1
d,e,f,1
"""
df = pd.read_csv(io.StringIO(ex_csv), sep=",")
cols = df.columns.values.tolist()
cols.remove('return')
cols_comb = []
for i in range(1, len(cols) 1):
cols_comb = ( a for a in itertools.combinations(cols, i) )
cols_comb = [[i for i in a] for a in cols_comb]
for i,a in enumerate(cols_comb):
df[f'invest_{i}'] = df.groupby(a)['return'].cumsum().gt(df['return'])
df["invest"] = df.loc[:,(f'invest_{i}' for i in range(len(cols_comb)))].apply(lambda x: any(x), axis=1)
true_backs = df[(df['invest'] == True)]['return']
print(true_backs.sum())
So, I'm creating a combination of columns in cols_comb, and it looks like this:
[['Col 1'],
['Col 2'],
['Col 3'],
['Col 1', 'Col 2'],
['Col 1', 'Col 3'],
['Col 2', 'Col 3'],
['Col 1', 'Col 2', 'Col 3']]
Combination code from: https://stackoverflow.com/a/73620523/7323333
Then, for each element in this list, I calculate the cumulative sum and assign the results to a new column in Dataframe. Then I check if any of the elements in rows of the newly generated columns is True and assign the result to the "invest" column.