Example of my CSV:
clock_now,competition,market_name,back,lay
2022/08/09,South African Premier Division,Over/Under 0.5 Goals,0.28985,-1.0
2022/08/12,South African Premier Division,Over/Under 0.5 Goals,-1.0,1.28
2022/09/07,South African Premier Division,Over/Under 0.5 Goals,0.37,-1.0
2022/09/07,South African Premier Division,Over/Under 0.5 Goals,0.20,-1.0
2022/10/15,South African Premier Division,Over/Under 0.5 Goals,0.20,1.0
2022/10/15,South African Premier Division,Over/Under 0.5 Goals,0.20,1.0
2022/10/15,South African Premier Division,Over/Under 0.5 Goals,0.20,1.0
2022/11/20,South African Premier Division,Over/Under 0.5 Goals,0.20,1.0
The results that are recorded in the back
column and the lay
column are updated every midnight (00:00
).
So when I try to analyze the cumulative sum to know which rows are above zero throughout the day (the combinations
list of lists is like a looping because there are several types of columns combinations that I analyze, I just summarized to facilitate the example in the question):
combinations = [['market_name', 'competition']]
for cbnt in combinations:
df['invest'] = df.groupby(cbnt)['lay'].cumsum().gt(df['lay'])
The current result is this:
clock_now cumulativesum invest
0 2022/08/09 -1.00 False
1 2022/08/12 0.28 False
2 2022/09/07 -0.72 True
3 2022/09/07 -1.72 False
4 2022/10/15 -0.72 False
5 2022/10/15 0.28 False
6 2022/10/15 1.28 True
7 2022/11/20 2.28 True
But the expected result is this:
Until 2022/08/09
the sum was 0
then False
Until 2022/08/12
the sum was -1
then False
Until 2022/09/07
the sum was 0.28
then True
Until 2022/09/07
the sum was 0.28
then True
Until 2022/10/15
the sum was -1.72
then False
Until 2022/10/15
the sum was -1.72
then False
Until 2022/10/15
the sum was -1.72
then False
Until 2022/11/20
the sum was 1.28
so True
How should I proceed to be able to add this cumulative sum according to the dates?
CodePudding user response:
You need to shift
your cumsum
per group:
combinations = [['market_name', 'competition']]
for cbnt in combinations:
df['invest'] = (df.groupby(cbnt)['lay']
.apply(lambda s: s.cumsum().shift(fill_value=0))
.gt(df['lay'])
)
print(df['invest'])
NB. the first value is True
as 0 > -1
, if you want False
, remove the fill_value=0
.
Output:
0 True
1 False
2 True
3 True
4 False
5 False
6 False
7 True
Name: invest, dtype: bool