Home > Net >  How to make a cumulative sum in blocks with results according to until the day before instead of eac
How to make a cumulative sum in blocks with results according to until the day before instead of eac

Time:01-05

Example of my CSV:

clock_now,competition,market_name,back,lay
2022/08/09,South African Premier Division,Over/Under 0.5 Goals,0.28985,-1.0
2022/08/12,South African Premier Division,Over/Under 0.5 Goals,-1.0,1.28
2022/09/07,South African Premier Division,Over/Under 0.5 Goals,0.37,-1.0
2022/09/07,South African Premier Division,Over/Under 0.5 Goals,0.20,-1.0
2022/10/15,South African Premier Division,Over/Under 0.5 Goals,0.20,1.0
2022/10/15,South African Premier Division,Over/Under 0.5 Goals,0.20,1.0
2022/10/15,South African Premier Division,Over/Under 0.5 Goals,0.20,1.0
2022/11/20,South African Premier Division,Over/Under 0.5 Goals,0.20,1.0

The results that are recorded in the back column and the lay column are updated every midnight (00:00).

So when I try to analyze the cumulative sum to know which rows are above zero throughout the day (the combinations list of lists is like a looping because there are several types of columns combinations that I analyze, I just summarized to facilitate the example in the question):

combinations = [['market_name', 'competition']]
for cbnt in combinations:
    df['invest'] = df.groupby(cbnt)['lay'].cumsum().gt(df['lay'])

The current result is this:

    clock_now  cumulativesum  invest
0  2022/08/09          -1.00   False
1  2022/08/12           0.28   False
2  2022/09/07          -0.72    True
3  2022/09/07          -1.72   False
4  2022/10/15          -0.72   False
5  2022/10/15           0.28   False
6  2022/10/15           1.28    True
7  2022/11/20           2.28    True

But the expected result is this:

Until 2022/08/09 the sum was 0 then False
Until 2022/08/12 the sum was -1 then False
Until 2022/09/07 the sum was 0.28 then True
Until 2022/09/07 the sum was 0.28 then True
Until 2022/10/15 the sum was -1.72 then False
Until 2022/10/15 the sum was -1.72 then False
Until 2022/10/15 the sum was -1.72 then False
Until 2022/11/20 the sum was 1.28 so True

How should I proceed to be able to add this cumulative sum according to the dates?

CodePudding user response:

You need to shift your cumsum per group:

combinations = [['market_name', 'competition']]
for cbnt in combinations:
    df['invest'] = (df.groupby(cbnt)['lay']
                      .apply(lambda s: s.cumsum().shift(fill_value=0))
                      .gt(df['lay'])
                   )
    print(df['invest'])

NB. the first value is True as 0 > -1, if you want False, remove the fill_value=0.

Output:

0     True
1    False
2     True
3     True
4    False
5    False
6    False
7     True
Name: invest, dtype: bool
  • Related