How to calculate cumulative sum (reversed) of a Python DataFrame within given groups?-CodePudding

I have a data frame (df_f) with many (n=19) columns that, if conceptually simplified looks something like this:

Basin	(n=17 columns)	Chi
13.0	...	4
13.0	...	8
13.0	...	2
21.0	...	4
21.0	...	6
38.0	...	1
38.0	...	7
38.0	...	2
38.0	...	4

The real data frame has around 70,000 rows and around 60 unique 'Basin' ID values (and of course other columns with numerical data, but we don't need them for this, I still want to keep them though for plotting).

What I would like is the following:

I want calculate the cumulative sum of the 'Chi' value, but reversed (so, going up), AND I want it to be reset for each Basin 'group'. So the values should be:

Basin	(n=17 columns)	Chi_cum
13.0	...	14
13.0	...	10
13.0	...	2
21.0	...	10
21.0	...	6
38.0	...	14
38.0	...	13
38.0	...	6
38.0	...	4

As you can see, in Chi_cum, we progressively add the Chi values going up, but we reset the sum for each 'Basin'.

I know how to do a reverse cumulative sum (df_f['Chi_cum'] = df_f.loc[::-1, 'Chi'].cumsum()[::-1]), but that is for the entire dataframe. Conversely, I know how to use '.groupby', but I can't seem to combine both methods of grouping and summing.

I tried doing something like:

df_f["Chi_cum"] = df_f.groupby(by=['Basin']).sum().iloc[::-1].groupby(level=[0]).cumsum().iloc[::-1]

(as inspired by Pandas Python Groupby Cummulative Sum Reverse), but it does not work!

Could someone please help me figure this out? Thanks!

CodePudding user response：

You can try with series groupby

df['new'] = df.loc[::-1, 'Chi'].groupby(df['Basin']).cumsum()
df
Out[858]: 
   Basin (n=17 columns)  Chi  new
0   13.0            ...    4   14
1   13.0            ...    8   10
2   13.0            ...    2    2
3   21.0            ...    4   10
4   21.0            ...    6    6
5   38.0            ...    1   14
6   38.0            ...    7   13
7   38.0            ...    2    6
8   38.0            ...    4    4

CodePudding user response：

Apparently you can't actually add ascending=True or reverse=True to cumsum (!?!?), so just reverse before and after cumsum for each group:

df['Chi'] = df.groupby('Basin')['Chi'].apply(lambda x: x[::-1].cumsum()[::-1])

Output:

>>> df
   Basin  Chi
0   13.0   14
1   13.0   10
2   13.0    2
3   21.0   10
4   21.0    6
5   38.0   14
6   38.0   13
7   38.0    6
8   38.0