cumsum on multi index pandas dataframe-CodePudding

I have a multi index dataset given below:

arr = np.array([12, 12, 12, 72, 72, 72, 26, 26, 26, 22, 22, 22, 46, 46, 46, 32, 32, 32])
df = pd.DataFrame({'date': ['1/1/2000', '1/1/2000', '1/1/2000',
                            '2/1/2000', '2/1/2000', '2/1/2000',
                            '3/1/2000', '3/1/2000', '3/1/2000',
                            '1/1/2000', '1/1/2000', '1/1/2000',
                            '2/1/2000', '2/1/2000', '2/1/2000',
                            '3/1/2000', '3/1/2000', '3/1/2000'],
                   'type': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                            'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
                   'lags': ['31/12/1999', '30/12/1999', '29/12/1999',
                            '1/1/2000', '31/12/1999', '30/12/1999',
                            '2/1/2000', '1/1/2000', '31/12/1999',
                            '31/12/1999', '30/12/1999', '29/12/1999',
                            '1/1/2000', '31/12/1999', '30/12/1999',
                            '2/1/2000', '1/1/2000', '31/12/1999']})
df["target"] = arr
df.set_index(['date', 'type', 'lags'], inplace=True)

I am trying to use cumsum for each type and run the code:

for g_name, g_df in df.groupby("type"):
    df.loc[g_df.index, 'target'] = df.loc[g_df.index, 'target'].cumsum().astype('float64')

However, it applies cumsum for each row and outputted the result:

                          target
date     type lags
1/1/2000 A    31/12/1999    12.0
              30/12/1999    24.0
              29/12/1999    36.0
2/1/2000 A    1/1/2000     108.0
              31/12/1999   180.0
              30/12/1999   252.0
3/1/2000 A    2/1/2000     278.0
              1/1/2000     304.0
              31/12/1999   330.0
1/1/2000 B    31/12/1999    22.0
              30/12/1999    44.0
              29/12/1999    66.0
2/1/2000 B    1/1/2000     112.0
              31/12/1999   158.0
              30/12/1999   204.0
3/1/2000 B    2/1/2000     236.0
              1/1/2000     268.0
              31/12/1999   300.0

The expected result is:

                          target
date     type lags
1/1/2000 A    31/12/1999    12.0
              30/12/1999    12.0
              29/12/1999    12.0
2/1/2000 A    1/1/2000      84.0
              31/12/1999    84.0
              30/12/1999    84.0
3/1/2000 A    2/1/2000      110.0
              1/1/2000      110.0
              31/12/1999    110.0
1/1/2000 B    31/12/1999    22.0
              30/12/1999    22.0
              29/12/1999    22.0
2/1/2000 B    1/1/2000      68.0
              31/12/1999    68.0
              30/12/1999    68.0
3/1/2000 B    2/1/2000      100.0
              1/1/2000      100.0
              31/12/1999    100.0

How can I do get expected results and update the original data in a pythonic way?

CodePudding user response：

You can group by date and type and extract the first value then group by type to compute the cumulative sum. Finally reindex the output to broadcast all values:

>>> df.groupby(['date', 'type']).first().groupby('type').cumsum().reindex(df.index)
                          target
date     type lags              
1/1/2000 A    31/12/1999      12
              30/12/1999      12
              29/12/1999      12
2/1/2000 A    1/1/2000        84
              31/12/1999      84
              30/12/1999      84
3/1/2000 A    2/1/2000       110
              1/1/2000       110
              31/12/1999     110
1/1/2000 B    31/12/1999      22
              30/12/1999      22
              29/12/1999      22
2/1/2000 B    1/1/2000        68
              31/12/1999      68
              30/12/1999      68
3/1/2000 B    2/1/2000       100
              1/1/2000       100
              31/12/1999     100

CodePudding user response：

You could use groupby cumcount first to create groups; then use it in groupby cumsum:

out = df.groupby([df.groupby(level=[0,1]).cumcount(), df.index.get_level_values(1)]).cumsum()

Output:

                          target
date     type lags              
1/1/2000 A    31/12/1999      12
              30/12/1999      12
              29/12/1999      12
2/1/2000 A    1/1/2000        84
              31/12/1999      84
              30/12/1999      84
3/1/2000 A    2/1/2000       110
              1/1/2000       110
              31/12/1999     110
1/1/2000 B    31/12/1999      22
              30/12/1999      22
              29/12/1999      22
2/1/2000 B    1/1/2000        68
              31/12/1999      68
              30/12/1999      68
3/1/2000 B    2/1/2000       100
              1/1/2000       100
              31/12/1999     100