Home > Mobile >  cumsum on multi index pandas dataframe
cumsum on multi index pandas dataframe

Time:03-17

I have a multi index dataset given below:

arr = np.array([12, 12, 12, 72, 72, 72, 26, 26, 26, 22, 22, 22, 46, 46, 46, 32, 32, 32])
df = pd.DataFrame({'date': ['1/1/2000', '1/1/2000', '1/1/2000',
                            '2/1/2000', '2/1/2000', '2/1/2000',
                            '3/1/2000', '3/1/2000', '3/1/2000',
                            '1/1/2000', '1/1/2000', '1/1/2000',
                            '2/1/2000', '2/1/2000', '2/1/2000',
                            '3/1/2000', '3/1/2000', '3/1/2000'],
                   'type': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                            'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
                   'lags': ['31/12/1999', '30/12/1999', '29/12/1999',
                            '1/1/2000', '31/12/1999', '30/12/1999',
                            '2/1/2000', '1/1/2000', '31/12/1999',
                            '31/12/1999', '30/12/1999', '29/12/1999',
                            '1/1/2000', '31/12/1999', '30/12/1999',
                            '2/1/2000', '1/1/2000', '31/12/1999']})
df["target"] = arr
df.set_index(['date', 'type', 'lags'], inplace=True)

I am trying to use cumsum for each type and run the code:

for g_name, g_df in df.groupby("type"):
    df.loc[g_df.index, 'target'] = df.loc[g_df.index, 'target'].cumsum().astype('float64')

However, it applies cumsum for each row and outputted the result:

                          target
date     type lags
1/1/2000 A    31/12/1999    12.0
              30/12/1999    24.0
              29/12/1999    36.0
2/1/2000 A    1/1/2000     108.0
              31/12/1999   180.0
              30/12/1999   252.0
3/1/2000 A    2/1/2000     278.0
              1/1/2000     304.0
              31/12/1999   330.0
1/1/2000 B    31/12/1999    22.0
              30/12/1999    44.0
              29/12/1999    66.0
2/1/2000 B    1/1/2000     112.0
              31/12/1999   158.0
              30/12/1999   204.0
3/1/2000 B    2/1/2000     236.0
              1/1/2000     268.0
              31/12/1999   300.0

The expected result is:

                          target
date     type lags
1/1/2000 A    31/12/1999    12.0
              30/12/1999    12.0
              29/12/1999    12.0
2/1/2000 A    1/1/2000      84.0
              31/12/1999    84.0
              30/12/1999    84.0
3/1/2000 A    2/1/2000      110.0
              1/1/2000      110.0
              31/12/1999    110.0
1/1/2000 B    31/12/1999    22.0
              30/12/1999    22.0
              29/12/1999    22.0
2/1/2000 B    1/1/2000      68.0
              31/12/1999    68.0
              30/12/1999    68.0
3/1/2000 B    2/1/2000      100.0
              1/1/2000      100.0
              31/12/1999    100.0

How can I do get expected results and update the original data in a pythonic way?

CodePudding user response:

You can group by date and type and extract the first value then group by type to compute the cumulative sum. Finally reindex the output to broadcast all values:

>>> df.groupby(['date', 'type']).first().groupby('type').cumsum().reindex(df.index)
                          target
date     type lags              
1/1/2000 A    31/12/1999      12
              30/12/1999      12
              29/12/1999      12
2/1/2000 A    1/1/2000        84
              31/12/1999      84
              30/12/1999      84
3/1/2000 A    2/1/2000       110
              1/1/2000       110
              31/12/1999     110
1/1/2000 B    31/12/1999      22
              30/12/1999      22
              29/12/1999      22
2/1/2000 B    1/1/2000        68
              31/12/1999      68
              30/12/1999      68
3/1/2000 B    2/1/2000       100
              1/1/2000       100
              31/12/1999     100

CodePudding user response:

You could use groupby cumcount first to create groups; then use it in groupby cumsum:

out = df.groupby([df.groupby(level=[0,1]).cumcount(), df.index.get_level_values(1)]).cumsum()

Output:

                          target
date     type lags              
1/1/2000 A    31/12/1999      12
              30/12/1999      12
              29/12/1999      12
2/1/2000 A    1/1/2000        84
              31/12/1999      84
              30/12/1999      84
3/1/2000 A    2/1/2000       110
              1/1/2000       110
              31/12/1999     110
1/1/2000 B    31/12/1999      22
              30/12/1999      22
              29/12/1999      22
2/1/2000 B    1/1/2000        68
              31/12/1999      68
              30/12/1999      68
3/1/2000 B    2/1/2000       100
              1/1/2000       100
              31/12/1999     100
  • Related