calculating balance of different assets using payment information in pandas dataframe-CodePudding

I have the following dataframe. What I want is to calculate the balance of all different types of tokens per address by quarter. This requires summing payment (either debit- or credit ) for all addresses by token overtime (from the very start) since I want to observe the evolution of balance of all tokens per address quarterly.

My current df

....
        token_address       address              payment            date        ... more columns ...
11      0x48c80f1f4d53d55   0xdde7f77766434b6fb  6e 17              2016-10-15 
12      0x1e797ce986c3cff   0xda48c91e051ccb1ec -1e 19              2016-12-24 
13      0x1e797ce986c3cff   0xda48c91e051ccb1ec -5e 19              2016-12-24 
14      0x45e42d659d9f946   0x0d0f998d0eacc304c -1541212.0          2016-12-24 
15      0x48c80f1f4d53d59   0xe0a6f3da1e523f2d8 -5.3282558e 19      2016-12-24 
16      0x1e797ce986c3cff   0xda48c91e051ccb1ec  7e 19              2017-02-03 
527     0xedbaf3c5100302d   0xcf62dfc079c5d594c -5205.0             2017-02-03
528     0xedbaf3c5100302d   0xcf62dfc079c5d594c -1245205.0          2017-02-03
529     0xedbaf3c5100302d   0xd9ffae429218a2ff3 -129500000.0        2017-02-03
164207  0xedbaf3c5100302d   0xcf62dfc079c5d594c  1000000.0          2017-09-23
164208  0xf230b790e05390f   0x74f3f5d29f71805c7  27518.0            2017-09-23
164209  0xd26114cd6ee289a   0x8b06df3a4ad049865  7718036541774558.0 2017-09-23
164210  0x1e797ce986c3cff   0xda48c91e051ccb1ec  1e 19              2017-09-23

....

The desired output

address             date   token_address        balance
0xdde7f77766434b6fb 2016Q4 0x48c80f1f4d53d55     6e 17  
0xda48c91e051ccb1ec 2016Q4 0x1e797ce986c3cff    -6e 19
0xda48c91e051ccb1ec 2017Q1 0x1e797ce986c3cff     1e 19
0xda48c91e051ccb1ec 2017Q3 0x1e797ce986c3cff     2e 19
0x0d0f998d0eacc304c 2016Q4 0x45e42d659d9f946    -1541212.0
0xe0a6f3da1e523f2d8 2016Q4 0x48c80f1f4d53d59    -5.3282558e 19
0xcf62dfc079c5d594c 2017Q1 0xedbaf3c5100302d    -1250410
0xcf62dfc079c5d594c 2017Q3 0xedbaf3c5100302d    -250410
0xd9ffae429218a2ff3 2017Q1 0xedbaf3c5100302d    -129500000.0
0x74f3f5d29f71805c7 2017Q3 0xf230b790e05390f    27518.0
0x8b06df3a4ad049865 2017Q3 0xd26114cd6ee289a    7718036541774558.0

any help will be much appreciated!!

CodePudding user response：

You can convert date to quarters using PeriodIndex, then use groupby sum to get the quarterly sum; then groupby cumsum for the final output:

out = (df.groupby(['address', pd.PeriodIndex(df['date'], freq='Q'), 'token_address'])['payment'].sum()
      .groupby(level=[0,2]).cumsum().reset_index(name='balance'))

Output:

                address    date      token_address       balance
0   0x0d0f998d0eacc304c  2016Q4  0x45e42d659d9f946 -1.541212e 06
1   0x74f3f5d29f71805c7  2017Q3  0xf230b790e05390f  2.751800e 04
2   0x8b06df3a4ad049865  2017Q3  0xd26114cd6ee289a  7.718037e 15
3   0xcf62dfc079c5d594c  2017Q1  0xedbaf3c5100302d -1.250410e 06
4   0xcf62dfc079c5d594c  2017Q3  0xedbaf3c5100302d -2.504100e 05
5   0xd9ffae429218a2ff3  2017Q1  0xedbaf3c5100302d -1.295000e 08
6   0xda48c91e051ccb1ec  2016Q4  0x1e797ce986c3cff -6.000000e 19
7   0xda48c91e051ccb1ec  2017Q1  0x1e797ce986c3cff  1.000000e 19
8   0xda48c91e051ccb1ec  2017Q3  0x1e797ce986c3cff  2.000000e 19
9   0xdde7f77766434b6fb  2016Q4  0x48c80f1f4d53d55  6.000000e 17
10  0xe0a6f3da1e523f2d8  2016Q4  0x48c80f1f4d53d59 -5.328256e 19