I have the following dataframe. What I want is to calculate the balance of all different types of tokens per address by quarter. This requires summing payment (either debit- or credit ) for all addresses by token overtime (from the very start) since I want to observe the evolution of balance of all tokens per address quarterly.
My current df
....
token_address address payment date ... more columns ...
11 0x48c80f1f4d53d55 0xdde7f77766434b6fb 6e 17 2016-10-15
12 0x1e797ce986c3cff 0xda48c91e051ccb1ec -1e 19 2016-12-24
13 0x1e797ce986c3cff 0xda48c91e051ccb1ec -5e 19 2016-12-24
14 0x45e42d659d9f946 0x0d0f998d0eacc304c -1541212.0 2016-12-24
15 0x48c80f1f4d53d59 0xe0a6f3da1e523f2d8 -5.3282558e 19 2016-12-24
16 0x1e797ce986c3cff 0xda48c91e051ccb1ec 7e 19 2017-02-03
527 0xedbaf3c5100302d 0xcf62dfc079c5d594c -5205.0 2017-02-03
528 0xedbaf3c5100302d 0xcf62dfc079c5d594c -1245205.0 2017-02-03
529 0xedbaf3c5100302d 0xd9ffae429218a2ff3 -129500000.0 2017-02-03
164207 0xedbaf3c5100302d 0xcf62dfc079c5d594c 1000000.0 2017-09-23
164208 0xf230b790e05390f 0x74f3f5d29f71805c7 27518.0 2017-09-23
164209 0xd26114cd6ee289a 0x8b06df3a4ad049865 7718036541774558.0 2017-09-23
164210 0x1e797ce986c3cff 0xda48c91e051ccb1ec 1e 19 2017-09-23
....
The desired output
address date token_address balance
0xdde7f77766434b6fb 2016Q4 0x48c80f1f4d53d55 6e 17
0xda48c91e051ccb1ec 2016Q4 0x1e797ce986c3cff -6e 19
0xda48c91e051ccb1ec 2017Q1 0x1e797ce986c3cff 1e 19
0xda48c91e051ccb1ec 2017Q3 0x1e797ce986c3cff 2e 19
0x0d0f998d0eacc304c 2016Q4 0x45e42d659d9f946 -1541212.0
0xe0a6f3da1e523f2d8 2016Q4 0x48c80f1f4d53d59 -5.3282558e 19
0xcf62dfc079c5d594c 2017Q1 0xedbaf3c5100302d -1250410
0xcf62dfc079c5d594c 2017Q3 0xedbaf3c5100302d -250410
0xd9ffae429218a2ff3 2017Q1 0xedbaf3c5100302d -129500000.0
0x74f3f5d29f71805c7 2017Q3 0xf230b790e05390f 27518.0
0x8b06df3a4ad049865 2017Q3 0xd26114cd6ee289a 7718036541774558.0
any help will be much appreciated!!
CodePudding user response:
You can convert date
to quarters using PeriodIndex
, then use groupby
sum
to get the quarterly sum; then groupby
cumsum
for the final output:
out = (df.groupby(['address', pd.PeriodIndex(df['date'], freq='Q'), 'token_address'])['payment'].sum()
.groupby(level=[0,2]).cumsum().reset_index(name='balance'))
Output:
address date token_address balance
0 0x0d0f998d0eacc304c 2016Q4 0x45e42d659d9f946 -1.541212e 06
1 0x74f3f5d29f71805c7 2017Q3 0xf230b790e05390f 2.751800e 04
2 0x8b06df3a4ad049865 2017Q3 0xd26114cd6ee289a 7.718037e 15
3 0xcf62dfc079c5d594c 2017Q1 0xedbaf3c5100302d -1.250410e 06
4 0xcf62dfc079c5d594c 2017Q3 0xedbaf3c5100302d -2.504100e 05
5 0xd9ffae429218a2ff3 2017Q1 0xedbaf3c5100302d -1.295000e 08
6 0xda48c91e051ccb1ec 2016Q4 0x1e797ce986c3cff -6.000000e 19
7 0xda48c91e051ccb1ec 2017Q1 0x1e797ce986c3cff 1.000000e 19
8 0xda48c91e051ccb1ec 2017Q3 0x1e797ce986c3cff 2.000000e 19
9 0xdde7f77766434b6fb 2016Q4 0x48c80f1f4d53d55 6.000000e 17
10 0xe0a6f3da1e523f2d8 2016Q4 0x48c80f1f4d53d59 -5.328256e 19