I have the following df:
wallet position position_rewards position_type token_transfers total_position_rewards rewards_pct transfers_per_rewards
0 0x123 FRAX 700 LD 450 1000 0.7 315
0 0x123 USDC 200 LD 450 1000 0.2 90
0 0x123 ETH 100 Sushi_LP 450 1000 0.1 45
1 0xabc FXS 40 LD 5 50 0.8 4
1 0xabc ETH 10 Sushi_LP 5 50 0.2 1
A few notes about the data:
- The values in the columns token_transfers and total_position_rewards are "aggregated" by wallet. That is, each wallet has only one value in each of those columns, even though the wallet is repeated throughout the df (because other values in the other columns differ).
What I want to output:
I am interested in creating a table/chart showing the amount of token_transfers per position_type. One wallet can have different positions, but the position_type can be the same (see wallet 0x123).
total_position_rewards is the sum of all position_rewards for a given wallet, and rewards_pct is simply position_rewards divided by total_position_rewards.
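For context, these two derived columns can be reproduced with a grouped transform (a sketch, assuming the column names from the table above):

```python
import io
import pandas as pd

# Rebuild the sample frame from the question (derived columns omitted)
text = '''wallet position position_rewards position_type token_transfers
0x123 FRAX 700 LD 450
0x123 USDC 200 LD 450
0x123 ETH 100 Sushi_LP 450
0xabc FXS 40 LD 5
0xabc ETH 10 Sushi_LP 5'''
df = pd.read_csv(io.StringIO(text), sep=r'\s+')

# total_position_rewards: per-wallet sum broadcast back to every row
df['total_position_rewards'] = df.groupby('wallet')['position_rewards'].transform('sum')
# rewards_pct: each position's share of the wallet total
df['rewards_pct'] = df['position_rewards'] / df['total_position_rewards']
print(df[['wallet', 'position_rewards', 'total_position_rewards', 'rewards_pct']])
```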
What I want to calculate is total_token_transfers_per_position_type. That means grouping token_transfers per position_type, but since each wallet has only a single token_transfers value (repeated on every row), I cannot do df.groupby(['wallet', 'position_type'])['token_transfers'].sum(): that sums the repeated values within each wallet, whereas each wallet's value should be counted only once per position_type.
My code:
df.groupby(['wallet', 'position_type'])['token_transfers'].sum()
As stated above, I somehow need to stop the values from being summed within a wallet and only sum them across position_type.
For example, the above operation gives the following results for total_token_transfers_per_position_type:
LD: 450 + 450 + 5 = 905
Sushi_LP: 450 + 5 = 455
However, the correct result should be:
LD: 450 + 5 = 455
Sushi_LP: 450 + 5 = 455
Basically, I need to sum token_transfers across each position_type without summing the values within wallets. If I do df.groupby('wallet')['token_transfers'].sum() and one wallet has multiple position_types, I will get wrongly summed values.
If I do df.groupby('position_type')['token_transfers'].sum(), pandas will wrongly sum the values within each wallet.
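To make both failure modes concrete, here is a small reproduction using the sample data from the question:

```python
import io
import pandas as pd

text = '''wallet position position_rewards position_type token_transfers
0x123 FRAX 700 LD 450
0x123 USDC 200 LD 450
0x123 ETH 100 Sushi_LP 450
0xabc FXS 40 LD 5
0xabc ETH 10 Sushi_LP 5'''
df = pd.read_csv(io.StringIO(text), sep=r'\s+')

# Grouping by wallet triple-counts 0x123's single value of 450
by_wallet = df.groupby('wallet')['token_transfers'].sum()
print(by_wallet)  # 0x123 -> 1350, 0xabc -> 10

# Grouping by position_type alone double-counts 450 within 0x123 for LD
by_type = df.groupby('position_type')['token_transfers'].sum()
print(by_type)  # LD -> 905, Sushi_LP -> 455
```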
CodePudding user response:
I don't fully understand your idea, but you can always use a for-loop to work with every group separately and then use set() to remove duplicates.
Something like this:
text = '''wallet position position_rewards position_type token_transfers total_position_rewards rewards_pct transfers_per_rewards
0x123 FRAX 700 LD 450 1000 0.7 315
0x123 USDC 200 LD 450 1000 0.2 90
0x123 ETH 100 Sushi_LP 450 1000 0.1 45
0xabc FXS 40 LD 5 50 0.8 4
0xabc ETH 10 Sushi_LP 5 50 0.2 1'''
import pandas as pd
import io
df = pd.read_csv(io.StringIO(text), sep=r'\s+')
#print(df)
# group by the column name (not a one-element list) so key is the plain label
for key, val in df.groupby('position_type'):
    print('key:', key)
    print('token_transfers:', val['token_transfers'].to_list())
    print('set:', set(val['token_transfers']))
    print('sum:', sum(set(val['token_transfers'])))
    print('---')
Result:
key: LD
token_transfers: [450, 450, 5]
set: {450, 5}
sum: 455
---
key: Sushi_LP
token_transfers: [450, 5]
set: {450, 5}
sum: 455
---
Maybe it could be coded using some .map or .agg on the groups, but with a for-loop you have simpler access to the groups and can more easily use standard Python functions.
EDIT:
Using apply() I can execute sum(set(x)):
print( df.groupby('position_type')['token_transfers'].apply(lambda x: sum(set(x))) )
or .apply(set).apply(sum) (because .apply(set).sum() doesn't work):
print( df.groupby('position_type')['token_transfers'].apply(set).apply(sum) )
Result:
position_type
LD 455
Sushi_LP 455
Name: token_transfers, dtype: int64
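A related alternative (not from the answer above, just a sketch): deduplicate per (wallet, position_type) before grouping. Unlike sum(set(...)), this still counts two different wallets that happen to share the same token_transfers value:

```python
import io
import pandas as pd

text = '''wallet position position_rewards position_type token_transfers
0x123 FRAX 700 LD 450
0x123 USDC 200 LD 450
0x123 ETH 100 Sushi_LP 450
0xabc FXS 40 LD 5
0xabc ETH 10 Sushi_LP 5'''
df = pd.read_csv(io.StringIO(text), sep=r'\s+')

# Keep one row per (wallet, position_type), then sum per position_type,
# so each wallet's value is counted exactly once per type
result = (df.drop_duplicates(subset=['wallet', 'position_type'])
            .groupby('position_type')['token_transfers']
            .sum())
print(result)  # LD -> 455, Sushi_LP -> 455
```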