I have the following df
:
wallet position position_rewards position_type token_sales
0 0x123 SUSHI_LP 250 Sushi_LP 500
0 0x123 ALCX 750 LP Token 500
1 0xabc GAMMA 333.33 LP Token 750
1 0xabc FXS 666.66 LD 750
Note that the sum of the values in position_rewards
for each wallet is the TOTAL for that wallet, and the token_sales
column might show a lower amount that was sold from that total amount. you can see that in: wallet 0x123
received 1000 rewards in total, but sold only 500.
I want to create the following columns, which are calculations based on the already existing columns. Logic below too:
Column 1: df['position_rewards_pct']
This column is supposed to have the corresponding % of the rewards per position
over the total rewards per wallet
.
My code:
df['position_rewards_pct'] = (df['position_rewards'] / sum(df['position_rewards'].apply(Decimal))) * 100
Problem: Outputting NaNs
Column 2: df['token_sales_per_type']
This column is supposed to show how many tokens have been sold (token_sales
column) for a given potition_type
.
Please note that for each value in the existing token_sales
column, each wallet
has only that value. That is, you will never have a different value in token_sales
for a single wallet
.
In th end, this column should show (repeatedly, for every row in position_type
, the amount of tokens sold for that specific type. So as rows in position_type
repeat, so will the rows in df['token_sales_per_type']
.
Note that all values are in Decimal
object form.
Essentially, the structure of the final df
should logically be the following:
CodePudding user response:
In trying to formulate a response, I'm finding that the text of your question doesn't quite match up with the data and visualizations you've provided. Perhaps that DataFrame you're showing is a result of a preliminary grouping operation, rather than the underlying data?
In any event, your question is in the general category of split/apply/combine, for which Pandas has many tools, some of which may seem a bit tricky to grasp.
Usually when you want to perform a grouping, and then apply some operation back to the dataset on the basis of what you found in the grouping, you use .groupby() followed by .transform().
.transform() has the wonderful ability to take the result of an aggregation function, and apply it back to every member of the group. The classic example is subtracting the mean() for a group from every value within that group.
Examples using the dataframe you provided:
Group by the wallet and position, sum the rewards
someDF.groupby(by=["wallet", "position"]).transform(sum)["position_rewards"]
0 250.0
1 750.0
2 333.33
3 666.66
percentage of total (this one doesn't quite make sense in the context of the df you provided, since position column is all unique)
someDF["position_rewards"] / someDF.groupby(by=["wallet", "position"]).transform(sum)["position_rewards"]
0 1.0
1 1.0
2 1.0
3 1.0
Apply the sum of token_sales to each position type
someDF.groupby(by=["position_type"]).transform(sum)["token_sales"]
0 500
1 1250
2 1250
3 750
One final comment on decimal and percentage formatting - best to leave that for display, rather than modifying the data. You can do that with the pandas styler.
CodePudding user response:
For the first question you need to a groupby.transform('sum')
and rdiv
:
df['position_rewards_pct'] = (df.groupby('wallet')['position_rewards']
.transform('sum').rdiv(df['position_rewards'])
.mul(100).round(2)
)
output:
wallet position position_rewards position_type token_sales position_rewards_pct
0 0x123 SUSHI_LP 250.00 Sushi_LP 500 25.00
0 0x123 ALCX 750.00 LP Token 500 75.00
1 0xabc GAMMA 333.33 LP Token 750 33.33
1 0xabc FXS 666.66 LD 750 66.67