Groupby values, perform calculations and apply to repeating rows-CodePudding

I have the following df:

       wallet      position        position_rewards    position_type    token_sales       
0      0x123       SUSHI_LP        250                 Sushi_LP         500                                                                         
0      0x123           ALCX        750                 LP Token         500
1      0xabc          GAMMA        333.33              LP Token         750
1      0xabc            FXS        666.66              LD               750

Note that the sum of the values in position_rewards for each wallet is the TOTAL for that wallet, and the token_sales column might show a lower amount that was sold from that total amount. you can see that in: wallet 0x123received 1000 rewards in total, but sold only 500.

I want to create the following columns, which are calculations based on the already existing columns. Logic below too:

Column 1: df['position_rewards_pct'] This column is supposed to have the corresponding % of the rewards per positionover the total rewards per wallet.

My code:

df['position_rewards_pct'] = (df['position_rewards'] / sum(df['position_rewards'].apply(Decimal))) * 100

Problem: Outputting NaNs

Column 2: df['token_sales_per_type'] This column is supposed to show how many tokens have been sold (token_sales column) for a given potition_type.

Please note that for each value in the existing token_sales column, each wallet has only that value. That is, you will never have a different value in token_sales for a single wallet.

In th end, this column should show (repeatedly, for every row in position_type, the amount of tokens sold for that specific type. So as rows in position_typerepeat, so will the rows in df['token_sales_per_type'].

Note that all values are in Decimal object form.

Essentially, the structure of the final df should logically be the following:

CodePudding user response：

In trying to formulate a response, I'm finding that the text of your question doesn't quite match up with the data and visualizations you've provided. Perhaps that DataFrame you're showing is a result of a preliminary grouping operation, rather than the underlying data?

In any event, your question is in the general category of split/apply/combine, for which Pandas has many tools, some of which may seem a bit tricky to grasp.

Usually when you want to perform a grouping, and then apply some operation back to the dataset on the basis of what you found in the grouping, you use .groupby() followed by .transform().

.transform() has the wonderful ability to take the result of an aggregation function, and apply it back to every member of the group. The classic example is subtracting the mean() for a group from every value within that group.

Examples using the dataframe you provided:

Group by the wallet and position, sum the rewards

 someDF.groupby(by=["wallet", "position"]).transform(sum)["position_rewards"]

0     250.0
1     750.0
2    333.33
3    666.66

percentage of total (this one doesn't quite make sense in the context of the df you provided, since position column is all unique)

someDF["position_rewards"] / someDF.groupby(by=["wallet", "position"]).transform(sum)["position_rewards"]
0    1.0
1    1.0
2    1.0
3    1.0

Apply the sum of token_sales to each position type

someDF.groupby(by=["position_type"]).transform(sum)["token_sales"]
0     500
1    1250
2    1250
3     750

One final comment on decimal and percentage formatting - best to leave that for display, rather than modifying the data. You can do that with the pandas styler.

CodePudding user response：

For the first question you need to a groupby.transform('sum') and rdiv:

df['position_rewards_pct'] = (df.groupby('wallet')['position_rewards']
                                .transform('sum').rdiv(df['position_rewards'])
                                .mul(100).round(2)
                             )

output:

  wallet  position  position_rewards position_type  token_sales  position_rewards_pct
0  0x123  SUSHI_LP            250.00      Sushi_LP          500                 25.00
0  0x123      ALCX            750.00      LP Token          500                 75.00
1  0xabc     GAMMA            333.33      LP Token          750                 33.33
1  0xabc       FXS            666.66            LD          750                 66.67