I am a newbie to pandas and I am struggling to get the result that I want after doing research / experiencing some trial and errors...appreciate any guidance from here, thanks in advance!
Supposed I have a dataframe which holds some trade data, to make it easier to explain what I want to do, I made up these data myself:
| time | symbol | market | volume |
| ---------- | -------- | ------- | ------ |
| 2022-07-01 | btcusdt | Binance | 300 |
| 2022-07-01 | btcusdt | Gemini | 200 |
| 2022-07-01 | btcusdt | Huobi | 100 |
| 2022-07-02 | btcusdt | Binance | 400 |
| 2022-07-02 | btcusdt | Gemini | 350 |
| 2022-07-02 | btcusdt | Huobi | 100 |
| 2022-07-01 | ethusdt | Binance | 1000 |
| 2022-07-01 | ethusdt | Gemini | 2000 |
| 2022-07-02 | ethusdt | Binance | 1500 |
| 2022-07-02 | ethusdt | Gemini | 500 |
| 2022-07-01 | usdcusdt | Binance | 300 |
| 2022-07-01 | usdcusdt | Gemini | 1000 |
| 2022-07-01 | usdcusdt | Huobi | 2000 |
| 2022-07-02 | usdcusdt | Binance | 1500 |
| 2022-07-02 | usdcusdt | Gemini | 1500 |
| 2022-07-02 | usdcusdt | Huobi | 1500 |
I want to create a column "prod_vol" in the original dataframe which does the following calculation:
prod_vol = volume / total_volume_trade_on_that_exchange
to get the percentage EACH DAY, example shown as below:
| time | symbol | market | volume | prod_vol |
| ---------- | -------- | ------- | ------ | ----------------------------------- |
| 2022-07-01 | btcusdt | Binance | 300 | = 300/(300 1000 300) * 100 = 18.75% |
| 2022-07-01 | btcusdt | Gemini | 200 | = 200/(200 2000 1000) * 100 = 6.25% |
| 2022-07-01 | btcusdt | Huobi | 100 | = 100/(100 2000) = 8.33% |
| 2022-07-02 | btcusdt | Binance | 400 | = 400/(400 1500 1500) = 11.76% |
| 2022-07-02 | btcusdt | Gemini | 350 | |
| 2022-07-02 | btcusdt | Huobi | 100 | |
| 2022-07-01 | ethusdt | Binance | 1000 | |
| 2022-07-01 | ethusdt | Gemini | 2000 | |
| 2022-07-02 | ethusdt | Binance | 1500 | |
| 2022-07-02 | ethusdt | Gemini | 500 | |
| 2022-07-01 | usdcusdt | Binance | 300 | |
| 2022-07-01 | usdcusdt | Gemini | 1000 | |
| 2022-07-01 | usdcusdt | Huobi | 2000 | |
| 2022-07-02 | usdcusdt | Binance | 1500 | |
| 2022-07-02 | usdcusdt | Gemini | 1500 | |
| 2022-07-02 | usdcusdt | Huobi | 1500 | |
I am able to get the denominator (i.e. the total volume trade on different exchanges each day) by creating a new dataframe using GroupBy and Aggregate function as shown below:
df_new = df.groupby(['time', 'market']).agg(volume=('volume', 'sum'))
which gives me the aggregated table:
| time | market | volume |
| ---------- | ----- | -------------------------- |
| 2022-07-01 | Binance | = 300 1000 300 = 1600 |
| | Gemini | = 200 2000 1000 = 3200 |
| | Huobi | = 100 2000 = 2100 |
| 2022-07-02 | Binance | = 400 1500 1500 = 3400 |
| | Gemini | = 350 500 1500 = 2350 |
| | Huobi | = 100 1500 = 1600 |
I am having trouble to use these data from the new df_new and perform the calculation to the original df to get the expected output as shown on second table...
Does anyone know how I can achieve this? Thanks!
Edit:
print(df.to_dict())
output:
{'time': {0: '2022-07-01', 1: '2022-07-01', 2: '2022-07-01', 3: '2022-07-02', 4: '2022-07-02', 5: '2022-07-02', 6: '2022-07-01', 7: '2022-07-01', 8: '2022-07-02', 9: '2022-07-02', 10: '2022-07-01', 11: '2022-07-01', 12: '2022-07-01', 13: '2022-07-02', 14: '2022-07-02', 15: '2022-07-02'}, 'symbol': {0: 'btcusdt', 1: 'btcusdt', 2: 'btcusdt', 3: 'btcusdt', 4: 'btcusdt', 5: 'btcusdt', 6: 'ethusdt', 7: 'ethusdt', 8: 'ethusdt', 9: 'ethusdt', 10: 'usdcusdt', 11: 'usdcusdt', 12: 'usdcusdt', 13: 'usdcusdt', 14: 'usdcusdt', 15: 'usdcusdt'}, 'market': {0: 'Binance', 1: 'Gemini', 2: 'Huobi', 3: 'Binance', 4: 'Gemini', 5: 'Huobi', 6: 'Binance', 7: 'Gemini', 8: 'Binance', 9: 'Gemini', 10: 'Binance', 11: 'Gemini', 12: 'Huobi', 13: 'Binance', 14: 'Gemini', 15: 'Huobi'}, 'volume': {0: 300, 1: 200, 2: 100, 3: 400, 4: 350, 5: 100, 6: 1000, 7: 2000, 8: 1500, 9: 500, 10: 300, 11: 1000, 12: 2000, 13: 1500, 14: 1500, 15: 1500}}
CodePudding user response:
IIUC
is that what you're looking for?
df['mkt_vol'] = df.groupby(['market','time'])['volume'].transform('sum')
df['prod_vol'] = round(df['volume']/df['mkt_vol'] * 100, 2)
df
OR, just a single line
df['prod_vol'] = round(df['volume']/df.groupby(['market','time'])['volume'].transform('sum') *100,2)
df
time symbol market volume prod_vol
0 2022-07-01 btcusdt Binance 300 18.75
1 2022-07-01 btcusdt Gemini 200 6.25
2 2022-07-01 btcusdt Huobi 100 4.76
3 2022-07-02 btcusdt Binance 400 11.76
4 2022-07-02 btcusdt Gemini 350 14.89
5 2022-07-02 btcusdt Huobi 100 6.25
6 2022-07-01 ethusdt Binance 1000 62.50
7 2022-07-01 ethusdt Gemini 2000 62.50
8 2022-07-02 ethusdt Binance 1500 44.12
9 2022-07-02 ethusdt Gemini 500 21.28
10 2022-07-01 usdcusdt Binance 300 18.75
11 2022-07-01 usdcusdt Gemini 1000 31.25
12 2022-07-01 usdcusdt Huobi 2000 95.24
13 2022-07-02 usdcusdt Binance 1500 44.12
14 2022-07-02 usdcusdt Gemini 1500 63.83
15 2022-07-02 usdcusdt Huobi 1500 93.75