Home > Enterprise >  pandas perform division between 2 tables (1 aggregated) with different size
pandas perform division between 2 tables (1 aggregated) with different size

Time:07-17

I am a newbie to pandas and I am struggling to get the result that I want after doing research / experiencing some trial and errors...appreciate any guidance from here, thanks in advance!

Supposed I have a dataframe which holds some trade data, to make it easier to explain what I want to do, I made up these data myself:

| time       | symbol   | market  | volume |
| ---------- | -------- | ------- | ------ |
| 2022-07-01 | btcusdt  | Binance | 300    |
| 2022-07-01 | btcusdt  | Gemini  | 200    |
| 2022-07-01 | btcusdt  | Huobi   | 100    |
| 2022-07-02 | btcusdt  | Binance | 400    |
| 2022-07-02 | btcusdt  | Gemini  | 350    |
| 2022-07-02 | btcusdt  | Huobi   | 100    |
| 2022-07-01 | ethusdt  | Binance | 1000   |
| 2022-07-01 | ethusdt  | Gemini  | 2000   |
| 2022-07-02 | ethusdt  | Binance | 1500   |
| 2022-07-02 | ethusdt  | Gemini  | 500    |
| 2022-07-01 | usdcusdt | Binance | 300    |
| 2022-07-01 | usdcusdt | Gemini  | 1000   |
| 2022-07-01 | usdcusdt | Huobi   | 2000   |
| 2022-07-02 | usdcusdt | Binance | 1500   |
| 2022-07-02 | usdcusdt | Gemini  | 1500   |
| 2022-07-02 | usdcusdt | Huobi   | 1500   |

I want to create a column "prod_vol" in the original dataframe which does the following calculation:

prod_vol = volume / total_volume_trade_on_that_exchange

to get the percentage EACH DAY, example shown as below:

| time       | symbol   | market  | volume | prod_vol                            |
| ---------- | -------- | ------- | ------ | ----------------------------------- |
| 2022-07-01 | btcusdt  | Binance | 300    | = 300/(300 1000 300) * 100 = 18.75% |
| 2022-07-01 | btcusdt  | Gemini  | 200    | = 200/(200 2000 1000) * 100 = 6.25% |
| 2022-07-01 | btcusdt  | Huobi   | 100    | = 100/(100 2000) = 8.33%            |          
| 2022-07-02 | btcusdt  | Binance | 400    | = 400/(400 1500 1500) = 11.76%      |
| 2022-07-02 | btcusdt  | Gemini  | 350    |                                     |
| 2022-07-02 | btcusdt  | Huobi   | 100    |                                     |
| 2022-07-01 | ethusdt  | Binance | 1000   |                                     |
| 2022-07-01 | ethusdt  | Gemini  | 2000   |                                     |
| 2022-07-02 | ethusdt  | Binance | 1500   |                                     |
| 2022-07-02 | ethusdt  | Gemini  | 500    |                                     |
| 2022-07-01 | usdcusdt | Binance | 300    |                                     |
| 2022-07-01 | usdcusdt | Gemini  | 1000   |                                     |
| 2022-07-01 | usdcusdt | Huobi   | 2000   |                                     |
| 2022-07-02 | usdcusdt | Binance | 1500   |                                     |
| 2022-07-02 | usdcusdt | Gemini  | 1500   |                                     |
| 2022-07-02 | usdcusdt | Huobi   | 1500   |                                     |

I am able to get the denominator (i.e. the total volume trade on different exchanges each day) by creating a new dataframe using GroupBy and Aggregate function as shown below:

df_new = df.groupby(['time', 'market']).agg(volume=('volume', 'sum'))

which gives me the aggregated table:

| time       | market  | volume                     |
| ---------- | -----   | -------------------------- |
| 2022-07-01 | Binance | = 300   1000   300 = 1600  |
|            | Gemini  | = 200   2000   1000 = 3200 |
|            | Huobi   | = 100   2000 = 2100        |
| 2022-07-02 | Binance | = 400   1500   1500 = 3400 |
|            | Gemini  | = 350   500   1500 = 2350  |
|            | Huobi   | = 100   1500 = 1600        |

I am having trouble to use these data from the new df_new and perform the calculation to the original df to get the expected output as shown on second table...

Does anyone know how I can achieve this? Thanks!

Edit:

print(df.to_dict())

output:

{'time': {0: '2022-07-01', 1: '2022-07-01', 2: '2022-07-01', 3: '2022-07-02', 4: '2022-07-02', 5: '2022-07-02', 6: '2022-07-01', 7: '2022-07-01', 8: '2022-07-02', 9: '2022-07-02', 10: '2022-07-01', 11: '2022-07-01', 12: '2022-07-01', 13: '2022-07-02', 14: '2022-07-02', 15: '2022-07-02'}, 'symbol': {0: 'btcusdt', 1: 'btcusdt', 2: 'btcusdt', 3: 'btcusdt', 4: 'btcusdt', 5: 'btcusdt', 6: 'ethusdt', 7: 'ethusdt', 8: 'ethusdt', 9: 'ethusdt', 10: 'usdcusdt', 11: 'usdcusdt', 12: 'usdcusdt', 13: 'usdcusdt', 14: 'usdcusdt', 15: 'usdcusdt'}, 'market': {0: 'Binance', 1: 'Gemini', 2: 'Huobi', 3: 'Binance', 4: 'Gemini', 5: 'Huobi', 6: 'Binance', 7: 'Gemini', 8: 'Binance', 9: 'Gemini', 10: 'Binance', 11: 'Gemini', 12: 'Huobi', 13: 'Binance', 14: 'Gemini', 15: 'Huobi'}, 'volume': {0: 300, 1: 200, 2: 100, 3: 400, 4: 350, 5: 100, 6: 1000, 7: 2000, 8: 1500, 9: 500, 10: 300, 11: 1000, 12: 2000, 13: 1500, 14: 1500, 15: 1500}}

CodePudding user response:

IIUC

is that what you're looking for?

df['mkt_vol'] = df.groupby(['market','time'])['volume'].transform('sum')
df['prod_vol'] = round(df['volume']/df['mkt_vol'] * 100, 2)
df

OR, just a single line

df['prod_vol'] = round(df['volume']/df.groupby(['market','time'])['volume'].transform('sum') *100,2)
df

          time  symbol      market    volume    prod_vol
0   2022-07-01  btcusdt     Binance      300    18.75
1   2022-07-01  btcusdt     Gemini       200     6.25
2   2022-07-01  btcusdt     Huobi        100     4.76
3   2022-07-02  btcusdt     Binance      400    11.76
4   2022-07-02  btcusdt     Gemini       350    14.89
5   2022-07-02  btcusdt     Huobi        100     6.25
6   2022-07-01  ethusdt     Binance     1000    62.50
7   2022-07-01  ethusdt     Gemini      2000    62.50
8   2022-07-02  ethusdt     Binance     1500    44.12
9   2022-07-02  ethusdt     Gemini       500    21.28
10  2022-07-01  usdcusdt    Binance      300    18.75
11  2022-07-01  usdcusdt    Gemini      1000    31.25
12  2022-07-01  usdcusdt    Huobi       2000    95.24
13  2022-07-02  usdcusdt    Binance     1500    44.12
14  2022-07-02  usdcusdt    Gemini      1500    63.83
15  2022-07-02  usdcusdt    Huobi       1500    93.75
  • Related