i have two data frame
asks_price asks_qty exchange_name_ask bids_price bids_qty exchange_name_bid
0 20156.51 0.000745 Coinbase 20153.28 0.000200 Coinbase
1 20157.52 0.050000 Coinbase 20152.27 0.051000 Coinbase
2 20158.52 0.000745 Coinbase 20151.28 0.000200 Kraken
3 20158.52 0.050000 FTX 20151.28 0.051000 Coinbase
I would like to group the same price, add quantity together and combine the name of the exchange like :
asks_price asks_qty exchange_name_ask bids_price bids_qty exchange_name_bid
0 20156.51 0.000745 Coinbase 20153.28 0.000200 Coinbase
1 20157.52 0.050000 Coinbase 20152.27 0.051000 Coinbase
2 20158.52 0.050745 CoinbaseFTX 20151.28 0.051200 KrakenCoinbase
I succeeded using
df.groupby(['asks_price', 'bids_price']).sum(False)
It's working BUT if this is the same name i dont want to concat it. I dont want CoinbaseCoinbase or KrakenKraken So i have to include an if in the SUM function , how can i do it Thanks
CodePudding user response:
Try as follows:
import pandas as pd
# Changing `2: 'Kraken'` to `2: 'Coinbase' for `exchange_name_bid` to generate
# an example that doesn't require concatenation
data = {'asks_price': {0: 20156.51, 1: 20157.52, 2: 20158.52, 3: 20158.52},
'asks_qty': {0: 0.000745, 1: 0.05, 2: 0.000745, 3: 0.05},
'exchange_name_ask': {0: 'Coinbase', 1: 'Coinbase', 2: 'Coinbase',
3: 'FTX'},
'bids_price': {0: 20153.28, 1: 20152.27, 2: 20151.28, 3: 20151.28},
'bids_qty': {0: 0.0002, 1: 0.051, 2: 0.0002, 3: 0.051},
'exchange_name_bid': {0: 'Coinbase', 1: 'Coinbase', 2: 'Coinbase',
3: 'Coinbase'}}
df = pd.DataFrame(data)
res = df.groupby(['asks_price', 'bids_price'], as_index=False).agg(
{'asks_qty':'sum',
'bids_qty':'sum',
'exchange_name_ask': lambda x: ','.join(dict.fromkeys(x).keys()),
'exchange_name_bid': lambda x: ','.join(dict.fromkeys(x).keys())
})
print(res)
asks_price bids_price asks_qty bids_qty exchange_name_ask \
0 20156.51 20153.28 0.000745 0.0002 Coinbase
1 20157.52 20152.27 0.050000 0.0510 Coinbase
2 20158.52 20151.28 0.050745 0.0512 Coinbase,FTX
exchange_name_bid
0 Coinbase
1 Coinbase
2 Coinbase # no concatenation here
Explanation
- Use
df.groupby.agg
and split the functions to be applied to the different columns. I.e.sum
for columns'*_qty'
, and a lambda function for columns'exchange_name_*'
withdict.fromkeys
nested insidejoin
. (As mentioned by@jezrael
in the comments,dict.fromkeys
has better performance than usingSeries.drop_duplicates
. Cf. hisanswer
elsewhere.) - Change
','.join
to''.join
, if you don't want a delimiter. If you want to re-order the columns ofres
so that they match the order of the originaldf
, useres = res.loc[:,df.columns]
.