I have the following data-frame, df
, that is population with 1000 rows of data. Here is what it would look like:
date mkt bid ask
0 01/07/22 LWAPO 0.6451 0.6460
1 01/07/22 HUYNE 0.6452 0.6458
2 01/07/22 VERAS 0.6447 0.6457
3 02/07/22 HUYNE 0.6432 0.6435
4 03/07/22 LWAPO 0.6440 0.6442
5 03/07/22 VERAS 0.6441 0.6444
6 03/07/22 PLAIN 0.6440 0.6445
7 03/07/22 ALCOT 0.6445 0.6449
8 04/07/22 HUYNE 0.6431 0.6444
9 04/07/22 LWAPO 0.6439 0.6441
...
My goal is to aggregate this date by the date, and perform some analysis using the mean bid/ask prices by each of the market makers, that is 'mkt'
column, and finally visualise this data using plotly.
However, I am wanting the 'mkt'
column (or it can be a new column if easier) to populate the names of the two market makers with the best bid price (max) and the best ask price (min), as a concatenated string.
As such, it would be something that looks like below:
date mkt
0 01/07/22 HUYNE, VERAS
1 02/07/22 HUYNE, HUYNE
2 03/07/22 ALCOT, VERAS
3 04/07/22 LWAPO, LWAPO
...
With the other two columns just being simple averages for the day. I can achieve this through the following code:
new_df = df.groupby('date').mean()
But unsure how to properly apply df.loc[BOOLEAN
]` to alter my data-frame and achieve my desired result. I have a vague idea in mind on how I can do it, but I feel like there is a simple solution that I am missing. I have also tried the following to no avail:
for date in df['date'].unique():
test = df.loc[df['date']==date]['bid'].max()
Apologies for the convoluted nature of my question, but I would appreciate any help :)
CodePudding user response:
get name of bid
max and name of ask
min
df1 = (df.groupby('date')
.agg({'bid': lambda x: df.loc[x.idxmax(), 'mkt'],
'ask': lambda x: df.loc[x.idxmin(), 'mkt']})
.reset_index())
df1
:
date bid ask
0 01/07/22 HUYNE VERAS
1 02/07/22 HUYNE HUYNE
2 03/07/22 ALCOT LWAPO
3 04/07/22 LWAPO LWAPO
make result to mkt
column
df1['mkt'] = df1.pop('bid') ', ' df1.pop('ask')
df1
:
date mkt
0 01/07/22 HUYNE, VERAS
1 02/07/22 HUYNE, HUYNE
2 03/07/22 ALCOT, LWAPO
3 04/07/22 LWAPO, LWAPO
if don want make df1
and useless columns, use following code:
(df.groupby('date')
.agg({'bid': lambda x: df.loc[x.idxmax(), 'mkt'],
'ask': lambda x: df.loc[x.idxmin(), 'mkt']})
.reset_index()
.assign(mkt=lambda x: x.pop('bid') ', ' x.pop('ask')))
CodePudding user response:
Groupby dates and get the best bid and ask mkt.
bids = df[['date', 'mkt', 'bid']]
asks = df[['date', 'mkt', 'ask']]
best_bid_mkt = bids.groupby('date').max().mkt
best_ask_mkt = asks.groupby('date').min().mkt
Concatenate the best_bid_mkt
and best_ask_mkt
inside DataFrame.
pd.DataFrame(best_bid_mkt ', ' best_ask_mkt)
date mkt
07-01-2022 VERAS, HUYNE
07-02-2022 HUYNE, HUYNE
07-03-2022 VERAS, ALCOT
07-04-2022 LWAPO, HUYNE