Home > Software design >  How can I aggregate a column's value based on the min() and max() values of another column?
How can I aggregate a column's value based on the min() and max() values of another column?

Time:11-08

I have the following data-frame, df, that is population with 1000 rows of data. Here is what it would look like:

     date        mkt       bid       ask
0    01/07/22    LWAPO     0.6451    0.6460
1    01/07/22    HUYNE     0.6452    0.6458
2    01/07/22    VERAS     0.6447    0.6457
3    02/07/22    HUYNE     0.6432    0.6435
4    03/07/22    LWAPO     0.6440    0.6442
5    03/07/22    VERAS     0.6441    0.6444
6    03/07/22    PLAIN     0.6440    0.6445
7    03/07/22    ALCOT     0.6445    0.6449
8    04/07/22    HUYNE     0.6431    0.6444
9    04/07/22    LWAPO     0.6439    0.6441
...

My goal is to aggregate this date by the date, and perform some analysis using the mean bid/ask prices by each of the market makers, that is 'mkt' column, and finally visualise this data using plotly.

However, I am wanting the 'mkt' column (or it can be a new column if easier) to populate the names of the two market makers with the best bid price (max) and the best ask price (min), as a concatenated string.

As such, it would be something that looks like below:

     date        mkt
0    01/07/22    HUYNE, VERAS
1    02/07/22    HUYNE, HUYNE
2    03/07/22    ALCOT, VERAS
3    04/07/22    LWAPO, LWAPO
...

With the other two columns just being simple averages for the day. I can achieve this through the following code:

new_df = df.groupby('date').mean()

But unsure how to properly apply df.loc[BOOLEAN]` to alter my data-frame and achieve my desired result. I have a vague idea in mind on how I can do it, but I feel like there is a simple solution that I am missing. I have also tried the following to no avail:

for date in df['date'].unique():
    test = df.loc[df['date']==date]['bid'].max()

Apologies for the convoluted nature of my question, but I would appreciate any help :)

CodePudding user response:

get name of bid max and name of ask min

df1 = (df.groupby('date')
       .agg({'bid': lambda x: df.loc[x.idxmax(), 'mkt'],
             'ask': lambda x: df.loc[x.idxmin(), 'mkt']})
       .reset_index())

df1:

    date        bid     ask
0   01/07/22    HUYNE   VERAS
1   02/07/22    HUYNE   HUYNE
2   03/07/22    ALCOT   LWAPO
3   04/07/22    LWAPO   LWAPO



make result to mkt column

df1['mkt'] = df1.pop('bid')   ', '   df1.pop('ask')

df1:

    date        mkt
0   01/07/22    HUYNE, VERAS
1   02/07/22    HUYNE, HUYNE
2   03/07/22    ALCOT, LWAPO
3   04/07/22    LWAPO, LWAPO




if don want make df1 and useless columns, use following code:

(df.groupby('date')
 .agg({'bid': lambda x: df.loc[x.idxmax(), 'mkt'],
       'ask': lambda x: df.loc[x.idxmin(), 'mkt']})
 .reset_index()
 .assign(mkt=lambda x: x.pop('bid')   ', '   x.pop('ask')))

CodePudding user response:

Groupby dates and get the best bid and ask mkt.

bids = df[['date', 'mkt', 'bid']]
asks = df[['date', 'mkt', 'ask']]
best_bid_mkt = bids.groupby('date').max().mkt
best_ask_mkt = asks.groupby('date').min().mkt

Concatenate the best_bid_mkt and best_ask_mkt inside DataFrame.

pd.DataFrame(best_bid_mkt   ', '   best_ask_mkt)

date        mkt
07-01-2022  VERAS, HUYNE
07-02-2022  HUYNE, HUYNE
07-03-2022  VERAS, ALCOT
07-04-2022  LWAPO, HUYNE
  • Related