Home > Mobile >  pandas groupby with Multiindex: process other index levels
pandas groupby with Multiindex: process other index levels

Time:10-11

I have dataframe with Multiindex and try to create new column based only on Multiindex data. Example:

df = pd.DataFrame(data = {'ticker': ['AAPL', 'AAPL', 'MSFT', 'MSFT'],
                                'date': [pd.Timestamp.now().floor('D') - pd.Timedelta(days = 1), pd.Timestamp.now().floor('D'),
                                         pd.Timestamp.now().floor('D') - pd.Timedelta(days = 2), pd.Timestamp.now().floor('D') - pd.Timedelta(days = 1)],
                                'price': [100, 95, 200, 150]
                            }
                        ).set_index(['ticker', 'date'])
df
#                       price
# ticker    date        
# AAPL      2022-10-09  100
#           2022-10-10  95
# MSFT      2022-10-08  200
#           2022-10-09  150

Assigning new column with min date for each ticker:

df['date_min'] = df.reset_index().groupby('ticker')['date'].transform(lambda x: x.min())
df
#                       price   date_min
# ticker    date        
# AAPL      2022-10-09  100     NaT
#           2022-10-10  95      NaT
# MSFT      2022-10-08  200     NaT
#           2022-10-09  150     NaT

Somehow column values are not assigned ((

CodePudding user response:

Because you reset_index, you lose index alignment and assignment fails.

You can use the underlying numpy array:

df['date_min'] = (df.reset_index()
                    .groupby('ticker', sort=False)['date']
                    .transform(lambda x: x.min())
                    .to_numpy()
                 )

output:

                   price   date_min
ticker date                        
AAPL   2022-10-09    100 2022-10-09
       2022-10-10     95 2022-10-09
MSFT   2022-10-08    200 2022-10-08
       2022-10-09    150 2022-10-08
  • Related