I have dataframe with Multiindex and try to create new column based only on Multiindex data. Example:
df = pd.DataFrame(data = {'ticker': ['AAPL', 'AAPL', 'MSFT', 'MSFT'],
'date': [pd.Timestamp.now().floor('D') - pd.Timedelta(days = 1), pd.Timestamp.now().floor('D'),
pd.Timestamp.now().floor('D') - pd.Timedelta(days = 2), pd.Timestamp.now().floor('D') - pd.Timedelta(days = 1)],
'price': [100, 95, 200, 150]
}
).set_index(['ticker', 'date'])
df
# price
# ticker date
# AAPL 2022-10-09 100
# 2022-10-10 95
# MSFT 2022-10-08 200
# 2022-10-09 150
Assigning new column with min date for each ticker:
df['date_min'] = df.reset_index().groupby('ticker')['date'].transform(lambda x: x.min())
df
# price date_min
# ticker date
# AAPL 2022-10-09 100 NaT
# 2022-10-10 95 NaT
# MSFT 2022-10-08 200 NaT
# 2022-10-09 150 NaT
Somehow column values are not assigned ((
CodePudding user response:
Because you reset_index
, you lose index alignment and assignment fails.
You can use the underlying numpy array:
df['date_min'] = (df.reset_index()
.groupby('ticker', sort=False)['date']
.transform(lambda x: x.min())
.to_numpy()
)
output:
price date_min
ticker date
AAPL 2022-10-09 100 2022-10-09
2022-10-10 95 2022-10-09
MSFT 2022-10-08 200 2022-10-08
2022-10-09 150 2022-10-08