I have a dataframe with a multi-index the first level is a stock ticker the second is a date such as this:
import pandas as pd
import numpy as np
arrays = [
np.array(["MSFT", "MSFT", "MSFT", "MSFT", "GOOG", "GOOG", "GOOG", "GOOG"]),
np.array([pd.to_datetime("2022-04-05"), pd.to_datetime("2022-04-06"), pd.to_datetime("2022-04-07"), pd.to_datetime("2022-04-08"),pd.to_datetime("2022-04-05"), pd.to_datetime("2022-04-06"), pd.to_datetime("2022-04-07"), pd.to_datetime("2022-04-08")]),
]
df = pd.DataFrame(np.random.randn(8, 1), index=arrays, columns={"P/Sales"})
df
Ticker | Date | P/Sales |
---|---|---|
MSFT | 2022-04-05 00:00:00 | -1.1648041378350011 |
MSFT | 2022-04-06 00:00:00 | -2.352493882690841 |
MSFT | 2022-04-07 00:00:00 | 0.6047960443400636 |
MSFT | 2022-04-08 00:00:00 | 1.533852852883288 |
GOOG | 2022-04-05 00:00:00 | 2.0738583230363195 |
GOOG | 2022-04-06 00:00:00 | -1.887463559617715 |
GOOG | 2022-04-07 00:00:00 | 0.49868465858420324 |
GOOG | 2022-04-08 00:00:00 | 0.900527823711884 |
I want to add another column which would contain the mean of P/Sales for all dates BEFORE the current line for that stock ticker.
Any help would be much appreciated.
Currently I am achieving this with a for loop but am thinking there must be a better way. Perhaps expanding.mean() but I can't seem to get the syntax right to do the mean only on prior dates for that ticker.
for ticker in df.reset_index()['Ticker'].unique():
for date in df.loc[ticker].index:
filtered_df = df.loc[ticker].loc[(df.loc[ticker].index < date)]
mean = np.mean(np.asarray(filtered_df['P/Sales']))
df.loc[(ticker,date),"Mean"] = mean
df
Ticker | Date | P/Sales | Mean |
---|---|---|---|
MSFT | 2022-04-05 00:00:00 | -1.4361186774548342 | NaN |
MSFT | 2022-04-06 00:00:00 | 0.5922672022039192 | -1.4361186774548342 |
MSFT | 2022-04-07 00:00:00 | 0.4844687977324626 | -0.42192573762545754 |
MSFT | 2022-04-08 00:00:00 | -0.4125514467122677 | -0.11979422583948417 |
GOOG | 2022-04-05 00:00:00 | -0.30860020353979284 | NaN |
GOOG | 2022-04-06 00:00:00 | 0.1189685515177034 | -0.30860020353979284 |
GOOG | 2022-04-07 00:00:00 | 1.1730395696068878 | -0.09481582601104471 |
GOOG | 2022-04-08 00:00:00 | -0.9778323004522275 | 0.3278026391949328 |
CodePudding user response:
Use GroupBy.transform
with lambda function with Expanding.mean
and Series.shift
:
for ticker in df.reset_index()['Ticker'].unique():
for date in df.loc[ticker].index:
filtered_df = df.loc[ticker].loc[(df.loc[ticker].index < date)]
print (filtered_df)
mean = np.mean(np.asarray(filtered_df['P/Sales']))
df.loc[(ticker,date),"Mean"] = mean
df['Mean1'] = (df.groupby('Ticker')['P/Sales']
.transform(lambda x: x.expanding().mean().shift()))
print (df)
P/Sales Mean Mean1
Ticker Date
MSFT 2022-04-05 -1.164804 NaN NaN
2022-04-06 -2.352494 -1.164804 -1.164804
2022-04-07 0.604796 -1.758649 -1.758649
2022-04-08 1.533853 -0.970834 -0.970834
GOOG 2022-04-05 2.073858 NaN NaN
2022-04-06 -1.887464 2.073858 2.073858
2022-04-07 0.498685 0.093197 0.093197
2022-04-08 0.900528 0.228360 0.228360