calculating expanding mean in pandas with date multiindex-CodePudding

I have a dataframe with a multi-index the first level is a stock ticker the second is a date such as this:

import pandas as pd
import numpy as np

arrays = [
    np.array(["MSFT", "MSFT", "MSFT", "MSFT", "GOOG", "GOOG", "GOOG", "GOOG"]),
    np.array([pd.to_datetime("2022-04-05"), pd.to_datetime("2022-04-06"), pd.to_datetime("2022-04-07"), pd.to_datetime("2022-04-08"),pd.to_datetime("2022-04-05"), pd.to_datetime("2022-04-06"), pd.to_datetime("2022-04-07"), pd.to_datetime("2022-04-08")]),
]


df = pd.DataFrame(np.random.randn(8, 1), index=arrays, columns={"P/Sales"})
df

Ticker	Date	P/Sales
MSFT	2022-04-05 00:00:00	-1.1648041378350011
MSFT	2022-04-06 00:00:00	-2.352493882690841
MSFT	2022-04-07 00:00:00	0.6047960443400636
MSFT	2022-04-08 00:00:00	1.533852852883288
GOOG	2022-04-05 00:00:00	2.0738583230363195
GOOG	2022-04-06 00:00:00	-1.887463559617715
GOOG	2022-04-07 00:00:00	0.49868465858420324
GOOG	2022-04-08 00:00:00	0.900527823711884

I want to add another column which would contain the mean of P/Sales for all dates BEFORE the current line for that stock ticker.

Any help would be much appreciated.

Currently I am achieving this with a for loop but am thinking there must be a better way. Perhaps expanding.mean() but I can't seem to get the syntax right to do the mean only on prior dates for that ticker.

for ticker in  df.reset_index()['Ticker'].unique():    
    for date in df.loc[ticker].index:
      filtered_df = df.loc[ticker].loc[(df.loc[ticker].index < date)]
      mean = np.mean(np.asarray(filtered_df['P/Sales']))
      df.loc[(ticker,date),"Mean"] = mean

df

Ticker	Date	P/Sales	Mean
MSFT	2022-04-05 00:00:00	-1.4361186774548342	NaN
MSFT	2022-04-06 00:00:00	0.5922672022039192	-1.4361186774548342
MSFT	2022-04-07 00:00:00	0.4844687977324626	-0.42192573762545754
MSFT	2022-04-08 00:00:00	-0.4125514467122677	-0.11979422583948417
GOOG	2022-04-05 00:00:00	-0.30860020353979284	NaN
GOOG	2022-04-06 00:00:00	0.1189685515177034	-0.30860020353979284
GOOG	2022-04-07 00:00:00	1.1730395696068878	-0.09481582601104471
GOOG	2022-04-08 00:00:00	-0.9778323004522275	0.3278026391949328

CodePudding user response：

Use GroupBy.transform with lambda function with Expanding.mean and Series.shift:

for ticker in  df.reset_index()['Ticker'].unique():    
    for date in df.loc[ticker].index:
      filtered_df = df.loc[ticker].loc[(df.loc[ticker].index < date)]
      print (filtered_df)
      mean = np.mean(np.asarray(filtered_df['P/Sales']))
      df.loc[(ticker,date),"Mean"] = mean


df['Mean1'] = (df.groupby('Ticker')['P/Sales']
                 .transform(lambda x: x.expanding().mean().shift()))

print (df)
                    P/Sales      Mean     Mean1
Ticker Date                                    
MSFT   2022-04-05 -1.164804       NaN       NaN
       2022-04-06 -2.352494 -1.164804 -1.164804
       2022-04-07  0.604796 -1.758649 -1.758649
       2022-04-08  1.533853 -0.970834 -0.970834
GOOG   2022-04-05  2.073858       NaN       NaN
       2022-04-06 -1.887464  2.073858  2.073858
       2022-04-07  0.498685  0.093197  0.093197
       2022-04-08  0.900528  0.228360  0.228360