Is it possible, without using parallelization (Swifter, Parallel), to do this calculation in one pass instead of looping over the index, for example by using the "apply" function on the whole dataset?
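%%time
import random
import pandas as pd

df = pd.DataFrame({'A': random.sample(range(200), 200)})
# For every window size j and every row i, take the mean of the j rows before i.
for j in range(200):
    for i in df.index:
        df.loc[i, 'A_last_{}'.format(j)] = df.loc[(df.index < i) & (df.index >= i - j), 'A'].mean()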
CodePudding user response:
%%time
import random
import pandas as pd

df = pd.DataFrame({'A': random.sample(range(200), 200)})
First, calculate the running sums of the previous j values.
df[1] = df['A'].shift()
for j in range(2, 200):
    # Rows with fewer than j earlier values stay NaN here.
    df[j] = df[j-1].fillna(0) + df['A'].shift(j)
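Then divide each sum by its window size to get the means, and take care of the formatting. Forward-filling along the columns gives rows that have fewer than j earlier values the mean of the values that are available.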
df = df.set_index('A')
df.divide(df.columns, axis=1)\
  .ffill(axis=1)\
  .rename(lambda x: f'A_last_{x}', axis=1)\
  .reset_index()
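As a possible alternative (not part of the answer above), the same means can probably be obtained by combining `shift` with `rolling(..., min_periods=1)`. This is only a sketch: the helper name `last_means` and the small 20-row frame are made up for illustration, and the result is checked against the index loop from the question for window sizes 1 to 5.

```python
import numpy as np
import pandas as pd

def last_means(df, max_window):
    """Mean of the previous j values of 'A' for j = 1..max_window (hypothetical helper)."""
    prev = df['A'].shift()  # value from the previous row; NaN in the first row
    cols = {
        f'A_last_{j}': prev.rolling(j, min_periods=1).mean()
        for j in range(1, max_window + 1)
    }
    return pd.concat(cols, axis=1)

# Small frame so the loop-based reference stays cheap.
rng = np.random.default_rng(0)
df = pd.DataFrame({'A': rng.permutation(20)})
fast = last_means(df, 5)

# Reference: the index loop from the question, restricted to j = 1..5.
slow = np.full((len(df), 5), np.nan)
for j in range(1, 6):
    for i in range(len(df)):
        mask = (df.index < i) & (df.index >= i - j)
        slow[i, j - 1] = df.loc[mask, 'A'].mean()

# True when both approaches agree, treating NaN as equal to NaN.
matches = np.allclose(fast.to_numpy(), slow, equal_nan=True)
```

With `min_periods=1`, rows near the top of the frame get the mean of however many earlier values exist, which is the role the forward fill plays in the answer above. The first row has no earlier values and stays NaN in both versions.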