I have the following pandas DataFrame:
val
1 10
2 20
3 30
4 40
5 30
I want to get two output columns: avg and avg_sep.
avg should be the running average, calculated row by row over all rows so far.
avg_sep should be the running average calculated row by row, but restarted at a certain condition (i.e. up to and including row 3 I calculate one average, and after row 3 I start calculating another). My expected output is:
val avg avg_sep
1 10 10 10
2 20 15 15
3 30 20 20
4 40 25 40
5 30 26 35
I know I can use df.mean(axis=0) to get the average of the column. But how can I get the expected output?
CodePudding user response:
From the discussion in the comments:
import pandas as pd
import numpy as np
# Building frame:
df = pd.DataFrame(
data={"val": [10, 20, 30, 40, 30]},
index=[1, 2, 3, 4, 5]
)
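# Solution:
# Running average = cumulative sum divided by the number of rows seen so far.
df["avg"] = df["val"].cumsum() / np.arange(1, len(df) + 1)  # or `/ df.index`, since the index here is 1..5
# Same idea, restarted after row 3 (.loc slicing is label-based and inclusive).
df.loc[:3, "avg_sep"] = df.loc[:3, "val"].cumsum() / np.arange(1, 4)
df.loc[4:, "avg_sep"] = df.loc[4:, "val"].cumsum() / np.arange(1, 3)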
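
An alternative that avoids hard-coding the segment lengths might look like the sketch below. It swaps the cumsum division for pandas' built-in expanding().mean(), and restarts the average per segment with groupby(...).transform(...). The names split_at and segment are made up for illustration, and the split rule (index label greater than 3 starts the second segment) is an assumption taken from the example in the question.

```python
import pandas as pd

df = pd.DataFrame(
    data={"val": [10, 20, 30, 40, 30]},
    index=[1, 2, 3, 4, 5],
)

# Running average over all rows seen so far.
df["avg"] = df["val"].expanding().mean()

# Hypothetical split point: index labels <= 3 form the first segment,
# everything after that forms the second one.
split_at = 3
segment = (df.index > split_at).astype(int)  # array of 0s and 1s, one per row

# Restart the running average inside each segment.
df["avg_sep"] = df.groupby(segment)["val"].transform(
    lambda s: s.expanding().mean()
)

print(df)
```

If there are more than two segments, the same pattern should work with any per-row segment label, for example one built with a cumulative sum over a boolean "new segment starts here" column.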