I have a DataFrame df whose index is of dtype period[M]. It looks like this:
month | outcome | MKT |
---|---|---|
2020-01 | W | 6 |
2020-01 | W | 4 |
2020-03 | W | NAN |
2020-03 | L | NAN |
2020-02 | L | 4 |
2020-02 | L | 7 |
I want to replace all NAN values of the column MKT by the average of the values in the column when the month and the outcome are the same.
The expected result for this sample is:
month | outcome | MKT |
---|---|---|
2020-01 | W | 6 |
2020-01 | W | 4 |
2020-03 | W | 5 |
2020-03 | L | 5.5 |
2020-02 | L | 4 |
2020-02 | L | 7 |
I have tried the following:
df["MKT"] = df.MKT.fillna(groupby(pd.Grouper(freq="M")).df.MKT.mean())
But I get the error
NameError: name 'groupby' is not defined
I have seen some solutions for the datetime case, but my index has dtype period[M].
CodePudding user response:
replace all NAN values of the column MKT by the average of the values in the column when the month and the outcome are the same
This sounds like you are looking for
df.MKT = df.MKT.fillna(df.groupby(["month", "outcome"]).MKT.transform("mean"))
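but your expected output matches grouping by outcome alone (in your sample, the 2020-03 rows have no other values with the same month and outcome to average):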
df.MKT = df.MKT.fillna(df.groupby("outcome").MKT.transform("mean"))
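To compare the two groupings, here is a minimal sketch that rebuilds the sample from the question. It assumes the PeriodIndex is named "month" (the question does not show the index name), which lets groupby refer to it by name alongside the outcome column. Note that groupby is a DataFrame method, so it has to be called as df.groupby(...); calling it as a bare function is what raises the NameError.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame; naming the index "month" is an assumption.
idx = pd.PeriodIndex(
    ["2020-01", "2020-01", "2020-03", "2020-03", "2020-02", "2020-02"],
    freq="M",
    name="month",
)
df = pd.DataFrame(
    {
        "outcome": ["W", "W", "W", "L", "L", "L"],
        "MKT": [6, 4, np.nan, np.nan, 4, 7],
    },
    index=idx,
)

# Per-row mean of each (month, outcome) group. Both 2020-03 groups
# contain only NaN, so their mean is NaN and nothing gets filled.
by_month_outcome = df["MKT"].fillna(
    df.groupby(["month", "outcome"])["MKT"].transform("mean")
)

# Per-row mean of each outcome group, ignoring the month.
by_outcome = df["MKT"].fillna(
    df.groupby("outcome")["MKT"].transform("mean")
)

print(by_month_outcome)
print(by_outcome)
```

transform("mean") returns a Series aligned with the original index, which is why it can be passed straight to fillna without any merging.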