Replacing the missing value with different values on the same column in a pandas DataFrame

     A      B     C   D
1  2010    one    0   0
2  2020    one    2   4
3  2007    two    0   8
4  2010    one    8   4
5  2020    four   6  12
6  2007    three  0  14
7  2006    four   7  14
8  2010    two    10 12

I need to replace each 0 in C with the average of the C values for the same year. For example, for 2010 the C value would be 9 (the average of 8 and 10). What is the best way to do this? I have over 10,000 rows.

CodePudding user response：

You can use replace to turn the 0s in column C into np.nan, then use fillna with the yearly averages mapped from column A:

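import numpy as np

# Treat 0 as missing in column C
df['C'] = df['C'].replace(0, np.nan)

# Mean of C per year (NaN is ignored); years with no data at all fall back to 0
yearly_mean = df.groupby('A')['C'].mean().fillna(0)

# Fill each missing C with the mean for that row's year
df['C'] = df['C'].fillna(df['A'].map(yearly_mean))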

print(df)

      A      B     C   D
0  2010    one   9.0   0
1  2020    one   2.0   4
2  2007    two   0.0   8
3  2010    one   8.0   4
4  2020   four   6.0  12
5  2007  three   0.0  14
6  2006   four   7.0  14
7  2010    two  10.0  12

The 2007 rows are still 0 because that year has no values other than 0 in the initial data: its mean is NaN, which .fillna(0) turns back into 0.
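As a possible variation on the same idea, groupby with transform("mean") returns the yearly mean already aligned to every row, so no dict or map step is needed. The snippet below is only a sketch: the frame is retyped by hand from the question, and the names c and year_mean are illustrative.

```python
import pandas as pd

# Sample data retyped from the question (assumed to match the real frame)
df = pd.DataFrame({
    "A": [2010, 2020, 2007, 2010, 2020, 2007, 2006, 2010],
    "B": ["one", "one", "two", "one", "four", "three", "four", "two"],
    "C": [0, 2, 0, 8, 6, 0, 7, 10],
    "D": [0, 4, 8, 4, 12, 14, 14, 12],
})

# Treat 0 as missing: mask() replaces the matching entries with NaN
c = df["C"].mask(df["C"] == 0)

# Mean of the remaining C values per year, repeated on every row of that year
year_mean = c.groupby(df["A"]).transform("mean")

# Fill the gaps; years with no usable value (2007 here) fall back to 0
df["C"] = c.fillna(year_mean).fillna(0)

print(df)
```

Because everything stays vectorised, this should scale comfortably to 10,000 rows and beyond.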

CodePudding user response：

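Here is how I would approach it. The code below is only a sketch.

1: Find the average of the non-zero C values for each year and put it in a dict keyed by year.

my_year_dict = df[df['C'] != 0].groupby('A')['C'].mean().to_dict()

2: Use apply with a lambda over the rows, so that each 0 is looked up by its year in column A.

df['New C'] = df.apply(lambda row: my_year_dict.get(row['A'], 0) if row['C'] == 0 else row['C'], axis=1)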
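For reference, the two steps can be run end to end on the sample data from the question. This is only a sketch under a few assumptions: the frame is retyped by hand, the names year_mean and lookup are illustrative, and the row-wise apply is swapped for map plus where, which avoids one Python call per row.

```python
import pandas as pd

# Sample data retyped from the question (assumed to match the real frame)
df = pd.DataFrame({
    "A": [2010, 2020, 2007, 2010, 2020, 2007, 2006, 2010],
    "B": ["one", "one", "two", "one", "four", "three", "four", "two"],
    "C": [0, 2, 0, 8, 6, 0, 7, 10],
    "D": [0, 4, 8, 4, 12, 14, 14, 12],
})

# Step 1: average of the non-zero C values per year, stored in a dict
year_mean = df[df["C"] != 0].groupby("A")["C"].mean().to_dict()

# Step 2: look up each row's year; years missing from the dict become 0
lookup = df["A"].map(year_mean).fillna(0)

# Keep C where it is non-zero, otherwise take the yearly average
df["New C"] = df["C"].astype(float).where(df["C"] != 0, lookup)

print(df)
```

The original C column is left untouched, so the result can be compared against it before overwriting anything.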

Hope it can be a start!