A B C D
1 2010 one 0 0
2 2020 one 2 4
3 2007 two 0 8
4 2010 one 8 4
5 2020 four 6 12
6 2007 three 0 14
7 2006 four 7 14
8 2010 two 10 12
I need to replace 0 with the average of the C values of years.For example 2010 C value would be 9. What is the best way to do this? i have over 10,000 rows.
CodePudding user response:
You can use replace
to change 0's to np.nan
in Column C, and use fillna
to map
the yearly averages:
df.C.replace({0:np.nan},inplace=True)
df.C.fillna(
df.A.map(
df.groupby(df['A']).\
C.mean().fillna(0)\
.to_dict()
),inplace=True
)
print(df)
A B C D
0 2010 one 9.0 0
1 2020 one 2.0 4
2 2007 two 0.0 8
3 2010 one 8.0 4
4 2020 four 6.0 12
5 2007 three 0.0 14
6 2006 four 7.0 14
7 2010 two 10.0 12
2007 is still NaN
because we have no values other than 0's in the initial data.
CodePudding user response:
Here is what I think I will do it. The code below will be pseudo-code.
1: You find the avg for each year, and put it to a dict.
my_year_dict = {'2020':xxx,'2021':xxx}
2: Use apply & lambda functions
df[New C Col] = df[C].apply(lambda x: my_year_dict[x] if x is 0)
Hope it can be a start!