Home > Net >  Filling NaN in each row with the mean of existing values except its NaNs
Filling NaN in each row with the mean of existing values except its NaNs

Time:01-05

I am a beginner in the basics of data analysis and have a problem that I could not solve it with my current knowledge and searching on various sites, especially the StackOverflow site. I have a sample data frame like the one below and I need to fill NaN with a specific condition:

df = pd.DataFrame([['row 1', 1, 2, 3, 4], ['row 2', 5, nan, 7, 8], ['row 3', nan, nan, 11, 12]], index=['A', 'B', 'C'], columns=['rows', '1st', '2nd', '3th', '4th'])
df
rows 1st 2nd 3rd 4th
A 1.0 2.0 3 4
B 5.0 NaN >> fill with (5.0 7 8) / 3 = 6.66... 7 8
C NaN >> fill with (11 12) / 2 = 11.5 NaN >> fill with (11 12) / 2 = 11.5 11 12

In each row that contains NaN, I have some numbers. To fill NaN I have to calculate their values with this formula: (sum of existing numbers in the row) / (number of existing numbers in the row). For example in row 'B', the NaN should fill with the shown formula. Thank you in advance for any advice.

I tried this code in notebook but my data frame didn't changed:

df.fillna((df[df.isna() == False].sum(axis=1)) / (4 - df.isna().sum(axis=1)), inplace=True)

CodePudding user response:

You can compute the mean along the column axis then fill nan values:

mean = df.mean(axis=1, numeric_only=True)
out = df.T.fillna(mean).T

Output:

>>> out
    rows   1st       2nd 3th 4th
A  row 1   1.0       2.0   3   4
B  row 2   5.0  6.666667   7   8
C  row 3  11.5      11.5  11  12

>>> mean
A     2.500000
B     6.666667
C    11.500000
dtype: float64

Update: a shortest solution inspired by @anoushiravan-r:

>>> df.mask(df.isna(), df.mean(axis=1, numeric_only=True), axis=0)
    rows   1st        2nd  3th  4th
A  row 1   1.0   2.000000    3    4
B  row 2   5.0   6.666667    7    8
C  row 3  11.5  11.500000   11   12

CodePudding user response:

You could also try the following solution:

import pandas as pd

df.where(~ df.isna(), df.iloc[:, 1:].mean(axis=1), axis=0)

    rows   1st        2nd  3th  4th
A  row 1   1.0   2.000000    3    4
B  row 2   5.0   6.666667    7    8
C  row 3  11.5  11.500000   11   12

Or you could also try thanks to a very good suggestion by @Corralien:

df.where(~ df.isna(), df.mean(axis=1, numeric_only=True), axis=0)
  • Related