I am a beginner in the basics of data analysis and have a problem that I could not solve it with my current knowledge and searching on various sites, especially the StackOverflow site. I have a sample data frame like the one below and I need to fill NaN with a specific condition:
df = pd.DataFrame([['row 1', 1, 2, 3, 4], ['row 2', 5, nan, 7, 8], ['row 3', nan, nan, 11, 12]], index=['A', 'B', 'C'], columns=['rows', '1st', '2nd', '3th', '4th'])
df
rows | 1st | 2nd | 3rd | 4th |
---|---|---|---|---|
A | 1.0 | 2.0 | 3 | 4 |
B | 5.0 | NaN >> fill with (5.0 7 8) / 3 = 6.66... | 7 | 8 |
C | NaN >> fill with (11 12) / 2 = 11.5 | NaN >> fill with (11 12) / 2 = 11.5 | 11 | 12 |
In each row that contains NaN, I have some numbers. To fill NaN I have to calculate their values with this formula: (sum of existing numbers in the row) / (number of existing numbers in the row). For example in row 'B', the NaN should fill with the shown formula. Thank you in advance for any advice.
I tried this code in notebook but my data frame didn't changed:
df.fillna((df[df.isna() == False].sum(axis=1)) / (4 - df.isna().sum(axis=1)), inplace=True)
CodePudding user response:
You can compute the mean along the column axis then fill nan values:
mean = df.mean(axis=1, numeric_only=True)
out = df.T.fillna(mean).T
Output:
>>> out
rows 1st 2nd 3th 4th
A row 1 1.0 2.0 3 4
B row 2 5.0 6.666667 7 8
C row 3 11.5 11.5 11 12
>>> mean
A 2.500000
B 6.666667
C 11.500000
dtype: float64
Update: a shortest solution inspired by @anoushiravan-r:
>>> df.mask(df.isna(), df.mean(axis=1, numeric_only=True), axis=0)
rows 1st 2nd 3th 4th
A row 1 1.0 2.000000 3 4
B row 2 5.0 6.666667 7 8
C row 3 11.5 11.500000 11 12
CodePudding user response:
You could also try the following solution:
import pandas as pd
df.where(~ df.isna(), df.iloc[:, 1:].mean(axis=1), axis=0)
rows 1st 2nd 3th 4th
A row 1 1.0 2.000000 3 4
B row 2 5.0 6.666667 7 8
C row 3 11.5 11.500000 11 12
Or you could also try thanks to a very good suggestion by @Corralien:
df.where(~ df.isna(), df.mean(axis=1, numeric_only=True), axis=0)