I have a DataFrame with 15 columns and 5000 rows. Four of the columns contain NaN values, and I would like to replace those NaN values with the median of their column.
As there are several columns, I would like to do this with a for-loop. The column numbers are 1, 5, 8 and 9, and the NaN values in each column should get that column's median.
I tried:
for i in [1, 5, 8, 9]:
    df[i] = df[i].fillna(df[i].transform('median'))
Answer:
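No need for a loop, use a vectorized approach: df.median() returns the median of every column, and fillna fills each column's NaN values with that column's median: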
out = df.fillna(df.median())
# if df also has non-numeric columns, use df.median(numeric_only=True)
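If a loop is still wanted, the attempt from the question can be written without transform: df[c].median() already returns a single number, and fillna applies it to every missing cell of that column. The sketch below uses a small made-up DataFrame (hypothetical data, not the asker's) and checks that the loop and the vectorized fillna give the same result.

```python
import numpy as np
import pandas as pd

# Small made-up frame with integer column labels 0..3 (hypothetical data)
df = pd.DataFrame({
    0: [1.0, np.nan, 3.0],
    1: [4.0, 5.0, np.nan],
    2: [np.nan, 8.0, 10.0],
    3: [1.0, 2.0, 3.0],  # no NaN, so nothing is filled here
})

# Loop version: fill each selected column with its own median
looped = df.copy()
for c in [0, 1, 2]:
    looped[c] = looped[c].fillna(looped[c].median())

# Vectorized version: df.median() is a Series indexed by column label,
# so fillna matches each column with its own median
vectorized = df.fillna(df.median())

print(looped)
print(looped.equals(vectorized))
```

The vectorized form avoids the Python-level loop, but both fill column 0 with 2.0, column 1 with 4.5 and column 2 with 9.0.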
Or, to limit the fill to specific column labels:
cols = [1, 5, 8, 9]
# or select the columns that contain NaNs automatically:
# cols = df.columns[df.isna().any()]
out = df.fillna(df[cols].median())
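A sketch of the label-based variant, including the automatic selection of columns with NaNs. df.isna().any() returns a boolean Series indexed by column label, so it is passed through df.columns[...] to get the labels. The frame and its string labels are made up for illustration; the last line shows that a column missing from the median Series keeps its NaNs.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string labels; 'b' and 'c' contain NaN
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [1.0, np.nan, 3.0, 5.0],
    "c": [np.nan, 2.0, 2.0, 6.0],
    "d": [0.5, 0.5, 0.5, 0.5],
})

# Boolean mask over column labels -> labels of the columns that contain NaN
cols = df.columns[df.isna().any()]

# fillna only fills columns whose label appears in the median Series
out = df.fillna(df[cols].median())

# Restricting to 'b' leaves the NaN in 'c' untouched
only_b = df.fillna(df[["b"]].median())

print(list(cols))
print(out)
```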
Or, if 1, 5, 8, 9 are column positions rather than labels:
col_pos = [1, 5, 8, 9]
out = df.fillna(df.iloc[:, col_pos].median())
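The positional form works because df.iloc[:, positions].median() still returns a Series indexed by the column labels, which is what fillna aligns on. A small sketch with hypothetical string labels, so that positions and labels clearly differ:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: labels are strings, so positions != labels
df = pd.DataFrame({
    "id": [1, 2, 3],
    "x": [2.0, np.nan, 4.0],
    "y": [np.nan, 1.0, 1.0],
    "z": [np.nan, 7.0, 9.0],
})

col_pos = [1, 2]                        # positions of "x" and "y"
medians = df.iloc[:, col_pos].median()  # Series indexed by labels 'x', 'y'
out = df.fillna(medians)                # "z" is not selected, so it keeps NaN
print(medians)
print(out)
```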
Output of df.fillna(df.median()) for the example input below:
0 1 2 3 4 5 6 7 8 9
0 9 7.0 1 3.0 5.0 7 3 6.0 6.0 7
1 9 1.0 9 6.0 4.5 3 8 4.0 1.0 4
2 5 3.5 3 1.0 4.0 4 4 3.5 3.0 8
3 4 6.0 9 3.0 3.0 2 1 2.0 1.0 3
4 4 1.0 1 3.0 7.0 8 4 3.0 5.0 6
Example input used:
0 1 2 3 4 5 6 7 8 9
0 9 7.0 1 3.0 5.0 7 3 6.0 6.0 7
1 9 1.0 9 6.0 NaN 3 8 4.0 1.0 4
2 5 NaN 3 1.0 4.0 4 4 NaN NaN 8
3 4 6.0 9 3.0 3.0 2 1 2.0 1.0 3
4 4 1.0 1 NaN 7.0 8 4 3.0 5.0 6
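To reproduce the output table, the example input can be built directly (values copied from the table, NaN cells written as np.nan). The filled cells are the column medians: 3.5, 3.0, 4.5, 3.5 and 3.0 for columns 1, 3, 4, 7 and 8. The second call shows that restricting to labels [1, 5, 8, 9] on this particular input fills only columns 1 and 8, since 5 and 9 have no NaN here.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Example input from the table above (columns labelled 0..9)
df = pd.DataFrame([
    [9, 7.0, 1, 3.0, 5.0, 7, 3, 6.0, 6.0, 7],
    [9, 1.0, 9, 6.0, nan, 3, 8, 4.0, 1.0, 4],
    [5, nan, 3, 1.0, 4.0, 4, 4, nan, nan, 8],
    [4, 6.0, 9, 3.0, 3.0, 2, 1, 2.0, 1.0, 3],
    [4, 1.0, 1, nan, 7.0, 8, 4, 3.0, 5.0, 6],
])

# Fill every column with its own median
out = df.fillna(df.median())
print(out)

# Fill only labels 1, 5, 8, 9; NaNs in columns 3, 4 and 7 remain
out2 = df.fillna(df[[1, 5, 8, 9]].median())
print(out2.isna().sum().sum())
```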
Answer:
If 1, 5, 8 and 9 are column labels, you can simply assign the filled columns back to df:
df[[1,5,8,9]] = df[[1,5,8,9]].fillna(df[[1,5,8,9]].median())
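A sketch of this in-place assignment on a made-up frame with integer labels 0..9 (hypothetical values); keeping the label list in one variable avoids repeating it three times. Columns outside the list keep their NaNs.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with integer labels 0..9 (values are made up)
df = pd.DataFrame(np.arange(30, dtype=float).reshape(3, 10))
df.iloc[0, [1, 5, 8, 9]] = np.nan  # NaNs in the columns of interest
df.iloc[1, 2] = np.nan             # NaN in a column that is left alone

cols = [1, 5, 8, 9]
# Write the filled columns back into df; other columns are not touched
df[cols] = df[cols].fillna(df[cols].median())
print(df)
```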