Pandas: Fill NaN values in multiple columns with their respective median values, accessing the columns by number

Time:06-27

I have a DataFrame with 15 columns and 5000 rows. Four of the columns contain NaN values, and I would like to replace those NaN values with the column median.

Since several columns are affected, I would like to do this with a for loop. The column numbers are 1, 5, 8, and 9. The NaN values in each column should be replaced with that column's median.

I tried:

for i in [1,5,8,9]:
    df[i] = df[i].fillna(df[i].transform('median'))

CodePudding user response:

No need for a loop; use a vectorized approach:

out = df.fillna(df.median())

Or, to limit the fill to specific column names:

cols = [1, 5, 8, 9]
# or select the columns that contain NaNs automatically
# (index df.columns with the boolean mask to get labels, not a row filter)
# cols = df.columns[df.isna().any()]

out = df.fillna(df[cols].median())

Or, using positional indices:

col_pos = [1, 5, 8, 9]
out = df.fillna(df.iloc[:, col_pos].median())

Output of df.fillna(df.median()) on the example input below:

   0    1  2    3    4  5  6    7    8  9
0  9  7.0  1  3.0  5.0  7  3  6.0  6.0  7
1  9  1.0  9  6.0  4.5  3  8  4.0  1.0  4
2  5  3.5  3  1.0  4.0  4  4  3.5  3.0  8
3  4  6.0  9  3.0  3.0  2  1  2.0  1.0  3
4  4  1.0  1  3.0  7.0  8  4  3.0  5.0  6

Example input used:

   0    1  2    3    4  5  6    7    8  9
0  9  7.0  1  3.0  5.0  7  3  6.0  6.0  7
1  9  1.0  9  6.0  NaN  3  8  4.0  1.0  4
2  5  NaN  3  1.0  4.0  4  4  NaN  NaN  8
3  4  6.0  9  3.0  3.0  2  1  2.0  1.0  3
4  4  1.0  1  NaN  7.0  8  4  3.0  5.0  6
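
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the example input above and applies the first one-liner. The way the frame is constructed (a list of rows with default integer column labels) is my assumption; the values are copied from the table.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Example input from above; column labels default to the integers 0..9
df = pd.DataFrame([
    [9, 7.0, 1, 3.0, 5.0, 7, 3, 6.0, 6.0, 7],
    [9, 1.0, 9, 6.0, nan, 3, 8, 4.0, 1.0, 4],
    [5, nan, 3, 1.0, 4.0, 4, 4, nan, nan, 8],
    [4, 6.0, 9, 3.0, 3.0, 2, 1, 2.0, 1.0, 3],
    [4, 1.0, 1, nan, 7.0, 8, 4, 3.0, 5.0, 6],
])

# df.median() returns one median per column (NaNs are skipped by default);
# fillna aligns that Series with the column labels.
out = df.fillna(df.median())
print(out)
```

Because fillna aligns on column labels, each column only ever receives its own median, and columns without NaN values are left unchanged.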

CodePudding user response:

You can simply do:

df[[1,5,8,9]] = df[[1,5,8,9]].fillna(df[[1,5,8,9]].median())
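
Note that df[[1,5,8,9]] selects by column label, so it only matches the "column numbers" from the question if the labels are the integers 0..14. If the columns have names, one option is to translate positions into labels first. A small sketch, assuming a hypothetical toy frame with columns "a", "b", "c" (not the asker's data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with named columns, for illustration only
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 8.0],
    "b": [np.nan, 2.0, 4.0, 6.0],
    "c": [5.0, np.nan, 7.0, 9.0],
})

col_pos = [0, 1]               # column positions to fill
labels = df.columns[col_pos]   # translate positions to labels: "a", "b"

# Fill NaNs in the selected columns only, each with its own median
df[labels] = df[labels].fillna(df[labels].median())
print(df)
```

Column "c" is not in the selection, so its NaN is left in place.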