Python: fill NaN in dataframe with random values picked from the same column

Time:01-08

I have a dataframe with some NaN values, like the one below, and I would like to fill the NaN values in each column with random picks from that same column. For example, randomly pick non-NaN values from Col1 to fill the NaN values in Col1.

   Col1      Col2      Col3      Col4   Col5
0  -0.671603 -0.792415  0.783922 NaN    Blue
1   0.207720       NaN  0.996131 Tom    Yellow
2  -0.892115 -1.282333       NaN Julia  NaN
3  -0.315598 -2.371529 -1.959646 NaN    Pink
4        NaN       NaN -0.584636 NaN    Orange
5   0.314736 -0.692732 -0.303951 Jim    NaN
6   0.355121       NaN       NaN NaN    Red
7        NaN -1.900148  1.230828 Sophia NaN
8  -1.795468  0.490953       NaN Anne   Blue
9  -0.678491 -0.087815       NaN NaN    NaN
10  0.755714  0.550589 -0.702019 NaN    Pink
11  0.951908 -0.529933  0.344544 Tobi   Yellow
12       NaN  0.075340 -0.187669 Jon    Red
13       NaN  0.314342 -0.936066 NaN    Yellow
14       NaN  1.293355  0.098964 Peter  Orange

Any ideas?

I have tried something like this:

import numpy as np
import pandas as pd

num_nan= df[col_name].isna().sum()
for n in len(range(num_nan)):
  #pick random value from e.g. col1 that's not NaN
  df[col_name] = df[col_name].where((pd.notnull(df)), None).sample(random_state= 1)     
  #replace NaN-value in e.g. col1 with picked value
  df[col_name]= df.fillna('value')

The goal is to replace the NaN values in a column with random picks from the same column.

CodePudding user response:

You can try:

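for c in df:
    # Boolean mask marking the NaN rows of this column
    mask = df[c].isna()
    # Draw one non-NaN value (with replacement) for each NaN row
    df.loc[mask, c] = np.random.choice(df.loc[~mask, c], size=mask.sum())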

print(df)

Prints (for example):

        Col1      Col2      Col3    Col4    Col5
0  -0.671603 -0.792415  0.783922     Jon    Blue
1   0.207720 -1.900148  0.996131     Tom  Yellow
2  -0.892115 -1.282333 -0.702019   Julia     Red
3  -0.315598 -2.371529 -1.959646    Tobi    Pink
4  -0.892115  0.075340 -0.584636     Jon  Orange
5   0.314736 -0.692732 -0.303951     Jim    Pink
6   0.355121 -0.792415  0.344544     Tom     Red
7  -0.892115 -1.900148  1.230828  Sophia     Red
8  -1.795468  0.490953 -0.303951    Anne    Blue
9  -0.678491 -0.087815  0.344544     Jon  Yellow
10  0.755714  0.550589 -0.702019   Peter    Pink
11  0.951908 -0.529933  0.344544    Tobi  Yellow
12 -0.678491  0.075340 -0.187669     Jon     Red
13  0.951908  0.314342 -0.936066   Julia  Yellow
14 -0.892115  1.293355  0.098964   Peter  Orange
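
If you need the fill to be reproducible, or want to leave the original dataframe untouched, the same idea can be wrapped in a small helper. This is only a sketch: the function name fill_nan_with_random_picks, its seed parameter, and the small example frame are made up for illustration and are not part of the question. It also assumes you want to skip columns that contain only NaN, since there is nothing to draw from.

```python
import numpy as np
import pandas as pd


def fill_nan_with_random_picks(df, seed=None):
    """Return a copy of df with each NaN replaced by a random non-NaN value from its column."""
    rng = np.random.default_rng(seed)  # seeded generator, so runs can be repeated
    out = df.copy()
    for col in out.columns:
        mask = out[col].isna()
        pool = out.loc[~mask, col].to_numpy()
        # Skip columns with nothing to fill or nothing to draw from (assumption).
        if not mask.any() or pool.size == 0:
            continue
        # Sampling is with replacement, so a value may be picked more than once.
        out.loc[mask, col] = rng.choice(pool, size=int(mask.sum()))
    return out


# Hypothetical miniature of the dataframe from the question.
example = pd.DataFrame({
    "Col1": [-0.67, 0.21, np.nan, -0.32, np.nan],
    "Col5": ["Blue", np.nan, "Pink", np.nan, "Red"],
})
filled = fill_nan_with_random_picks(example, seed=1)
print(filled)
```

Passing the same seed gives the same fill each time; leaving it as None gives a fresh random fill on every call.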