I have a dataframe with some NaN values, like the one below, and I would like to fill the NaN values in a column with random picks from the same column, e.g. randomly pick values from Col1 to fill in the NaN values in Col1.
Col1 Col2 Col3 Col4 Col5
0 -0.671603 -0.792415 0.783922 NaN Blue
1 0.207720 NaN 0.996131 Tom Yellow
2 -0.892115 -1.282333 NaN Julia NaN
3 -0.315598 -2.371529 -1.959646 NaN Pink
4 NaN NaN -0.584636 NaN Orange
5 0.314736 -0.692732 -0.303951 Jim NaN
6 0.355121 NaN NaN NaN Red
7 NaN -1.900148 1.230828 Sophia NaN
8 -1.795468 0.490953 NaN Anne Blue
9 -0.678491 -0.087815 NaN NaN NaN
10 0.755714 0.550589 -0.702019 NaN Pink
11 0.951908 -0.529933 0.344544 Tobi Yellow
12 NaN 0.075340 -0.187669 Jon Red
13 NaN 0.314342 -0.936066 NaN Yellow
14 NaN 1.293355 0.098964 Peter Orange
Any ideas?
I have tried something like this:
import numpy as np
import pandas as pd
num_nan = df[col_name].isna().sum()
for n in len(range(num_nan)):
    # pick random value from e.g. Col1 that's not NaN
    df[col_name] = df[col_name].where((pd.notnull(df)), None).sample(random_state=1)
    # replace NaN value in e.g. Col1 with picked value
    df[col_name] = df.fillna('value')
The goal is to replace the NaN values in a column with random picks from the same column.
CodePudding user response:
You can try:
for c in df:
    mask = df[c].isna()
    # draw one random non-NaN value from the same column for every NaN
    df.loc[mask, c] = np.random.choice(df.loc[~mask, c], size=mask.sum())
print(df)
Prints (for example):
Col1 Col2 Col3 Col4 Col5
0 -0.671603 -0.792415 0.783922 Jon Blue
1 0.207720 -1.900148 0.996131 Tom Yellow
2 -0.892115 -1.282333 -0.702019 Julia Red
3 -0.315598 -2.371529 -1.959646 Tobi Pink
4 -0.892115 0.075340 -0.584636 Jon Orange
5 0.314736 -0.692732 -0.303951 Jim Pink
6 0.355121 -0.792415 0.344544 Tom Red
7 -0.892115 -1.900148 1.230828 Sophia Red
8 -1.795468 0.490953 -0.303951 Anne Blue
9 -0.678491 -0.087815 0.344544 Jon Yellow
10 0.755714 0.550589 -0.702019 Peter Pink
11 0.951908 -0.529933 0.344544 Tobi Yellow
12 -0.678491 0.075340 -0.187669 Jon Red
13 0.951908 0.314342 -0.936066 Julia Yellow
14 -0.892115 1.293355 0.098964 Peter Orange
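If you need reproducible results, or want to leave the original dataframe untouched, the same idea could be wrapped in a small helper. This is only a sketch: the function name `fill_nan_with_random_picks`, the `seed` parameter and the small example dataframe are made up for illustration and are not part of the question. It uses NumPy's `default_rng` generator instead of the global `np.random` state and skips columns that have nothing to fill or no non-NaN values to draw from.

```python
import numpy as np
import pandas as pd


def fill_nan_with_random_picks(df, seed=None):
    """Return a copy of df where every NaN is replaced by a random
    non-NaN value drawn (with replacement) from the same column."""
    rng = np.random.default_rng(seed)  # seeded generator for reproducibility
    out = df.copy()
    for col in out.columns:
        mask = out[col].isna()
        pool = out.loc[~mask, col].to_numpy()
        # Skip columns with nothing to fill or nothing to draw from.
        if not mask.any() or pool.size == 0:
            continue
        out.loc[mask, col] = rng.choice(pool, size=int(mask.sum()))
    return out


# Hypothetical miniature version of the dataframe from the question.
df = pd.DataFrame({
    "Col1": [-0.671603, 0.207720, np.nan, -0.315598, np.nan],
    "Col5": ["Blue", "Yellow", None, "Pink", None],
})
filled = fill_nan_with_random_picks(df, seed=1)
print(filled)
```

Values are drawn with replacement, so the same value can fill several gaps, and values that occur more often in a column are picked more often.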