I have a df:
df = pd.DataFrame({'Col1': [np.nan, 1, 2], 'Col2': [7, 9, np.nan], 'Col3': [np.nan, np.nan, 5]})
How can I replace each NaN in df with a unique random number that does not already exist in df? For example:
df = pd.DataFrame({'Col1': [8, 1, 2], 'Col2': [7, 9, 11], 'Col3': [30, 33, 5]})
Thank you very much.
CodePudding user response:
One way is to mask the NaN cells with an array of the same shape as df, filled with unique random numbers that do not already appear in df:
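import random
import numpy as np
import pandas as pd
total_size = df.shape[0] * df.shape[1]
# Sample twice as many unique candidates as needed so enough remain after dropping values already in df
rands = [x for x in random.sample(range(total_size * 10), total_size * 2) if x not in df.values][:total_size]
rands_mat = np.array(rands).reshape(df.shape)
# mask() returns a new DataFrame: each NaN cell takes the value at the same position in rands_mat
df.mask(pd.isnull(df), rands_mat)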
| | Col1 | Col2 | Col3 |
|---|---|---|---|
| 0 | 4 | 7 | 23 |
| 1 | 1 | 9 | 19 |
| 2 | 2 | 71 | 5 |
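The same idea can be wrapped in a small reusable function. This is only a sketch under some assumptions: the name `fill_nan_unique` and its `upper` and `seed` parameters are hypothetical, the replacements are drawn from the integers in `range(upper)`, and the DataFrame is assumed to be entirely numeric. It generates exactly one replacement per NaN rather than a full-size matrix.

```python
import numpy as np
import pandas as pd


def fill_nan_unique(df, upper=None, seed=None):
    """Return a copy of df where each NaN is replaced by a distinct
    random integer that does not already occur in df.

    `upper` (hypothetical parameter) is the exclusive upper bound of the
    candidate range; it defaults to 10 * df.size, as in the answer above.
    """
    nan_mask = df.isna().to_numpy()
    n_missing = int(nan_mask.sum())
    if n_missing == 0:
        return df.copy()

    if upper is None:
        upper = 10 * df.size

    # Values already present in df (NaN cells excluded).
    existing = set(df.to_numpy()[~nan_mask].tolist())

    # Candidate integers that do not collide with existing values.
    candidates = np.array([x for x in range(upper) if x not in existing])
    if len(candidates) < n_missing:
        raise ValueError("not enough unused integers below `upper`")

    # Draw without replacement so every filled value is unique.
    rng = np.random.default_rng(seed)
    picks = rng.choice(candidates, size=n_missing, replace=False)

    values = df.to_numpy(dtype=float, copy=True)
    values[nan_mask] = picks
    return pd.DataFrame(values, index=df.index, columns=df.columns)


df = pd.DataFrame({'Col1': [np.nan, 1, 2],
                   'Col2': [7, 9, np.nan],
                   'Col3': [np.nan, np.nan, 5]})
filled = fill_nan_unique(df, seed=0)
print(filled)
```

The result keeps the float dtype of the original columns; if every value is a whole number, `filled.astype(int)` converts it afterwards. Passing a `seed` makes the random fill reproducible.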