Fill null values with random values

I'm trying to fill null values in my continuous variables column with random numbers. I tried the code below but can't seem to get the null values to be filled with a random number. Any thoughts?

df.mask(np.random.choice([True, False], size=df.shape, p=[.2,.8]))

CodePudding user response:

A simple (though not optimal) solution is to create a DataFrame of random numbers with the same shape and the same index and column labels, then pass it to fillna. Only the missing cells take their values from it:

df = df.fillna(pd.DataFrame(np.random.random(df.shape),
                            index=df.index, columns=df.columns))
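
One possible way to avoid generating a full frame of random numbers is to draw values only for the cells that are actually missing. The following is a minimal sketch, assuming every column is numeric and can be cast to float; the helper name fill_nulls_with_random and its seed parameter are hypothetical and not part of pandas:

```python
import numpy as np
import pandas as pd


def fill_nulls_with_random(df, seed=None):
    """Return a copy of df with each NaN replaced by its own draw from [0, 1).

    Hypothetical helper; assumes every column can be cast to float.
    """
    rng = np.random.default_rng(seed)
    values = df.to_numpy(dtype=float, copy=True)  # independent float array
    mask = np.isnan(values)                       # True where a value is missing
    values[mask] = rng.random(int(mask.sum()))    # one draw per missing cell
    return pd.DataFrame(values, index=df.index, columns=df.columns)


# Small illustrative frame (made-up data)
example = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})
filled = fill_nulls_with_random(example, seed=0)
```

Note that this sketch turns every column into float, so it may not suit frames that mix in integer or non-numeric columns.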

CodePudding user response:

Have you tried pandas.DataFrame.fillna?

Following the example on its documentation page, let's create a df with nulls:

import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5],
                   [np.nan, 3, np.nan, 4]],
                  columns=list("ABCD"))

df output:

     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5
3  NaN  3.0 NaN  4

Then you can fill the missing values with:

np.random.seed(42)

df.fillna(np.random.random())

Note: If you want reproducible results, seed the random number generator before drawing.

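The new df with the missing values filled in. Note that np.random.random() is evaluated only once, so every missing cell receives the same number: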

          A         B         C  D
0  0.374540  2.000000  0.374540  0
1  3.000000  4.000000  0.374540  1
2  0.374540  0.374540  0.374540  5
3  0.374540  3.000000  0.374540  4
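
Because the scalar passed to fillna is drawn once, all gaps end up sharing one value. If each missing cell should get its own random number, one option is to fill column by column. This is a sketch under stated assumptions: it uses NumPy's Generator API (np.random.default_rng) instead of the legacy np.random.seed, draws from [0, 1), and the variable names filled and missing are only illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seeded generator for reproducibility

df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5],
                   [np.nan, 3, np.nan, 4]],
                  columns=list("ABCD"))

filled = df.copy()
for col in filled.columns:
    missing = filled[col].isna()      # boolean mask of the gaps in this column
    if missing.any():                 # skip columns without nulls (here: D)
        # one independent draw per missing cell
        filled.loc[missing, col] = rng.random(int(missing.sum()))
```

Columns without nulls are left untouched, so column D keeps its integer dtype. To match the scale of a real continuous variable, rng.random could be swapped for another draw such as rng.uniform(low, high, size) with bounds chosen for that column.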