I'm trying to fill null values in my continuous variables column with random numbers. I tried the code below but can't seem to get the null values to be filled with a random number. Any thoughts?
df.mask(np.random.choice([True, False], size=df.shape, p=[.2,.8]))
CodePudding user response:
A simple (but not optimal) solution is to create a DataFrame of random numbers with the same shape and the same index/column labels, then pass it to fillna:
df = df.fillna(pd.DataFrame(np.random.random(df.shape),
index=df.index, columns=df.columns))
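The version above draws a random number for every cell, including those that are not null. As a rough sketch of one alternative (assuming every column is numeric; the small df and the seeded rng here are only for illustration), you can draw exactly as many numbers as there are missing cells and write them in through a boolean mask:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the run is repeatable

# Hypothetical all-numeric frame standing in for the real data
df = pd.DataFrame({"x": [1.0, np.nan, 3.0, np.nan],
                   "y": [np.nan, 0.5, np.nan, 2.0]})

values = df.to_numpy(dtype=float, copy=True)  # work on a copy, leave df untouched
missing = np.isnan(values)                    # True where a value is missing
values[missing] = rng.random(missing.sum())   # one draw per missing cell, in [0, 1)

filled = pd.DataFrame(values, index=df.index, columns=df.columns)
print(filled)
```

Existing values are kept as they are, and each missing cell gets its own draw.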
CodePudding user response:
Have you tried pandas.DataFrame.fillna?
Following the example on its documentation page, let's create a df with nulls:
df = pd.DataFrame([[np.nan, 2, np.nan, 0],
[3, 4, np.nan, 1],
[np.nan, np.nan, np.nan, 5],
[np.nan, 3, np.nan, 4]],
columns=list("ABCD"))
df output:
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5
3  NaN  3.0 NaN  4
Then I can fill the missing values with:
np.random.seed(42)
df.fillna(np.random.random())
Note: if you want reproducible results, seed the random number generator first.
The new df, with the missing values filled in:
          A         B         C  D
0  0.374540  2.000000  0.374540  0
1  3.000000  4.000000  0.374540  1
2  0.374540  0.374540  0.374540  5
3  0.374540  3.000000  0.374540  4
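Note that np.random.random() is evaluated once, so every NaN receives the same number. If you would rather have a different random value in each missing cell, one possible sketch (using the same example df; the seeded rng is my own addition for repeatability) is to pass an array of random numbers as the replacement to df.mask, which only uses it where the condition is True:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seeded only for repeatability

df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5],
                   [np.nan, 3, np.nan, 4]],
                  columns=list("ABCD"))

# Where df.isna() is True, take the value from the random array;
# everywhere else keep the original value.
filled = df.mask(df.isna(), rng.random(df.shape))
print(filled)
```

This is also closer to the df.mask call in the question: without a second argument, mask replaces the selected cells with NaN instead of filling them.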