Replace null value with random value

I have data with missing values in the salary1 column, and I am trying to replace them with values from the range between the column's minimum and maximum. I tried the code below but got an error. Could someone please help me? Thanks in advance.

code

a = df['salary1'].max()
b = df['salary1'].min()
df['salary1'] = df['salary1'].replace(np.nan, range(b,a))

error

TypeError: 'float' object cannot be interpreted as an integer

data

salary1 Experience
0   NaN 2
1   100000.0    3
2   NaN 4
3   NaN 4
4   NaN 1
... ... ...
12884   NaN 1
12885   NaN 3
12886   150000.0    2
12887   NaN 2
12888   NaN 4

CodePudding user response：

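range returns a sequence, not a single random number. Use a NumPy random function instead, for example np.random.rand, which returns floats in [0, 1) that you can scale to your salary range.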

CodePudding user response：

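range is the wrong function to use: it does not pick a value, it produces a fixed sequence. list(range(a, b)) returns the integers from a up to (but not including) b. For example, list(range(5, 10)) returns [5, 6, 7, 8, 9]. It also accepts only integers, and min() and max() of a float column return floats, which is what triggers the TypeError.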
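As a small illustration of where the error comes from, here is a sketch on a made-up four-row Series (the values are assumptions, not the asker's data). It shows that the bounds taken from a float column are floats, that range rejects them, and that casting to int avoids the exception:

```python
import numpy as np
import pandas as pd

# Hypothetical sample; a column that contains NaN is stored as float64.
s = pd.Series([np.nan, 100000.0, np.nan, 150000.0], name="salary1")

lo, hi = s.min(), s.max()  # NaN is skipped; both results are floats

try:
    range(lo, hi)          # range only accepts integers
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Casting the bounds to int makes range usable again,
# although it still yields a sequence, not a random value.
int_range = range(int(lo), int(hi))
```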

Try with numpy.random.randint:

import numpy as np
import pandas as pd
lo, hi = int(df['salary1'].min()), int(df['salary1'].max())  # compute the bounds once, as ints
df['salary1'] = df['salary1'].apply(lambda x: x if pd.notnull(x) else np.random.randint(lo, hi))  # hi is exclusive

Example:

>>> df
    salary1
0  100000.0
1       NaN
2  150000.0
3       NaN
4  175000.0
5       NaN

>>> df['salary1'].apply(lambda x: x if pd.notnull(x) else np.random.randint(df['salary1'].min(), df['salary1'].max()))
0    100000.0
1    119091.0
2    150000.0
3    171438.0
4    175000.0
5    114396.0
Name: salary1, dtype: float64
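If the column is large, a vectorized variant may be worth considering. The following is only a sketch under a few assumptions: it rebuilds the six-row example frame, uses NumPy's Generator API (np.random.default_rng) with an arbitrary seed, and includes the maximum as a possible value via endpoint=True. It draws one independent integer per missing row and assigns them all at once through a boolean mask:

```python
import numpy as np
import pandas as pd

# Same shape as the example frame above.
df = pd.DataFrame(
    {"salary1": [100000.0, np.nan, 150000.0, np.nan, 175000.0, np.nan]}
)

rng = np.random.default_rng(seed=0)  # the seed is an arbitrary choice for repeatability

# Integer bounds taken from the observed (non-missing) salaries.
lo, hi = int(df["salary1"].min()), int(df["salary1"].max())

# True where the salary is missing.
mask = df["salary1"].isna()

# One independent draw per missing row; endpoint=True makes hi inclusive.
df.loc[mask, "salary1"] = rng.integers(lo, hi, size=mask.sum(), endpoint=True)
```

Rows that already had a salary are left untouched, and the bounds are computed once rather than once per row.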

CodePudding user response：

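The code below generates a random int in the closed range [b, a], where b is the minimum and a is the maximum from the question. random.randint needs the lower bound first and integer arguments, so the bounds are cast with int():

import random
df['salary1'] = df['salary1'].replace(np.nan, random.randint(int(b), int(a)))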
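One thing to be aware of with the replace approach: the random.randint call is evaluated once, before replace runs, so every missing row receives the same number. The sketch below, on a made-up Series with an arbitrary seed, shows this and contrasts it with fillna given a Series of per-row draws (one possible alternative, not the only one):

```python
import random

import numpy as np
import pandas as pd

# Hypothetical sample data.
s = pd.Series([100000.0, np.nan, 150000.0, np.nan, np.nan], name="salary1")
lo, hi = int(s.min()), int(s.max())

random.seed(0)  # arbitrary seed, only for repeatability

# The random number is produced once and reused for every NaN.
fill_value = random.randint(lo, hi)
same_fill = s.replace(np.nan, fill_value)
distinct_fills = same_fill[s.isna()].nunique()

# Alternative: build one draw per row and let fillna align on the index,
# so only the missing positions take a value from the random Series.
draws = pd.Series([random.randint(lo, hi) for _ in range(len(s))], index=s.index)
per_row = s.fillna(draws)
```

Here distinct_fills counts how many different values ended up in the previously missing rows of same_fill, which is a quick way to check whether the fill was shared.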