How to replace missing value with NA using for loop in Python

Time:04-08

I have a data frame with two columns, which I created with the following Python code:

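import numpy as np
import pandas as pd

# np.nan is used because the np.NaN alias was removed in NumPy 2.0
data_df = {"Age" : [10, 20, 30, 40, 50, np.nan, np.nan, np.nan, np.nan],
           "Name" : ["A", "B", "C", "D", "E", "F", "G", "H", "I"]}
data_df = pd.DataFrame(data_df)
data_df.head(7)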
    Age Name
0  10.0    A
1  20.0    B
2  30.0    C
3  40.0    D
4  50.0    E
5   NaN    F
6   NaN    G

Now I want to replace every Name value with NA wherever Age is also NA, so I used a for loop as shown below:

am_decision = []

for (x,y) in zip(data_df['Age'],data_df['Name']):
    if x == np.nan:
        am_decision.append(np.nan)
    else:
        am_decision.append(y)
print(len(am_decision))
print(am_decision) 

Output:
9
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I']

As you can see, the for loop above is not working: none of the names were replaced. Is there something that I missed?

CodePudding user response:

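NaN never compares equal to anything, not even itself, so the == test in your loop is always False. To test for missing values, use pandas.isna: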

am_decision = []

for (x,y) in zip(data_df['Age'],data_df['Name']):
    # pd.isna returns True for NaN (and for None)
    if pd.isna(x):
        am_decision.append(np.nan)
    else:
        am_decision.append(y)
print(len(am_decision))
print(am_decision)
['A', 'B', 'C', 'D', 'E', nan, nan, nan, nan]
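The difference between the two checks can be seen in isolation with a small sketch like the one below. The variable names are only illustrative and are not part of the original question.

```python
import math

import numpy as np
import pandas as pd

# Under IEEE 754 floating-point rules, NaN is unequal to every value,
# including itself, so an equality test can never detect it.
eq_result = (np.nan == np.nan)

# Dedicated checks that do detect a missing value.
isna_result = pd.isna(np.nan)       # pandas check, also accepts None
isnan_result = math.isnan(np.nan)   # standard-library check, floats only
none_result = pd.isna(None)

print(eq_result, isna_result, isnan_result, none_result)
```

The equality test evaluates to False while the three dedicated checks evaluate to True, which is why the original loop never entered the if branch.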

A non-loop solution is faster and simpler: use Series.mask with Series.isna:

out = data_df['Name'].mask(data_df['Age'].isna())
print(out)
0      A
1      B
2      C
3      D
4      E
5    NaN
6    NaN
7    NaN
8    NaN
Name: Name, dtype: object

If you need a plain list, as in the loop version, add tolist():

out = data_df['Name'].mask(data_df['Age'].isna()).tolist()
print(out)
['A', 'B', 'C', 'D', 'E', nan, nan, nan, nan]
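If the goal is to update the Name column inside the data frame rather than build a separate Series or list, one possible variant is a boolean-mask assignment with DataFrame.loc. This is only a sketch: the result_df name and the choice to work on a copy are assumptions, not something the question asked for.

```python
import numpy as np
import pandas as pd

data_df = pd.DataFrame({
    "Age": [10, 20, 30, 40, 50, np.nan, np.nan, np.nan, np.nan],
    "Name": ["A", "B", "C", "D", "E", "F", "G", "H", "I"],
})

# Work on a copy so the original frame stays unchanged (an assumption;
# assign on data_df directly if overwriting is acceptable).
result_df = data_df.copy()

# Select the rows where Age is missing and set Name to NaN in those rows.
result_df.loc[result_df["Age"].isna(), "Name"] = np.nan

print(result_df)
```

Like Series.mask, this avoids the Python-level loop; it differs only in that it writes the result back into a column.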