I have a data frame with two columns, which I created with the following Python code:
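import numpy as np
import pandas as pd

# np.nan is the portable spelling (the np.NaN alias was removed in NumPy 2.0)
data_df = {"Age": [10, 20, 30, 40, 50, np.nan, np.nan, np.nan, np.nan],
           "Name": ["A", "B", "C", "D", "E", "F", "G", "H", "I"]}
data_df = pd.DataFrame(data_df)
data_df.head(7)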
| | Age | Name |
|---|---|---|
| 0 | 10.0 | A |
| 1 | 20.0 | B |
| 2 | 30.0 | C |
| 3 | 40.0 | D |
| 4 | 50.0 | E |
| 5 | NaN | F |
| 6 | NaN | G |
Now I want to replace every Name value with NA wherever Age is also NA, so I wrote the for loop shown below:
am_decision = []
for (x, y) in zip(data_df['Age'], data_df['Name']):
    if x == np.nan:
        am_decision.append(np.nan)
    else:
        am_decision.append(y)
print(len(am_decision))
print(am_decision)
Output:
9
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I']
As you can see, the for loop above does not work: none of the names are replaced. Is there something I missed?
Answer:
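To test for missing values, use pandas.isna. The comparison x == np.nan is always False, because NaN is not equal to anything, including itself: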
am_decision = []
for (x, y) in zip(data_df['Age'], data_df['Name']):
    # pd.isna detects NaN, which an equality check cannot
    if pd.isna(x):
        am_decision.append(np.nan)
    else:
        am_decision.append(y)
print(len(am_decision))
print(am_decision)
9
['A', 'B', 'C', 'D', 'E', nan, nan, nan, nan]
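The reason an equality check never matches can be verified directly. The snippet below is a minimal sketch (the variable names are only illustrative) comparing an equality test against NaN with pd.isna and the standard library's math.isnan:

```python
import math

import numpy as np
import pandas as pd

x = np.nan

# NaN compares unequal to everything, including itself
eq_result = (x == np.nan)
self_eq_result = (x == x)

# Dedicated checks detect it correctly
isna_result = pd.isna(x)
isnan_result = math.isnan(x)

print(eq_result, self_eq_result, isna_result, isnan_result)
```

Note that pd.isna also recognises None and NaT as missing, while math.isnan only accepts numeric input.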
A non-loop solution is faster and simpler: use Series.mask with Series.isna:
# Replace Name with NaN wherever the Age mask is True
out = data_df['Name'].mask(data_df['Age'].isna())
print(out)
0 A
1 B
2 C
3 D
4 E
5 NaN
6 NaN
7 NaN
8 NaN
Name: Name, dtype: object
# Same result as a plain Python list
out = data_df['Name'].mask(data_df['Age'].isna()).tolist()
print(out)
['A', 'B', 'C', 'D', 'E', nan, nan, nan, nan]
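If the goal is to update the Name column inside the data frame rather than build a separate list (an assumption, since the question only shows the list), a minimal sketch could look like this. The names masked and updated are hypothetical and only serve to show two equivalent options side by side:

```python
import numpy as np
import pandas as pd

data_df = pd.DataFrame({
    "Age": [10, 20, 30, 40, 50, np.nan, np.nan, np.nan, np.nan],
    "Name": ["A", "B", "C", "D", "E", "F", "G", "H", "I"],
})

# Option 1: assign the masked Series back to the column
masked = data_df.copy()
masked["Name"] = masked["Name"].mask(masked["Age"].isna())

# Option 2: boolean indexing with .loc
updated = data_df.copy()
updated.loc[updated["Age"].isna(), "Name"] = np.nan

print(masked)
print(updated)
```

Both options work on copies here so the original data_df stays unchanged; drop the .copy() calls to modify the frame directly.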