I have a DataFrame where the 'Score' column is calculated from values in the other columns. I need Score to be missing for a row whenever any of the other columns has a missing value in that row.
import numpy as np
import pandas as pd

# np.nan marks the missing factor values
df = pd.DataFrame({'Score': [71, 63, 23],
                   'Factor_1': [np.nan, '15', '23'],
                   'Factor_2': ['12', np.nan, '45'],
                   'Factor_3': ['3', '5', '7']})
Expected values for column Score: nan, nan, 23 (because Factor_1 is missing in the 1st row and Factor_2 is missing in the 2nd row). So I need to replace the existing Score values in those rows with NaN.
Thank you for your help.
CodePudding user response:
Use DataFrame.filter to select the Factor columns, test for missing values with DataFrame.isna, check whether at least one value per row is missing with DataFrame.any(axis=1), and set NaN on those rows with DataFrame.loc:
df.loc[df.filter(like='Factor').isna().any(axis=1), 'Score'] = np.nan
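To make the chain above easier to follow, here is a minimal sketch that splits it into named steps on the question's sample data. The intermediate variable names (factors, missing, row_has_missing) are mine, not part of pandas, and the cast of Score to float is a precaution I am assuming is useful, since recent pandas versions are stricter about writing NaN into an integer column.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with np.nan for the missing values.
df = pd.DataFrame({'Score': [71, 63, 23],
                   'Factor_1': [np.nan, '15', '23'],
                   'Factor_2': ['12', np.nan, '45'],
                   'Factor_3': ['3', '5', '7']})

# Step 1: keep only the columns whose name contains 'Factor'.
factors = df.filter(like='Factor')

# Step 2: boolean frame, True wherever a value is missing.
missing = factors.isna()

# Step 3: boolean Series, True for rows with at least one missing factor.
row_has_missing = missing.any(axis=1)

# Precaution (assumption): make Score a float column so NaN fits its dtype.
df['Score'] = df['Score'].astype('float64')

# Step 4: overwrite Score only on the flagged rows.
df.loc[row_has_missing, 'Score'] = np.nan
```

Printing row_has_missing before the last step is a quick way to check that the filter picked up the columns you expected.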
Or use Series.mask:
df['Score'] = df['Score'].mask(df.filter(like='Factor').isna().any(axis=1))
If you need explicit column names:
cols = ['Factor_1', 'Factor_2', 'Factor_3']
df.loc[df[cols].isna().any(axis=1), 'Score'] = np.nan
df['Score'] = df['Score'].mask(df[cols].isna().any(axis=1))
print(df)
   Score Factor_1 Factor_2 Factor_3
0    NaN      NaN       12        3
1    NaN       15      NaN        5
2   23.0       23       45        7
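If this needs to be reused for other column sets, the same idea can be wrapped in a small function. The sketch below is a variation rather than the answer's exact code: it uses Series.where with notna().all(axis=1), which is the logical inverse of mask with isna().any(axis=1). The function name blank_if_incomplete and its parameters are hypothetical.

```python
import numpy as np
import pandas as pd

def blank_if_incomplete(df, target, cols):
    """Return a copy of df where `target` is NaN on rows missing any of `cols`."""
    out = df.copy()
    # True only for rows where every listed column has a value.
    complete = out[cols].notna().all(axis=1)
    # where() keeps values on complete rows and puts NaN everywhere else.
    out[target] = out[target].where(complete)
    return out

df = pd.DataFrame({'Score': [71, 63, 23],
                   'Factor_1': [np.nan, '15', '23'],
                   'Factor_2': ['12', np.nan, '45'],
                   'Factor_3': ['3', '5', '7']})

result = blank_if_incomplete(df, 'Score', ['Factor_1', 'Factor_2', 'Factor_3'])
```

Working on a copy leaves the original df untouched, which is convenient if you want to compare the scores before and after.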