I want to create a new column, new_var, based on variable_1. If variable_1 is NA, then use variable_2. If both of them are NA, then leave new_var as NA.
Is there a smarter way to do it than below? This solution wouldn't scale up well if I had 4 or 5 variables.
df['new_var'] = df['variable_1']
df.loc[(df['new_var'].isna()) & (df['variable_2'].notna()), 'new_var'] = df.loc[(df['new_var'].isna()) & (df['variable_2'].notna()), 'variable_2']
Answer:
Use bfill. To handle 4 or 5 variables (or any number), list the columns in priority order, back-fill across them with axis=1, and take the first column:
cols = ['var1', 'var2', 'var3']
# bfill(axis=1) pulls each row's first non-NaN value into the first column
df['new_var'] = df[cols].bfill(axis=1)[cols[0]]
print(df)
# Output:
   var1  var2  var3  new_var
0   3.0   4.0   9.0      3.0
1   NaN   8.0   5.0      8.0
2   NaN   NaN   6.0      6.0
3   NaN   NaN   NaN      NaN
Setup:
import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': [3, np.nan, np.nan, np.nan],
                   'var2': [4, 8, np.nan, np.nan],
                   'var3': [9, 5, 6, np.nan]})
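A self-contained sketch of the same bfill idea, wrapped in a small helper so the column list is the only thing that changes as you add variables. The name coalesce is a hypothetical choice (borrowed from SQL's COALESCE), not a pandas API, and it assumes the columns are passed in priority order.

```python
import numpy as np
import pandas as pd


def coalesce(df: pd.DataFrame, cols: list) -> pd.Series:
    """Hypothetical helper: first non-NaN value per row, scanning cols left to right."""
    # Back-filling along axis=1 pulls each row's next non-NaN value leftwards,
    # so afterwards the first column holds the first non-NaN value (or NaN if none).
    return df[cols].bfill(axis=1).iloc[:, 0]


df = pd.DataFrame({'var1': [3, np.nan, np.nan, np.nan],
                   'var2': [4, 8, np.nan, np.nan],
                   'var3': [9, 5, 6, np.nan]})
df['new_var'] = coalesce(df, ['var1', 'var2', 'var3'])
print(df)
```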
Old answers (these only work for 2 variables):
Use fillna:
df['new_var'] = df['var1'].fillna(df['var2'])
print(df)
# Output:
   var1  var2  new_var
0   3.0   4.0      3.0
1   NaN   8.0      8.0
2   NaN   NaN      NaN
Setup:
import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': [3, np.nan, np.nan], 'var2': [4, 8, np.nan]})
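The fillna line above handles two columns; it can be stretched to more by chaining, since each call only fills positions that are still NaN, so the order of the calls sets the priority. A minimal sketch using the three-column frame from the bfill answer (the bfill version still scales more cleanly once the list gets long):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': [3, np.nan, np.nan, np.nan],
                   'var2': [4, 8, np.nan, np.nan],
                   'var3': [9, 5, 6, np.nan]})

# Each fillna only touches values that are still NaN,
# so var1 takes priority over var2, and var2 over var3.
df['new_var'] = df['var1'].fillna(df['var2']).fillna(df['var3'])
print(df)
```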
Update: you can also use combine_first:
df['new_var'] = df['var1'].combine_first(df['var2'])
print(df)
# Output:
   var1  var2  new_var
0   3.0   4.0      3.0
1   NaN   8.0      8.0
2   NaN   NaN      NaN
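combine_first can be generalized to any number of columns with functools.reduce, folding the column list left to right so earlier columns take priority. A sketch, assuming the columns are listed in priority order:

```python
from functools import reduce

import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': [3, np.nan, np.nan, np.nan],
                   'var2': [4, 8, np.nan, np.nan],
                   'var3': [9, 5, 6, np.nan]})
cols = ['var1', 'var2', 'var3']

# reduce computes var1.combine_first(var2).combine_first(var3):
# values already present are kept, and remaining NaNs are filled from the next column.
df['new_var'] = reduce(lambda acc, s: acc.combine_first(s), (df[c] for c in cols))
print(df)
```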
Answer:
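Hard to answer without example data, but I guess you should simply use pandas.where, keeping variable_1 where it is present and falling back to variable_2 elsewhere:
df['new_var'] = df['variable_1'].where(df['variable_1'].notna(), df['variable_2'])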
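Because Series.where(cond, other) keeps the calling Series where cond is True and takes other elsewhere, it can be applied repeatedly to cover more than two columns. A sketch with made-up data, using the question's column names plus a hypothetical variable_3:

```python
import numpy as np
import pandas as pd

# Made-up data; variable_3 is a hypothetical third column.
df = pd.DataFrame({'variable_1': [3, np.nan, np.nan, np.nan],
                   'variable_2': [4, 8, np.nan, np.nan],
                   'variable_3': [9, 5, 6, np.nan]})
cols = ['variable_1', 'variable_2', 'variable_3']  # priority order

result = df[cols[0]]
for c in cols[1:]:
    # Keep the values we already have; fill the remaining gaps from the next column.
    result = result.where(result.notna(), df[c])
df['new_var'] = result
print(df)
```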
Answer:
Use np.where:
import numpy as np
df['new_var'] = np.where(df['variable_1'].isna(), df['variable_2'], df['variable_1'])
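For more than two columns, nested np.where calls get hard to read. np.select takes a list of conditions and a matching list of choices and, for each row, uses the choice of the first condition that is True. A sketch, assuming the column list is in priority order:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': [3, np.nan, np.nan, np.nan],
                   'var2': [4, 8, np.nan, np.nan],
                   'var3': [9, 5, 6, np.nan]})
cols = ['var1', 'var2', 'var3']

# The first True condition wins, so earlier columns take priority;
# rows where every column is NaN fall through to the default.
df['new_var'] = np.select([df[c].notna() for c in cols],
                          [df[c] for c in cols],
                          default=np.nan)
print(df)
```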