Fill column hierarchically or recursively

I want to create a new column [new_var] based on variable_1. If variable_1 is NA, use variable_2 instead. If both are NA, leave new_var as NA.

Is there a smarter way to do it than the code below? This approach wouldn't scale well if I had 4 or 5 variables.

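# Start from variable_1
df['new_var'] = df['variable_1']

# Where new_var is still missing but variable_2 is present, copy variable_2
df.loc[(df['new_var'].isna()) & (df['variable_2'].notna()), 'new_var'] = df.loc[(df['new_var'].isna()) & (df['variable_2'].notna()), 'variable_2']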
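The same masking idea can be extended to any number of columns with a loop over the columns in priority order. This is a minimal sketch, assuming made-up data and a hypothetical third column `variable_3`:

```python
import numpy as np
import pandas as pd

# Hypothetical example data; variable_3 is an illustrative extra column.
df = pd.DataFrame({
    "variable_1": [1.0, np.nan, np.nan, np.nan],
    "variable_2": [2.0, 5.0, np.nan, np.nan],
    "variable_3": [3.0, 6.0, 7.0, np.nan],
})

# Columns in priority order: earlier columns win.
cols = ["variable_1", "variable_2", "variable_3"]

# Copy so later .loc assignments never touch the source column.
df["new_var"] = df[cols[0]].copy()
for col in cols[1:]:
    # Only rows that are still missing are filled from the next column.
    mask = df["new_var"].isna()
    df.loc[mask, "new_var"] = df.loc[mask, col]

print(df["new_var"].tolist())
```

Rows where every column is NA stay NA, matching the requirement in the question.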

Answer:

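This version addresses the concern that "the solution wouldn't scale up well if I had 4 or 5 variables". Use bfill along the columns: bfill(axis=1) fills each missing value with the next non-null value to its right, so afterwards the first column holds the first non-null value of each row.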

cols = ['var1', 'var2', 'var3']
df['new_var'] = df[cols].bfill(axis=1)[cols[0]]
print(df)

# Output:
   var1  var2  var3  new_var
0   3.0   4.0   9.0      3.0
1   NaN   8.0   5.0      8.0
2   NaN   NaN   6.0      6.0
3   NaN   NaN   NaN      NaN

Setup:

import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': [3, np.nan, np.nan, np.nan],
                   'var2': [4, 8, np.nan, np.nan],
                   'var3': [9, 5, 6, np.nan]})
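The bfill trick can be wrapped in a small helper so the priority order is just the order of the column list. A sketch, where `first_non_null` is a hypothetical name and `.iloc[:, 0]` is used instead of `[cols[0]]` so the helper does not depend on the first column's label:

```python
import numpy as np
import pandas as pd


def first_non_null(frame: pd.DataFrame, cols: list) -> pd.Series:
    """Return, per row, the first non-null value among `cols` (left to right)."""
    # bfill(axis=1) pulls later values leftwards into the gaps,
    # so the first column ends up holding the first non-null value.
    return frame[cols].bfill(axis=1).iloc[:, 0]


df = pd.DataFrame({"var1": [3, np.nan, np.nan, np.nan],
                   "var2": [4, 8, np.nan, np.nan],
                   "var3": [9, 5, 6, np.nan]})

df["new_var"] = first_non_null(df, ["var1", "var2", "var3"])
# Reversing the list reverses the priority.
df["new_var_rev"] = first_non_null(df, ["var3", "var2", "var1"])
print(df)
```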

Older answers (as written, these only handle 2 variables):

Use fillna:

df['new_var'] = df['var1'].fillna(df['var2'])
print(df)

# Output:
   var1  var2  new_var
0   3.0   4.0      3.0
1   NaN   8.0      8.0
2   NaN   NaN      NaN

Setup:

import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': [3, np.nan, np.nan], 'var2': [4, 8, np.nan]})
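Because each fillna call only fills values that are still missing, the calls can be chained to cover more than two columns. A minimal sketch using functools.reduce over the same three-column data as the bfill setup:

```python
from functools import reduce

import numpy as np
import pandas as pd

df = pd.DataFrame({"var1": [3, np.nan, np.nan, np.nan],
                   "var2": [4, 8, np.nan, np.nan],
                   "var3": [9, 5, 6, np.nan]})

cols = ["var1", "var2", "var3"]

# Equivalent to df["var1"].fillna(df["var2"]).fillna(df["var3"]).
df["new_var"] = reduce(lambda acc, col: acc.fillna(df[col]), cols[1:], df[cols[0]])
print(df)
```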

Update

You can also use combine_first:

df['new_var'] = df['var1'].combine_first(df['var2'])
print(df)

# Output:
   var1  var2  new_var
0   3.0   4.0      3.0
1   NaN   8.0      8.0
2   NaN   NaN      NaN
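For two columns of the same DataFrame, fillna and combine_first give the same result because both Series share one index. They differ when the indexes differ: fillna keeps the caller's index, while combine_first returns the union of both indexes. A small illustration with made-up Series:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1.0, np.nan], index=["a", "b"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])

# fillna aligns on labels but keeps only s1's index.
filled = s1.fillna(s2)
# combine_first aligns on labels and returns the union of both indexes.
combined = s1.combine_first(s2)

print(filled)
print(combined)
```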

Answer:

Hard to answer without example data, but you could simply use Series.where, keeping variable_1 where it is present and taking variable_2 elsewhere:

df['new_var'] = df['variable_1'].where(df['variable_1'].notna(), df['variable_2'])
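Series.where keeps the caller's values where the condition is True and takes `other` everywhere else, so the caller should be variable_1. Series.mask is the mirror image (it replaces where the condition is True). A runnable check with made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"variable_1": [3, np.nan, np.nan],
                   "variable_2": [4, 8, np.nan]})

# Keep variable_1 where present; otherwise take variable_2.
df["new_var"] = df["variable_1"].where(df["variable_1"].notna(), df["variable_2"])

# Same result via mask: replace variable_1 where it is missing.
alt = df["variable_1"].mask(df["variable_1"].isna(), df["variable_2"])
print(df)
```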

Answer:

Use np.where:

import numpy as np
df['new_var'] = np.where(df['variable_1'].isna(), df['variable_2'], df['variable_1'])
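Nested np.where calls get hard to read beyond two columns. np.select generalizes the idea: it takes, per row, the choice paired with the first condition that is True, and uses `default` when none is. A sketch with made-up three-column data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"var1": [3, np.nan, np.nan, np.nan],
                   "var2": [4, 8, np.nan, np.nan],
                   "var3": [9, 5, 6, np.nan]})

cols = ["var1", "var2", "var3"]

# One "is present" condition per column, in priority order.
conditions = [df[c].notna() for c in cols]
choices = [df[c] for c in cols]
df["new_var"] = np.select(conditions, choices, default=np.nan)
print(df)
```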