I'm looking how to add a column to df that holds the column name of the first column that is not NaN or Null. Example: column "first" based on columns 'A', 'B', 'C', 'D'
Name A B C D First
Alex 100 Nan Null NaN A
Michael 300 400 Null Null A
Steve Null 100 100 100 B
John Null Null Nan Null Null
CodePudding user response:
Convert Null, Nan
values to misisng values and if necessary Name
to index, then test DataFrame.notna
and get first matched column by DataFrame.idxmax
, last set NaN
if all NaN
s per row:
df = df.replace(['Null', 'Nan'], np.nan)
df = df.set_index('Name')
m = df.notna()
df['First'] = m.idxmax(axis=1).where(m.any(axis=1))
print (df)
A B C D First
Name
Alex 100 Nan NaN NaN A
Michael 300 400 NaN NaN A
Steve NaN 100 100 100 B
John NaN NaN NaN NaN NaN
If need original values without replace:
df = df.set_index('Name')
m = df.isin(['Null', 'Nan']) | df.isna()
df['First'] = (~m).idxmax(axis=1).mask(m.all(axis=1), 'Null')