I have a question about how to create a new column based on several column value.
Input:
col1 col2 col3
1 1 1
NULL NULL NULL
2 NULL 2
NULL NULL 3
4 NULL NULL
5 5 NULL
Output
col1 col2 col3 new
1 1 1 1
NULL NULL NULL NULL
2 NULL 2 2
NULL NULL 3 3
4 NULL NULL 4
5 5 NULL 5
I am trying to use combine_first, but it seems not be a good choice since I have multiple columns need to combine.
CodePudding user response:
One option is to rename
the columns to have the same name; then use groupby
first
:
df['new'] = (df.rename(columns={col: 'col' for col in df.columns})
.groupby(level=0, axis=1).first())
You could also iteratively use combine_first
:
df['new'] = float('nan')
for c in df.columns:
df['new'] = df['new'].combine_first(df[c])
Or you could apply a lambda that selects non-NaN values row-wise (works for Python>=3.8 since it uses the walrus operator; could write the same function differently if you have Python<3.8):
df['new'] = df.apply(lambda x: res[0] if (res:=x[x.notna()].tolist()) else float('nan'), axis=1)
Output:
col1 col2 col3 new
0 1.0 1.0 1.0 1.0
1 NaN NaN NaN NaN
2 2.0 NaN 2.0 2.0
3 NaN NaN 3.0 3.0
4 4.0 NaN NaN 4.0
5 5.0 5.0 NaN 5.0