Home > front end >  create new column among multiple columns value
create new column among multiple columns value

Time:03-08

I have a question about how to create a new column based on several column value.

Input:

col1 col2 col3 
1 1 1 
NULL NULL NULL 
2 NULL 2 
NULL NULL 3 
4 NULL NULL 
5 5 NULL 

Output

col1 col2 col3 new
1 1 1 1
NULL NULL NULL NULL
2 NULL 2 2
NULL NULL 3 3
4 NULL NULL 4
5 5 NULL 5

I am trying to use combine_first, but it seems not be a good choice since I have multiple columns need to combine.

CodePudding user response:

One option is to rename the columns to have the same name; then use groupby first:

df['new'] = (df.rename(columns={col: 'col' for col in df.columns})
             .groupby(level=0, axis=1).first())

You could also iteratively use combine_first:

df['new'] = float('nan')
for c in df.columns:
    df['new'] = df['new'].combine_first(df[c])

Or you could apply a lambda that selects non-NaN values row-wise (works for Python>=3.8 since it uses the walrus operator; could write the same function differently if you have Python<3.8):

df['new'] = df.apply(lambda x: res[0] if (res:=x[x.notna()].tolist()) else float('nan'), axis=1)

Output:

   col1  col2  col3  new
0   1.0   1.0   1.0  1.0
1   NaN   NaN   NaN  NaN
2   2.0   NaN   2.0  2.0
3   NaN   NaN   3.0  3.0
4   4.0   NaN   NaN  4.0
5   5.0   5.0   NaN  5.0
  • Related