Combine 3 dataframe columns into 1 with priority while avoiding apply

Let's say I have 3 different columns

   Column1  Column2  Column3
 0    a        1       NaN
 1   NaN       3        4
 2    b        6        7
 3   NaN      NaN       7

and I want to create 1 final column that takes the first value in each row that isn't NA (for the data above: a, 3, b, 7).

I would usually do this with a custom apply function:

df.apply(lambda x: ...)

I need to do this for many different cases with millions of rows and this becomes very slow. Are there any operations that would take advantage of vectorization to make this faster?

CodePudding user response：

Back-fill missing values along each row (axis=1), then select the first column. Pass a list ([0]) to iloc to get a one-column DataFrame, or a plain 0 to get a Series:

df1 = df.bfill(axis=1).iloc[:, [0]]

s = df.bfill(axis=1).iloc[:, 0]
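
As a quick check, here is a minimal sketch of the back-fill approach on the question's data. The way df is constructed is an assumption, since the question only shows the printed frame; Column1 is given dtype=object so the strings and numbers can share a column.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data (only the printed frame is given)
df = pd.DataFrame({
    "Column1": pd.Series(["a", np.nan, "b", np.nan], dtype=object),
    "Column2": [1, 3, 6, np.nan],
    "Column3": [np.nan, 4, 7, 7],
})

# Back-fill along each row so the first non-missing value lands in the leftmost column
filled = df.bfill(axis=1)

s = filled.iloc[:, 0]       # Series
df1 = filled.iloc[:, [0]]   # one-column DataFrame

print(s.tolist())
```

Column priority here is simply the left-to-right column order, so reorder the columns first (for example df[['Column2', 'Column1', 'Column3']]) if a different priority is needed.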

CodePudding user response：

You can use Series.fillna() for this, chaining the columns in priority order, as below:

df['Column1'].fillna(df['Column2']).fillna(df['Column3'])

For the example data, this returns a Series with the values a, 3, b, 7.

For more than 3 columns, this can be placed in a for loop as below, with new_col being your output:

new_col = df['Column1']
# Column1 is the starting point, so only the remaining columns need to be applied
for col in df.columns[1:]:
    new_col = new_col.fillna(df[col])
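
The loop above can also be wrapped into a small reusable helper. The sketch below is one possible way to do that, not a definitive implementation: the name first_valid and the reconstructed sample frame are hypothetical, and functools.reduce simply chains Series.fillna across the columns in the order given.

```python
from functools import reduce

import numpy as np
import pandas as pd


def first_valid(frame, columns):
    """Return, per row, the first non-missing value among `columns` (priority order)."""
    # Hypothetical helper: start from the first column, then fill from each later one
    return reduce(
        lambda acc, col: acc.fillna(frame[col]),
        columns[1:],
        frame[columns[0]],
    )


# Assumed reconstruction of the question's data; Column1 is kept as object dtype
# so that numeric fill values can be stored next to the strings
df = pd.DataFrame({
    "Column1": pd.Series(["a", np.nan, "b", np.nan], dtype=object),
    "Column2": [1, 3, 6, np.nan],
    "Column3": [np.nan, 4, 7, 7],
})

new_col = first_valid(df, ["Column1", "Column2", "Column3"])
print(new_col.tolist())
```

Passing the column list explicitly makes the priority order visible at the call site and lets the same helper be reused for the many different cases mentioned in the question.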