Fill NaN values in columns from other columns, creating new columns

I want to fill the NaN values in selected columns with the values from other selected columns, and store each result in a new column. I can do this one at a time by calling fillna() with the other column and assigning the result to a new column, but I wonder how to do this at scale, or at least how to reduce the repetition.

import pandas as pd
import numpy as np

inp = [
    {'a1hk':10, 'a2hk':100, 'b1uk':50, 'b2uk':80, 'd1dk':90, 'c1jk':np.nan, 'c1mk':np.nan, 'c2pk':np.nan, 'c2wk':np.nan}, 
    {'a1hk':11,'a2hk':50, 'b1uk':50, 'b2uk':80, 'd1dk':90, 'c1jk':np.nan, 'c1mk':np.nan, 'c2pk':np.nan, 'c2wk':np.nan}, 
    {'a1hk':11,'a2hk':50, 'b1uk':50, 'b2uk':80, 'd1dk':90, 'c1jk':20, 'c1mk':50, 'c2pk':60, 'c2wk':80}, 
    {'a1hk':np.nan,'a2hk':np.nan, 'b1uk':np.nan, 'b2uk':np.nan, 'd1dk':np.nan, 'c1jk':50, 'c1mk':100, 'c2pk':60, 'c2wk':70}, 
    {'a1hk':np.nan,'a2hk':np.nan, 'b1uk':np.nan, 'b2uk':np.nan, 'd1dk':np.nan, 'c1jk':50, 'c1mk':100, 'c2pk':60, 'c2wk':80}]
df = pd.DataFrame(inp)
df

   a1hk   a2hk  b1uk  b2uk  d1dk  c1jk   c1mk  c2pk  c2wk
0  10.0  100.0  50.0  80.0  90.0   NaN    NaN   NaN   NaN
1  11.0   50.0  50.0  80.0  90.0   NaN    NaN   NaN   NaN
2  11.0   50.0  50.0  80.0  90.0  20.0   50.0  60.0  80.0
3   NaN    NaN   NaN   NaN   NaN  50.0  100.0  60.0  70.0
4   NaN    NaN   NaN   NaN   NaN  50.0  100.0  60.0  80.0


The columns that need to be filled: a1hk, a2hk, b1uk, b2uk, d1dk
The columns the fill values come from: c1jk, c1mk, c2pk, c2wk

df['a1hk_jk'] = df['a1hk'].fillna(df['c1jk'])
df['a1hk_mk'] = df['a1hk'].fillna(df['c1mk'])
df['a1hk_pk'] = df['a1hk'].fillna(df['c2pk'])
df['a1hk_wk'] = df['a1hk'].fillna(df['c2wk'])
df['a2hk_jk'] = df['a2hk'].fillna(df['c1jk'])
....
....
....
df['d1dk_jk'] = df['d1dk'].fillna(df['c1jk'])
df['d1dk_mk'] = df['d1dk'].fillna(df['c1mk'])
df['d1dk_pk'] = df['d1dk'].fillna(df['c2pk'])
df['d1dk_wk'] = df['d1dk'].fillna(df['c2wk'])

# Expected output (minimal)
   a1hk   a2hk  b1uk  b2uk  d1dk  c1jk   c1mk  c2pk  c2wk  a1hk_jk  a1hk_mk  a1hk_pk  a1hk_wk  d1dk_jk  d1dk_mk  d1dk_pk  d1dk_wk
0  10.0  100.0  50.0  80.0  90.0   NaN    NaN   NaN   NaN     10.0     10.0     10.0     10.0     90.0     90.0     90.0     90.0
1  11.0   50.0  50.0  80.0  90.0   NaN    NaN   NaN   NaN     11.0     11.0     11.0     11.0     90.0     90.0     90.0     90.0
2  11.0   50.0  50.0  80.0  90.0  20.0   50.0  60.0  80.0     11.0     11.0     11.0     11.0     90.0     90.0     90.0     90.0
3   NaN    NaN   NaN   NaN   NaN  50.0  100.0  60.0  70.0     50.0    100.0     60.0     70.0     50.0    100.0     60.0     70.0
4   NaN    NaN   NaN   NaN   NaN  50.0  100.0  60.0  80.0     50.0    100.0     60.0     80.0     50.0    100.0     60.0     80.0

CodePudding user response:

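You can put the columns to fill and the columns that supply the values into two lists and call fillna() in a for loop. The new column names are built with f-strings from the name of the filled column plus the last 2 letters of the name of the column the values come from. Note that zip() pairs the two lists positionally and stops at the shorter one, so this creates four new columns and d1dk gets no partner: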

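# Columns to fill, and the columns the replacement values come from
source = 'a1hk, a2hk, b1uk, b2uk, d1dk'.split(', ')
dest = 'c1jk, c1mk, c2pk, c2wk'.split(', ')

# zip() pairs the lists positionally: (a1hk, c1jk), (a2hk, c1mk), ...
for x, y in zip(source, dest):
    # New name: filled column plus the last 2 letters of the value column
    df[f'{x}_{y[-2:]}'] = df[x].fillna(df[y])
print(df)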
   a1hk   a2hk  b1uk  b2uk  d1dk  c1jk   c1mk  c2pk  c2wk  a1hk_jk  a2hk_mk  \
0  10.0  100.0  50.0  80.0  90.0   NaN    NaN   NaN   NaN     10.0    100.0   
1  11.0   50.0  50.0  80.0  90.0   NaN    NaN   NaN   NaN     11.0     50.0   
2  11.0   50.0  50.0  80.0  90.0  20.0   50.0  60.0  80.0     11.0     50.0   
3   NaN    NaN   NaN   NaN   NaN  50.0  100.0  60.0  70.0     50.0    100.0   
4   NaN    NaN   NaN   NaN   NaN  50.0  100.0  60.0  80.0     50.0    100.0   

   b1uk_pk  b2uk_wk  
0     50.0     80.0  
1     50.0     80.0  
2     50.0     80.0  
3     60.0     70.0  
4     60.0     80.0
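
If every combination is wanted instead, as the expected output in the question suggests (a1hk_jk, a1hk_mk, ... d1dk_wk), one possible sketch replaces zip() with itertools.product(). The names to_fill, fill_from, new_cols and out are only illustrative, and the small frame below is an assumption made to keep the example short: it reuses rows 0 and 3 of the question's data with a subset of the columns.

```python
from itertools import product

import numpy as np
import pandas as pd

# Rows 0 and 3 of the question's frame, restricted to four columns.
df = pd.DataFrame({
    'a1hk': [10, np.nan],
    'd1dk': [90, np.nan],
    'c1jk': [np.nan, 50],
    'c1mk': [np.nan, 100],
})

to_fill = ['a1hk', 'd1dk']      # columns that contain the gaps
fill_from = ['c1jk', 'c1mk']    # columns that supply the replacement values

# product() yields every (x, y) combination, not just positional pairs.
new_cols = {
    f'{x}_{y[-2:]}': df[x].fillna(df[y])
    for x, y in product(to_fill, fill_from)
}

# assign() adds all new columns at once and returns a new DataFrame,
# leaving df itself unchanged.
out = df.assign(**new_cols)
print(out)
```

With the full column lists from the question this would create 5 x 4 = 20 new columns. Building them in a dict first and adding them in a single assign() call avoids inserting columns into the frame one by one.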