Fill NaN value in columns with other columns with new columns creation

Time:02-24

I want to fill the NaN values in selected columns with values from other selected columns, and store each filled result in a new column. I can do this one at a time by calling fillna() with the other column and assigning the result to a new column, but I wonder how to do this at scale, or at least how to reduce the repetition.

import pandas as pd
import numpy as np

inp = [
    {'a1hk':10, 'a2hk':100, 'b1uk':50, 'b2uk':80, 'd1dk':90, 'c1jk':np.nan, 'c1mk':np.nan, 'c2pk':np.nan, 'c2wk':np.nan}, 
    {'a1hk':11,'a2hk':50, 'b1uk':50, 'b2uk':80, 'd1dk':90, 'c1jk':np.nan, 'c1mk':np.nan, 'c2pk':np.nan, 'c2wk':np.nan}, 
    {'a1hk':11,'a2hk':50, 'b1uk':50, 'b2uk':80, 'd1dk':90, 'c1jk':20, 'c1mk':50, 'c2pk':60, 'c2wk':80}, 
    {'a1hk':np.nan,'a2hk':np.nan, 'b1uk':np.nan, 'b2uk':np.nan, 'd1dk':np.nan, 'c1jk':50, 'c1mk':100, 'c2pk':60, 'c2wk':70}, 
    {'a1hk':np.nan,'a2hk':np.nan, 'b1uk':np.nan, 'b2uk':np.nan, 'd1dk':np.nan, 'c1jk':50, 'c1mk':100, 'c2pk':60, 'c2wk':80}]
df = pd.DataFrame(inp)
df

   a1hk   a2hk  b1uk  b2uk  d1dk  c1jk   c1mk  c2pk  c2wk
0  10.0  100.0  50.0  80.0  90.0   NaN    NaN   NaN   NaN
1  11.0   50.0  50.0  80.0  90.0   NaN    NaN   NaN   NaN
2  11.0   50.0  50.0  80.0  90.0  20.0   50.0  60.0  80.0
3   NaN    NaN   NaN   NaN   NaN  50.0  100.0  60.0  70.0
4   NaN    NaN   NaN   NaN   NaN  50.0  100.0  60.0  80.0


Columns that need to be filled: a1hk, a2hk, b1uk, b2uk, d1dk
Columns that supply the fill values: c1jk, c1mk, c2pk, c2wk

df['a1hk_jk'] = df['a1hk'].fillna(df['c1jk'])
df['a1hk_mk'] = df['a1hk'].fillna(df['c1mk'])
df['a1hk_pk'] = df['a1hk'].fillna(df['c2pk'])
df['a1hk_wk'] = df['a1hk'].fillna(df['c2wk'])
df['a2hk_jk'] = df['a2hk'].fillna(df['c1jk'])
....
....
....
df['d1dk_jk'] = df['d1dk'].fillna(df['c1jk'])
df['d1dk_mk'] = df['d1dk'].fillna(df['c1mk'])
df['d1dk_pk'] = df['d1dk'].fillna(df['c2pk'])
df['d1dk_wk'] = df['d1dk'].fillna(df['c2wk'])

# Expected output (minimal)
   a1hk   a2hk  b1uk  b2uk  d1dk  c1jk   c1mk  c2pk  c2wk  a1hk_jk  a1hk_mk  a1hk_pk  a1hk_wk  d1dk_jk  d1dk_mk  d1dk_pk  d1dk_wk
0  10.0  100.0  50.0  80.0  90.0   NaN    NaN   NaN   NaN     10.0     10.0     10.0     10.0     90.0     90.0     90.0     90.0 
1  11.0   50.0  50.0  80.0  90.0   NaN    NaN   NaN   NaN     11.0     11.0     11.0     11.0     90.0     90.0     90.0     90.0 
2  11.0   50.0  50.0  80.0  90.0  20.0   50.0  60.0  80.0     11.0     11.0     11.0     11.0     90.0     90.0     90.0     90.0 
3   NaN    NaN   NaN   NaN   NaN  50.0  100.0  60.0  70.0     50.0    100.0     60.0     70.0     50.0    100.0     60.0     70.0 
4   NaN    NaN   NaN   NaN   NaN  50.0  100.0  60.0  80.0     50.0    100.0     60.0     80.0     50.0    100.0     60.0     80.0

CodePudding user response:

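You can put the columns to fill (source) and the columns that supply the values (dest) into two lists and call fillna() in a for loop. The new column names are built with f-strings from the source column name and the last two letters of the dest column name. Note that zip pairs the two lists by position and stops at the shorter one, so this creates four new columns and leaves d1dk unpaired: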

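# source: columns to fill; dest: columns that supply the fill values
source = 'a1hk, a2hk, b1uk, b2uk, d1dk'.split(', ')
dest = 'c1jk, c1mk, c2pk, c2wk'.split(', ')

# zip pairs the lists positionally: (a1hk, c1jk), (a2hk, c1mk), ...
for x, y in zip(source, dest):
    # new column name: source column + last two letters of the dest column
    df[f'{x}_{y[-2:]}'] = df[x].fillna(df[y])
print(df)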
   a1hk   a2hk  b1uk  b2uk  d1dk  c1jk   c1mk  c2pk  c2wk  a1hk_jk  a2hk_mk  \
0  10.0  100.0  50.0  80.0  90.0   NaN    NaN   NaN   NaN     10.0    100.0   
1  11.0   50.0  50.0  80.0  90.0   NaN    NaN   NaN   NaN     11.0     50.0   
2  11.0   50.0  50.0  80.0  90.0  20.0   50.0  60.0  80.0     11.0     50.0   
3   NaN    NaN   NaN   NaN   NaN  50.0  100.0  60.0  70.0     50.0    100.0   
4   NaN    NaN   NaN   NaN   NaN  50.0  100.0  60.0  80.0     50.0    100.0   

   b1uk_pk  b2uk_wk  
0     50.0     80.0  
1     50.0     80.0  
2     50.0     80.0  
3     60.0     70.0  
4     60.0     80.0  
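
If every combination of a column to fill and a fill column is wanted, as the expected output in the question seems to suggest, one possible variation is to swap zip for itertools.product. The sketch below is only an illustration under that assumption. The names to_fill, fill_from, new_cols and out are made up for this example, and the frame is rebuilt from the question's data so that the block runs on its own.

```python
from itertools import product

import numpy as np
import pandas as pd

nan = np.nan
# Same sample data as in the question, written column-wise for brevity.
df = pd.DataFrame({
    'a1hk': [10, 11, 11, nan, nan],
    'a2hk': [100, 50, 50, nan, nan],
    'b1uk': [50, 50, 50, nan, nan],
    'b2uk': [80, 80, 80, nan, nan],
    'd1dk': [90, 90, 90, nan, nan],
    'c1jk': [nan, nan, 20, 50, 50],
    'c1mk': [nan, nan, 50, 100, 100],
    'c2pk': [nan, nan, 60, 60, 60],
    'c2wk': [nan, nan, 80, 70, 80],
})

# Hypothetical names: columns to fill, and columns that supply the values.
to_fill = ['a1hk', 'a2hk', 'b1uk', 'b2uk', 'd1dk']
fill_from = ['c1jk', 'c1mk', 'c2pk', 'c2wk']

# product yields every (to_fill, fill_from) pair: 5 * 4 = 20 new columns.
new_cols = {
    f'{x}_{y[-2:]}': df[x].fillna(df[y])
    for x, y in product(to_fill, fill_from)
}

# Attach all new columns in one concat instead of inserting them one by one.
out = pd.concat([df, pd.DataFrame(new_cols)], axis=1)
print(out.shape)  # (5, 29)
```

Collecting the filled Series in a dict and concatenating once is a design choice rather than a requirement; assigning df[name] = ... inside the loop, as in the answer above, gives the same columns.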