I want to fill the NaN values in selected columns with the values from other selected columns, and create a range of new columns from the results. I can do this one column at a time with fillna(), passing the other column and assigning the result to a new column, but I wonder how I can do this at scale, or at least how to reduce the repetition.
import pandas as pd
import numpy as np
inp = [
{'a1hk':10, 'a2hk':100, 'b1uk':50, 'b2uk':80, 'd1dk':90, 'c1jk':np.nan, 'c1mk':np.nan, 'c2pk':np.nan, 'c2wk':np.nan},
{'a1hk':11,'a2hk':50, 'b1uk':50, 'b2uk':80, 'd1dk':90, 'c1jk':np.nan, 'c1mk':np.nan, 'c2pk':np.nan, 'c2wk':np.nan},
{'a1hk':11,'a2hk':50, 'b1uk':50, 'b2uk':80, 'd1dk':90, 'c1jk':20, 'c1mk':50, 'c2pk':60, 'c2wk':80},
{'a1hk':np.nan,'a2hk':np.nan, 'b1uk':np.nan, 'b2uk':np.nan, 'd1dk':np.nan, 'c1jk':50, 'c1mk':100, 'c2pk':60, 'c2wk':70},
{'a1hk':np.nan,'a2hk':np.nan, 'b1uk':np.nan, 'b2uk':np.nan, 'd1dk':np.nan, 'c1jk':50, 'c1mk':100, 'c2pk':60, 'c2wk':80}]
df = pd.DataFrame(inp)
df
a1hk a2hk b1uk b2uk d1dk c1jk c1mk c2pk c2wk
0 10.0 100.0 50.0 80.0 90.0 NaN NaN NaN NaN
1 11.0 50.0 50.0 80.0 90.0 NaN NaN NaN NaN
2 11.0 50.0 50.0 80.0 90.0 20.0 50.0 60.0 80.0
3 NaN NaN NaN NaN NaN 50.0 100.0 60.0 70.0
4 NaN NaN NaN NaN NaN 50.0 100.0 60.0 80.0
The columns that need to be filled: a1hk, a2hk, b1uk, b2uk, d1dk
The columns that supply the fill values: c1jk, c1mk, c2pk, c2wk
df['a1hk_jk'] = df['a1hk'].fillna(df['c1jk'])
df['a1hk_mk'] = df['a1hk'].fillna(df['c1mk'])
df['a1hk_pk'] = df['a1hk'].fillna(df['c2pk'])
df['a1hk_wk'] = df['a1hk'].fillna(df['c2wk'])
df['a2hk_jk'] = df['a2hk'].fillna(df['c1jk'])
# ... same pattern for the remaining column pairs ...
df['d1dk_jk'] = df['d1dk'].fillna(df['c1jk'])
df['d1dk_mk'] = df['d1dk'].fillna(df['c1mk'])
df['d1dk_pk'] = df['d1dk'].fillna(df['c2pk'])
df['d1dk_wk'] = df['d1dk'].fillna(df['c2wk'])
# Expected output (minimal)
a1hk a2hk b1uk b2uk d1dk c1jk c1mk c2pk c2wk a1hk_jk a1hk_mk a1hk_pk a1hk_wk d1dk_jk d1dk_mk d1dk_pk d1dk_wk
0 10.0 100.0 50.0 80.0 90.0 NaN NaN NaN NaN 10.0 10.0 10.0 10.0 90.0 90.0 90.0 90.0
1 11.0 50.0 50.0 80.0 90.0 NaN NaN NaN NaN 11.0 11.0 11.0 11.0 90.0 90.0 90.0 90.0
2 11.0 50.0 50.0 80.0 90.0 20.0 50.0 60.0 80.0 11.0 11.0 11.0 11.0 90.0 90.0 90.0 90.0
3 NaN NaN NaN NaN NaN 50.0 100.0 60.0 70.0 50.0 100.0 60.0 70.0 50.0 100.0 60.0 70.0
4 NaN NaN NaN NaN NaN 50.0 100.0 60.0 80.0 50.0 100.0 60.0 80.0 50.0 100.0 60.0 80.0
Answer:
You can create lists of the source and destination columns and call fillna in a for loop. The new column names are built with f-strings from the source column name plus the last 2 letters of the destination column name:
source = 'a1hk, a2hk, b1uk, b2uk, d1dk'.split(', ')
dest = 'c1jk, c1mk, c2pk, c2wk'.split(', ')
for x, y in zip(source, dest):
    df[f'{x}_{y[-2:]}'] = df[x].fillna(df[y])
print(df)
a1hk a2hk b1uk b2uk d1dk c1jk c1mk c2pk c2wk a1hk_jk a2hk_mk \
0 10.0 100.0 50.0 80.0 90.0 NaN NaN NaN NaN 10.0 100.0
1 11.0 50.0 50.0 80.0 90.0 NaN NaN NaN NaN 11.0 50.0
2 11.0 50.0 50.0 80.0 90.0 20.0 50.0 60.0 80.0 11.0 50.0
3 NaN NaN NaN NaN NaN 50.0 100.0 60.0 70.0 50.0 100.0
4 NaN NaN NaN NaN NaN 50.0 100.0 60.0 80.0 50.0 100.0
b1uk_pk b2uk_wk
0 50.0 80.0
1 50.0 80.0
2 50.0 80.0
3 60.0 70.0
4 60.0 80.0