I have the input dataframe(df1) with Ids, subids and features, having the nans in the features columns,
df1 = pd.DataFrame({'Id': ['A1', 'A2', 'A3', 'B1', 'B2'],
'Subid':['A', 'A', 'A', 'B', 'B'],
'feature1':[2.6, 6.3, np.nan, np.nan, 3.3],
'feature2':[55, np.nan, np.nan, 44, 69],
'feature3':[np.nan, 0.5, 0.3, np.nan, np.nan],
'feature4':[22, np.nan, 46, np.nan, 33],
'feature5':[np.nan, np.nan, 52, np.nan, 53]
})
I have another input dataframe(df2) having subids and the features values to be filled in.
df2 = pd.DataFrame({'Subid': ['A', 'B'],
'feature1': [2.966666666666667, 1.65],
'feature2': [18.333333333333332, 56.5],
'feature3': [0.26666666666666666, 0.0],
'feature4': [22.666666666666668, 16.5],
'feature5': [17.333333333333332, 26.5]})
I need to fill the nans in the df1 with the values present for each features in df2. I have tried lambda and apply function but unable to achieve the result
df1.loc[df1['feature1'].isna(), 'feature1'] = df2.groupby('Subid')['feature1'].apply(lambda x:x)
expected output:
outputdf = pd.DataFrame({'Id': ['A1', 'A2', 'A3', 'B1', 'B2'],
'Subid':['A', 'A', 'A', 'B', 'B'],
'feature1': [2.6, 6.3, 2.966667, 1.650000, 3.3],
'feature2': [55, 18.333333, 18.333333, 44, 69],
'feature3': [0.266667, 0.5, 0.3, 0.000000, 0.000000],
'feature4': [22, 22.666667, 46, 16.500000, 33],
'feature5': [17.333333, 17.333333, 52, 26.500000, 53]
})
Quick help is appreciated.
CodePudding user response:
You can use a merge
before fillna
:
out = df1.fillna(df1[['Subid']].merge(df2, how='left'))
Output:
Id Subid feature1 feature2 feature3 feature4 feature5
0 A1 A 2.600000 55.000000 0.266667 22.000000 17.333333
1 A2 A 6.300000 18.333333 0.500000 22.666667 17.333333
2 A3 A 2.966667 18.333333 0.300000 46.000000 52.000000
3 B1 B 1.650000 44.000000 0.000000 16.500000 26.500000
4 B2 B 3.300000 69.000000 0.000000 33.000000 53.000000
CodePudding user response:
for f in [f for f in df1.columns if f.startswith('feature')]:
df1[f]=df1[f].mask(pd.isnull, df1[['Subid']].merge(df2[['Subid', f]])[f])
returns:
Id Subid feature1 feature2 feature3 feature4 feature5
0 A1 A 2.600000 55.000000 0.266667 22.000000 17.333333
1 A2 A 6.300000 18.333333 0.500000 22.666667 17.333333
2 A3 A 2.966667 18.333333 0.300000 46.000000 52.000000
3 B1 B 1.650000 44.000000 0.000000 16.500000 26.500000
4 B2 B 3.300000 69.000000 0.000000 33.000000 53.000000
CodePudding user response:
You can use fillna to fill up the np.nans values and merge the second dataframe with matching Subid's
result = df1.fillna(df1[['Subid']].merge(df2, on='Subid', how='left'))