Home > Enterprise >  How to fill the nans using groupby and filling values from another dataframe
How to fill the nans using groupby and filling values from another dataframe

Time:12-01

I have the input dataframe(df1) with Ids, subids and features, having the nans in the features columns,

df1 = pd.DataFrame({'Id': ['A1', 'A2', 'A3', 'B1', 'B2'], 
                'Subid':['A', 'A', 'A', 'B', 'B'], 
                'feature1':[2.6, 6.3, np.nan, np.nan, 3.3],
               'feature2':[55, np.nan, np.nan, 44, 69],
                'feature3':[np.nan, 0.5, 0.3, np.nan, np.nan],
                'feature4':[22, np.nan, 46, np.nan, 33],
                'feature5':[np.nan, np.nan, 52, np.nan, 53]
               })

I have another input dataframe(df2) having subids and the features values to be filled in.

df2 = pd.DataFrame({'Subid': ['A', 'B'],
                     'feature1': [2.966666666666667, 1.65],
                     'feature2': [18.333333333333332, 56.5],
                     'feature3': [0.26666666666666666, 0.0],
                     'feature4': [22.666666666666668, 16.5],
                     'feature5': [17.333333333333332, 26.5]})

I need to fill the nans in the df1 with the values present for each features in df2. I have tried lambda and apply function but unable to achieve the result

df1.loc[df1['feature1'].isna(), 'feature1'] = df2.groupby('Subid')['feature1'].apply(lambda x:x)

expected output:

outputdf = pd.DataFrame({'Id': ['A1', 'A2', 'A3', 'B1', 'B2'], 
                'Subid':['A', 'A', 'A', 'B', 'B'], 
                'feature1': [2.6, 6.3, 2.966667, 1.650000, 3.3],
                'feature2': [55, 18.333333, 18.333333, 44, 69],
                'feature3': [0.266667, 0.5, 0.3, 0.000000, 0.000000],
                'feature4': [22, 22.666667, 46, 16.500000, 33],
                'feature5': [17.333333, 17.333333, 52, 26.500000, 53]
               })

Quick help is appreciated.

CodePudding user response:

You can use a merge before fillna:

out = df1.fillna(df1[['Subid']].merge(df2, how='left'))

Output:

   Id Subid  feature1   feature2  feature3   feature4   feature5
0  A1     A  2.600000  55.000000  0.266667  22.000000  17.333333
1  A2     A  6.300000  18.333333  0.500000  22.666667  17.333333
2  A3     A  2.966667  18.333333  0.300000  46.000000  52.000000
3  B1     B  1.650000  44.000000  0.000000  16.500000  26.500000
4  B2     B  3.300000  69.000000  0.000000  33.000000  53.000000

CodePudding user response:

for f in [f for f in df1.columns if f.startswith('feature')]:
    df1[f]=df1[f].mask(pd.isnull, df1[['Subid']].merge(df2[['Subid', f]])[f])

returns:

   Id Subid  feature1   feature2  feature3   feature4   feature5
0  A1     A  2.600000  55.000000  0.266667  22.000000  17.333333
1  A2     A  6.300000  18.333333  0.500000  22.666667  17.333333
2  A3     A  2.966667  18.333333  0.300000  46.000000  52.000000
3  B1     B  1.650000  44.000000  0.000000  16.500000  26.500000
4  B2     B  3.300000  69.000000  0.000000  33.000000  53.000000

CodePudding user response:

You can use fillna to fill up the np.nans values and merge the second dataframe with matching Subid's

result = df1.fillna(df1[['Subid']].merge(df2, on='Subid', how='left'))
  • Related