Imputing nulls in a row with other row if one column is same


I have a dataframe:

import numpy as np
import pandas as pd

data = [[1000, 'x', 'A'], [2000, 'y', 'A'], ['NaN', 'NaN', 'A'], ['NaN', 'NaN', 'B'], [1700, 'z', 'B']]

df = pd.DataFrame(data, columns=['Price', 'Attribute', 'Model'])
df = df.replace('NaN', np.nan)

Now I want to impute the nulls so that, within rows sharing the same Model, the row containing nulls is filled from the row with the lowest Price.

The output should look like this:

data = [[1000, 'x', 'A'], [2000,'y', 'A'], [1000, 'x', 'A'], [1700,'z','B'], [1700,'z', 'B']]
  
df = pd.DataFrame(data, columns=['Price', 'Attribute', 'Model' ])

I have tried groupby and followed "Merge two duplicate rows with imputing values from each other",

but it did not work. Can someone help?

CodePudding user response:

If there are multiple columns to fill, list them in cols and use DataFrame.fillna with the per-group minimum produced by GroupBy.transform (only Price applies in the sample data):

cols = ['Price']
df[cols] = df[cols].fillna(df.groupby('Model')[cols].transform('min'))
print(df)
    Price Attribute Model
0  1000.0         x     A
1  2000.0         y     A
2  1000.0       NaN     A
3  1700.0       NaN     B
4  1700.0         z     B
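
This leaves Attribute as NaN because only Price was filled. As a minimal alternative sketch (not part of the original answer), assuming you want every column filled from the same minimal-Price row of the group, you could sort by Price so the cheapest row comes first within each Model, broadcast its values with GroupBy.transform('first'), and fill the NaNs:

import numpy as np
import pandas as pd

data = [[1000, 'x', 'A'], [2000, 'y', 'A'], [np.nan, np.nan, 'A'],
        [np.nan, np.nan, 'B'], [1700, 'z', 'B']]
df = pd.DataFrame(data, columns=['Price', 'Attribute', 'Model'])

# Sort so the cheapest row comes first in each Model group (NaN Prices sort last),
# then broadcast that row's values back to every row of the group.
cols = ['Price', 'Attribute']
filler = df.sort_values('Price').groupby('Model')[cols].transform('first')

# fillna aligns on the index, so the sorted order of filler does not matter.
df[cols] = df[cols].fillna(filler)
print(df)

This reproduces the expected output from the question, including the 'x' and 'z' values in Attribute.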

EDIT: If you need to replace all missing values with values from the complete (no-NaN) row that has the lowest Price per Model, use:

data = [[1000, 'x', 'A'], [2000,'y', 'A'], [np.nan,np.nan, 'A'], 
        [np.nan,np.nan,'B'], [1700,'z', 'B']]

df = pd.DataFrame(data, columns=['Price', 'Attribute', 'Model' ])

df1 = df.loc[df.dropna().groupby('Model')['Price'].idxmin()]
print (df1)
    Price Attribute Model
0  1000.0         x     A
4  1700.0         z     B

df = df.set_index('Model').fillna(df1.set_index('Model')).reset_index()
print (df)
  Model   Price Attribute
0     A  1000.0         x
1     A  2000.0         y
2     A  1000.0         x
3     B  1700.0         z
4     B  1700.0         z
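
Note that reset_index moves Model to the front. If the original column order from the question matters, one way to restore it is:

# Optional: restore the column order used in the question.
df = df[['Price', 'Attribute', 'Model']]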