I have a dataframe
data = [[1000, 'x', 'A'], [2000,'y', 'A'], ['NaN','NaN', 'A'], ['NaN','NaN','B'], [1700,'z', 'B']]
df = pd.DataFrame(data, columns=['Price', 'Attribute', 'Model' ])
df = df.replace('NaN',np.nan)
Now i want to impute the nulls in such a way that if Model is same, copy the content of rows having least Price to the row having nulls.
The output should look like
data = [[1000, 'x', 'A'], [2000,'y', 'A'], [1000, 'x', 'A'], [1700,'z','B'], [1700,'z', 'B']]
df = pd.DataFrame(data, columns=['Price', 'Attribute', 'Model' ])
I have tried groupby and followed Merge two duplicate rows with imputing values from each other
but it did not work. can someone help
CodePudding user response:
If there is multiple columns use DataFrame.fillna
with minimal values per groups to new columns by GroupBy.transform
:
cols = ['Price','Col1']
df[cols] = df[cols].fillna(df.groupby('Model')[cols].transform('min'))
print(df)
Price Attribute Model
0 1000.0 x A
1 2000.0 y A
2 1000.0 NaN A
3 1700.0 NaN B
4 1700.0 z B
EDIT: If need replace all missing values by rows with no NaN
s use:
data = [[1000, 'x', 'A'], [2000,'y', 'A'], [np.nan,np.nan, 'A'],
[np.nan,np.nan,'B'], [1700,'z', 'B']]
df = pd.DataFrame(data, columns=['Price', 'Attribute', 'Model' ])
df1 = df.loc[df.dropna().groupby('Model')['Price'].idxmin()]
print (df1)
Price Attribute Model
0 1000.0 x A
4 1700.0 z B
df = df.set_index('Model').fillna(df1.set_index('Model')).reset_index()
print (df)
Model Price Attribute
0 A 1000.0 x
1 A 2000.0 y
2 A 1000.0 x
3 B 1700.0 z
4 B 1700.0 z