I'm currently attempting the Kaggle House Prices challenge: https://www.kaggle.com/c/house-prices-advanced-regression-techniques
I have a concatenated table which combines the training and testing tables into one in order to handle all missing values at once.
combine_df = pd.concat([train, test], axis=0, sort=False)
combine_df.drop(['Id', 'SalePrice'], axis=1, inplace=True)
I then try to fill all NaN values in the categorical columns with the line below, where null_columns is a list of the columns whose NaN values I want to replace.
combine_df[null_columns] = combine_df[null_columns].fillna('0', inplace=True)
However, instead of replacing the NaN values with '0', this line turns every value in those columns into NaN. The output below shows the number of NaN values in each column:
BsmtQual 2919
BsmtCond 2919
BsmtExposure 2919
BsmtFinType1 2919
BsmtFinType2 2919
GarageType 2919
GarageFinish 2919
GarageQual 2919
GarageCond 2919
I've tried using .replace, a lambda function, and .loc, and all of them end up doing the same thing as the code above. What in my code causes this? I've been unable to find anything about it on Stack Overflow. Any help would be greatly appreciated.
CodePudding user response:
Try it without inplace=True. Replace the statement
combine_df[null_columns] = combine_df[null_columns].fillna('0', inplace=True)
with:
combine_df[null_columns] = combine_df[null_columns].fillna('0')
.fillna() with inplace=True returns None rather than the DataFrame that results from filling the NaN values. Hence, when you assign it back to combine_df[null_columns], every value becomes NaN.
From the official documentation:
Returns
DataFrame or None: Object with missing values filled, or None if inplace=True.
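As a minimal sketch of the corrected pattern, here is the fix applied to a toy frame. The values and the two-entry null_columns list are invented for illustration and are not taken from the Kaggle data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for combine_df; the values are invented for illustration.
combine_df = pd.DataFrame({
    "BsmtQual": ["Gd", np.nan, "TA"],
    "GarageType": [np.nan, "Attchd", np.nan],
    "LotArea": [8450, 9600, 11250],
})
null_columns = ["BsmtQual", "GarageType"]

# Without inplace, fillna returns a new DataFrame, so assigning it back
# keeps the existing values and only replaces the missing ones.
combine_df[null_columns] = combine_df[null_columns].fillna("0")

# Count the remaining NaN values per column.
nan_counts = combine_df[null_columns].isnull().sum()
print(nan_counts)
```

Columns outside null_columns (LotArea here) are left untouched.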
CodePudding user response:
Alternatively, use np.where:
import numpy as np
combine_df[null_columns] = np.where(combine_df[null_columns].isnull(), '0', combine_df[null_columns])
Or simply delete the inplace=True argument.
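To check that the two suggestions agree, this small sketch runs both on a toy frame and compares the results. The frame and its values are hypothetical, chosen only to have a NaN in each column.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame with one missing value per column.
df = pd.DataFrame({
    "GarageQual": ["TA", np.nan],
    "GarageCond": [np.nan, "Fa"],
})
null_columns = ["GarageQual", "GarageCond"]

# Approach 1: fillna without inplace returns the filled frame.
via_fillna = df[null_columns].fillna("0")

# Approach 2: np.where picks '0' where the value is missing,
# and the original value everywhere else.
via_where = df.copy()
via_where[null_columns] = np.where(
    df[null_columns].isnull(), "0", df[null_columns]
)

# Both approaches should produce the same cell values.
same = via_where[null_columns].values.tolist() == via_fillna.values.tolist()
print(same)
```

The fillna form is usually the easier one to read; np.where is mostly useful when the replacement depends on a more general condition than "is missing".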