Pandas .fillna() replacing every value with NaN instead of replacing only NaN values

Time:10-11

I'm currently working on the Kaggle House Prices challenge: https://www.kaggle.com/c/house-prices-advanced-regression-techniques.

I have concatenated the training and test tables into one DataFrame so that I can handle all missing values at once:

combine_df = pd.concat([train, test], axis=0, sort=False) 
combine_df.drop(['Id', 'SalePrice'], axis=1, inplace=True)

I then try to fill the NaN values in the categorical columns with the line below, where null_columns is a list of the columns whose NaN values I want to replace:

combine_df[null_columns] = combine_df[null_columns].fillna('0', inplace=True)

However, this line turns every value in those columns into NaN instead of replacing only the NaN values with '0'. The output below shows the number of NaN values in each column afterwards:

BsmtQual 2919 
BsmtCond 2919 
BsmtExposure 2919 
BsmtFinType1 2919
BsmtFinType2 2919 
GarageType 2919 
GarageFinish 2919 
GarageQual 2919 
GarageCond 2919 
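The behaviour can be reproduced on a tiny frame. This is a minimal sketch: the column names are borrowed from the question, while the data and the extra LotArea column are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for combine_df (made-up data).
combine_df = pd.DataFrame({
    "BsmtQual": ["Gd", np.nan, "TA"],
    "GarageType": [np.nan, "Attchd", "Detchd"],
    "LotArea": [8450, 9600, 11250],
})
null_columns = ["BsmtQual", "GarageType"]

# fillna(..., inplace=True) returns None, so this line assigns None
# to every row of both columns (pandas may also emit a warning here).
combine_df[null_columns] = combine_df[null_columns].fillna("0", inplace=True)

# Every value in the two columns is now counted as missing.
print(combine_df[null_columns].isnull().sum())
```

The untouched LotArea column keeps its values, which points at the assignment, not at the data, as the cause.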

I've tried .replace(), a lambda function, and .loc, and all of them do the same thing as the code above. What in my code causes this? I haven't been able to find anything about this on Stack Overflow. Any help would be greatly appreciated.

Answer:

Try removing inplace=True from this statement:

combine_df[null_columns] = combine_df[null_columns].fillna('0', inplace=True)

Replace it with:

combine_df[null_columns] = combine_df[null_columns].fillna('0')

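.fillna() with inplace=True modifies the object it is called on and returns None instead of the filled DataFrame. When you assign that None back to combine_df[null_columns], every value in those columns becomes None, which isnull() counts as missing. Note also that combine_df[null_columns] (selecting with a list of columns) returns a new DataFrame, so the in-place fill would only have modified a temporary copy, not combine_df itself.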
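A small sketch on hypothetical data showing the two return values side by side:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["x", np.nan], "b": [np.nan, "y"]})

# Without inplace, fillna leaves df alone and returns a new, filled DataFrame.
filled = df.fillna("0")
print(filled)

# With inplace=True, fillna mutates the object it is called on and returns None.
tmp = df.copy()
result = tmp.fillna("0", inplace=True)
print(result)  # None
```

Assigning `result` anywhere therefore assigns None, which is exactly what happens in the question.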

From the official documentation:

Returns: DataFrame or None. Object with missing values filled or None if inplace=True.
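If you do want to fill in place, one option (not taken from the answers here) is to call fillna on the whole DataFrame and pass a dict mapping each column to its fill value, so only the listed columns are touched. A minimal sketch with made-up data; LotFrontage is a hypothetical extra column used to show that unlisted columns are left alone:

```python
import numpy as np
import pandas as pd

combine_df = pd.DataFrame({
    "BsmtQual": ["Gd", np.nan, "TA"],
    "GarageType": [np.nan, "Attchd", "Detchd"],
    "LotFrontage": [65.0, np.nan, 68.0],
})
null_columns = ["BsmtQual", "GarageType"]

# Called directly on combine_df (not on a column selection), so inplace=True
# really modifies it. The dict limits the fill to the listed columns.
combine_df.fillna({col: "0" for col in null_columns}, inplace=True)
print(combine_df)
```

Note there is no assignment: the return value is None and is simply discarded.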

Answer:

Alternatively, use np.where to substitute '0' only where the isnull() mask is True:

import numpy as np
# isnull() already returns a boolean mask, so comparing it with True is unnecessary
combine_df[null_columns] = np.where(combine_df[null_columns].isnull(), '0', combine_df[null_columns])
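To check that np.where replaces only the missing entries, here is the same idea applied to a small made-up frame (the data is hypothetical):

```python
import numpy as np
import pandas as pd

combine_df = pd.DataFrame({
    "BsmtQual": ["Gd", np.nan, "TA"],
    "GarageType": [np.nan, "Attchd", "Detchd"],
})
null_columns = ["BsmtQual", "GarageType"]

# Boolean mask: True exactly where a value is missing.
mask = combine_df[null_columns].isnull()

# np.where builds a 2-D NumPy array: '0' where mask is True,
# the original value everywhere else. It is then written back column by column.
combine_df[null_columns] = np.where(mask, "0", combine_df[null_columns])
print(combine_df)
```

Because np.where returns a plain array rather than None, the assignment back into combine_df is safe.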

Or simply delete the inplace=True argument.
