Confused why my KNN code is throwing a ValueError

Time:06-03

I first want to say that I haven't done ML in a long time. I took a few courses but have forgotten most of it. This is also my first personal ML project without an instructor, so please treat me as a beginner. I am using scikit-learn's KNN regressor (KNeighborsRegressor).

#importing libraries and data
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor as KNR
theta = pd.read_csv("train.csv")#pandas dataframe
#getting data wanted from theta and putting it in a new dataframe
a = theta.get("YearBuilt")
b = theta.get("YrSold")
A = a.to_frame()
B = b.to_frame()
glasses = [A,B]
x = pd.concat(glasses)
#getting target data
y = theta.get("SalePrice")
#using KNN
horses = KNR(n_neighbors = 3)
horses.fit(x,y)

I get this error message: ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Could someone please explain this? My target values are in the hundreds of thousands and my inputs are in the thousands, and there are no blanks in the data. Thanks.
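One way to see where NaN values could come from, even with no blanks in the file, is to repeat the `pd.concat(glasses)` step on a tiny DataFrame. The sketch below uses made-up numbers as a stand-in for train.csv (only the column names come from the question). It compares the default stacking (`axis=0`) with side-by-side concatenation (`axis=1`).

```python
import pandas as pd

# Tiny made-up stand-in for train.csv; only the column names come from the question
theta = pd.DataFrame({
    "YearBuilt": [2000, 1975, 2001],
    "YrSold": [2008, 2007, 2009],
    "SalePrice": [200000, 180000, 220000],
})

A = theta["YearBuilt"].to_frame()
B = theta["YrSold"].to_frame()

# Default axis=0 stacks the frames vertically: twice as many rows,
# and each column is NaN in the rows that came from the other frame
stacked = pd.concat([A, B])
print(stacked.shape)               # (6, 2)
print(stacked.isna().sum().sum())  # 6

# axis=1 places the columns side by side, aligned on the index: no NaN
side_by_side = pd.concat([A, B], axis=1)
print(side_by_side.shape)               # (3, 2)
print(side_by_side.isna().sum().sum())  # 0
```

The stacked frame also has more rows than the target column, so it could not be paired with `y` row for row even without the NaNs.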

Answer:

Before answering the question, let me refactor the code. You are using a DataFrame, so you can index one or more of its columns directly, without the extra steps you've used:

#importing libraries and data
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor as KNR

theta = pd.read_csv("train.csv") # pandas dataframe
#getting data wanted from theta and putting it in a new dataframe
x = theta[["YearBuilt", "YrSold"]] # index multiple fields
#getting target data
y = theta["SalePrice"] # index single field
#using KNN
horses = KNR(n_neighbors = 3)
horses.fit(x,y) # fit KNN
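As a quick sanity check that the refactored column selection gives scikit-learn what it expects, here is a sketch that fits the same model on a small made-up DataFrame. The values are hypothetical and the real train.csv is not used. It then predicts a price for one invented house. The two `assert` lines check the conditions the original error was about: matching row counts and no NaN.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor as KNR

# Small made-up stand-in for train.csv (hypothetical values)
theta = pd.DataFrame({
    "YearBuilt": [1950, 1975, 2000, 2005, 2010],
    "YrSold":    [2008, 2008, 2009, 2009, 2010],
    "SalePrice": [100000, 150000, 200000, 220000, 250000],
})

x = theta[["YearBuilt", "YrSold"]]  # 2-D: one row per house, two feature columns
y = theta["SalePrice"]              # 1-D: one target value per house

# Row counts must match and the features must not contain NaN before fitting
assert len(x) == len(y)
assert not x.isna().any().any()

horses = KNR(n_neighbors=3)
horses.fit(x, y)

# Predict for a hypothetical house built in 2003 and sold in 2009;
# using a DataFrame with the same column names as x keeps the features aligned
query = pd.DataFrame({"YearBuilt": [2003], "YrSold": [2009]})
pred = horses.predict(query)
print(pred)  # mean SalePrice of the 3 nearest rows
```

With the default uniform weights, the prediction is the plain average of the three nearest neighbours' prices. Because the year features are unscaled, the distance is driven mostly by YearBuilt, which has the larger spread here.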

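Regarding your error: it means scikit-learn found NaN, infinite, or too-large values in the input. In your original code, the NaNs are created by pd.concat(glasses). By default, concat stacks DataFrames vertically (axis=0), so x ends up with twice as many rows as y. Each column is also NaN in the half of the rows that came from the other frame. Selecting both columns at once with theta[["YearBuilt", "YrSold"]], as in the refactored code above, avoids this. So does pd.concat(glasses, axis=1). If your data really does contain NaN or inf values, you can filter them out like this: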

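import numpy as np

theta = theta.replace([np.inf, -np.inf], np.nan)  # turn +/-inf into NaN
theta.dropna(subset=["YearBuilt", "YrSold", "SalePrice"], inplace=True)  # only check the columns you use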
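Here is a sketch of that filtering on a small made-up DataFrame. The extra column "Alley" is a hypothetical stand-in for any column you don't use. Restricting `dropna` to the used columns with `subset=` means a NaN in that unused column does not cost you the whole row. A plain `dropna()` would remove every row here. The final `np.isfinite` check confirms that what remains is safe to pass to `fit`.

```python
import numpy as np
import pandas as pd

# Made-up data: one inf in YrSold, one NaN in SalePrice,
# and NaNs in a hypothetical unused column "Alley"
theta = pd.DataFrame({
    "YearBuilt": [1950, 1975, 2000, 2005],
    "YrSold":    [2008, np.inf, 2009, 2009],
    "SalePrice": [100000, 150000, np.nan, 220000],
    "Alley":     [np.nan, "Pave", "Grvl", np.nan],
})
features = ["YearBuilt", "YrSold"]
target = "SalePrice"

# Without subset=, every row would be dropped because of the NaNs in "Alley"
print(len(theta.dropna()))  # 0

# Turn +/-inf into NaN, then drop rows with NaN only in the columns we use
theta = theta.replace([np.inf, -np.inf], np.nan)
theta = theta.dropna(subset=features + [target])
print(len(theta))  # 2 (rows 0 and 3 remain)

x = theta[features]
y = theta[target]
# Everything left is finite, so fit() will not raise the NaN/inf ValueError
assert np.isfinite(x.to_numpy()).all() and np.isfinite(y.to_numpy()).all()
```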