I want to run a kNN classification with k = 3. I would like to predict the dependent variable “diabetes” in the validation set using the training set, and then calculate the accuracy.
But I get the following error message:
Error in knn(train = TrainXNormDF, test = ValidXNormDF, cl = MLdata2[, : 'train' and 'class' have different lengths
I could not solve the problem with the following approach either:
for(i in ((length(MLValidY) + 1):length(TrainXNormDF))) (MLValidY = c(MLValidY, 0))
What can I do about it? Please help.
My code is below:
install.packages("mlbench")
install.packages("gbm")
library(mlbench)
library(gbm)
data("PimaIndiansDiabetes2")
head(PimaIndiansDiabetes2)
MLdata <- as.data.frame(PimaIndiansDiabetes2)
head(MLdata)
str(MLdata)
View(MLdata)
any(is.na(MLdata))
sum(is.na(MLdata))
MLdata2 <- na.omit(MLdata)
any(is.na(MLdata2))
sum(is.na(MLdata2))
View(MLdata2)
MLIdx <- sample(1:3, size = nrow(MLdata2), prob = c(0.6, 0.2, 0.2), replace = TRUE)
MLTrain <- MLdata2[MLIdx == 1,]
MLValid <- MLdata2[MLIdx == 2,]
MLTest <- MLdata2[MLIdx == 3,]
head(MLTrain)
head(MLValid)
head(MLTest)
str(MLTrain)
str(MLValid)
str(MLTest)
View(MLTestY)
MLTrainX <- MLTrain[ , -9]
MLValidX <- MLValid[ , -9]
MLTestX <- MLTest[ , -9]
MLTrainY <- as.data.frame(MLTrain[ , 9])
MLValidY <- as.data.frame(MLValid[ , 9])
MLTestY <- as.data.frame(MLTest[ , 9])
View(MLTrainX)
View(MLTrainY)
library(caret)
NormValues <- preProcess(MLTrainX, method = c("center", "scale"))
TrainXNormDF <- predict(NormValues, MLTrainX)
ValidXNormDF <- predict(NormValues, MLValidX)
TestXNormDF <- predict(NormValues, MLTestX)
head(TrainXNormDF)
head(ValidXNormDF)
head(TestXNormDF)
install.packages('FNN')
library(FNN)
library(class)
NN <- knn(train = TrainXNormDF,
test = ValidXNormDF,
cl = MLValidY,
k = 3)
Thank you
CodePudding user response:
Your cl variable is not the same length as your train variable. MLValidY only has 74 observations, while TrainXNormDF has 224.
cl should provide the true classification for every row in your training set.
Furthermore, cl is a data.frame instead of a vector.
Try the following:
NN <- knn(train = TrainXNormDF,
test = ValidXNormDF,
cl = MLTrainY$`MLTrain[, 9]`,
k = 3)
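To see the mismatch before calling knn, it can help to compare the number of training rows with the length of the labels. Below is a minimal sketch using the built-in iris data as a stand-in (an assumption, not the data from the question); the names train_x, valid_x, train_y_df and train_y are made up for illustration.

```r
library(class)  # knn() used here comes from the class package

set.seed(1)
# Random split into two parts, in the same style as the question (stand-in data)
idx <- sample(1:2, size = nrow(iris), prob = c(0.7, 0.3), replace = TRUE)
train_x <- iris[idx == 1, 1:4]
valid_x <- iris[idx == 2, 1:4]

# Wrapping the labels in as.data.frame() gives a one-column data frame
train_y_df <- as.data.frame(iris[idx == 1, 5])

# length() of a data frame is its number of columns, not its number of rows
length(train_y_df)
nrow(train_x)

# Extract the column as a factor so that its length matches nrow(train_x)
train_y <- train_y_df[[1]]
stopifnot(length(train_y) == nrow(train_x))

pred <- knn(train = train_x, test = valid_x, cl = train_y, k = 3)
```

If the stopifnot() line passes, the labels line up with the training rows and the "different lengths" error should not appear.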
CodePudding user response:
As @rw2 stated, the problem is the length of cl. I think you meant to use MLTrainY, not MLValidY. Even with the right object, a single-column data frame can still cause shape problems, because knn expects cl to be a vector (a factor) and not a data frame. You could go back to the original data to make sure you pass the right content, like so:
NN <- knn(train = TrainXNormDF,
test = ValidXNormDF,
cl = MLdata2[MLIdx == 1,]$diabetes, # shape no longer an issue
k = 3)
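Once the call runs, the accuracy asked about in the question is simply the share of validation rows whose predicted class equals the true class. Here is a minimal sketch of that step, again on the built-in iris data as a stand-in (an assumption) and with base R scale() in place of caret's preProcess; all object names are hypothetical.

```r
library(class)  # knn() used here comes from the class package

set.seed(42)
idx <- sample(1:2, size = nrow(iris), prob = c(0.7, 0.3), replace = TRUE)
train <- iris[idx == 1, ]
valid <- iris[idx == 2, ]

# Centre and scale both sets with the training means and standard deviations
mu <- colMeans(train[, 1:4])
sdev <- apply(train[, 1:4], 2, sd)
train_x <- scale(train[, 1:4], center = mu, scale = sdev)
valid_x <- scale(valid[, 1:4], center = mu, scale = sdev)

# cl holds the training labels, one per training row
pred <- knn(train = train_x, test = valid_x, cl = train$Species, k = 3)

# Confusion table and accuracy on the validation set
conf_mat <- table(predicted = pred, actual = valid$Species)
accuracy <- mean(pred == valid$Species)
print(conf_mat)
print(accuracy)
```

With the objects from the question, the equivalent would presumably be mean(NN == MLValid$diabetes), compared against the validation labels and not the training labels.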