'train' and 'class' have different lengths error in R

Time:02-11

I want to run a kNN classification with k = 3. I would like to predict the dependent variable “diabetes” in the validation set from the training set, and then calculate the accuracy.

However, I get the following error message:

Error in knn(train = TrainXNormDF, test = ValidXNormDF, cl = MLdata2[, : 'train' and 'class' have different lengths

I tried to work around it by padding MLValidY with zeros up to the length of TrainXNormDF, but that did not solve the problem:

for(i in ((length(MLValidY) + 1):length(TrainXNormDF))) (MLValidY = c(MLValidY, 0))

How can I fix this? Please help.

My code is as follows:

install.packages("mlbench")
install.packages("gbm")

library(mlbench)
library(gbm)

data("PimaIndiansDiabetes2")
head(PimaIndiansDiabetes2)

MLdata <- as.data.frame(PimaIndiansDiabetes2)
head(MLdata)
str(MLdata)
View(MLdata)

any(is.na(MLdata))
sum(is.na(MLdata))

MLdata2 <- na.omit(MLdata)
any(is.na(MLdata2))
sum(is.na(MLdata2))
View(MLdata2)

MLIdx <- sample(1:3, size = nrow(MLdata2), prob = c(0.6, 0.2, 0.2), replace = TRUE)

MLTrain <- MLdata2[MLIdx == 1,]
MLValid <- MLdata2[MLIdx == 2,]
MLTest <- MLdata2[MLIdx == 3,]

head(MLTrain)
head(MLValid)
head(MLTest)

str(MLTrain)
str(MLValid)
str(MLTest)



MLTrainX <- MLTrain[ , -9]
MLValidX <- MLValid[ , -9]
MLTestX <- MLTest[ , -9]

MLTrainY <- as.data.frame(MLTrain[ , 9])
MLValidY <- as.data.frame(MLValid[ , 9])
MLTestY <- as.data.frame(MLTest[ , 9])

View(MLTrainX)
View(MLTrainY)

library(caret)

NormValues <- preProcess(MLTrainX, method = c("center", "scale"))

TrainXNormDF <- predict(NormValues, MLTrainX)
ValidXNormDF <- predict(NormValues, MLValidX)
TestXNormDF <- predict(NormValues, MLTestX)

head(TrainXNormDF)
head(ValidXNormDF)
head(TestXNormDF)


install.packages('FNN')
library(FNN)
library(class)

NN <- knn(train = TrainXNormDF, 
      test = ValidXNormDF,
      cl = MLValidY,
      k = 3)

Thank you

Answer:

Your cl argument is not the same length as your train argument: MLValidY has only 74 observations, while TrainXNormDF has 224.

cl should provide the true classification for every row in your training set.
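A minimal, self-contained sketch of this rule, using R's built-in iris data as a stand-in for the Pima data so it runs without mlbench or caret (the 100/50 split and the object names are illustrative assumptions, not part of the question). The labels passed as cl come from the same rows as train; passing the held-out labels instead reproduces the error from the question.

```r
library(class)  # provides knn(); ships with R as a recommended package

set.seed(1)
idx <- sample(seq_len(nrow(iris)), 100)  # hypothetical 100-row training split

train_x <- iris[idx, 1:4]      # predictors for the training rows
test_x  <- iris[-idx, 1:4]     # predictors for the held-out rows
train_y <- iris$Species[idx]   # one true label per training row
test_y  <- iris$Species[-idx]  # labels for the held-out rows

# Correct: cl has exactly nrow(train_x) labels, so knn() runs
pred <- knn(train = train_x, test = test_x, cl = train_y, k = 3)

# Wrong: 50 held-out labels do not match the 100 training rows,
# so knn() stops; capture the message instead of failing
err <- tryCatch(
  knn(train = train_x, test = test_x, cl = test_y, k = 3),
  error = function(e) conditionMessage(e)
)
print(err)
```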

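Furthermore, cl is a data.frame instead of a vector. knn() compares length(cl) with nrow(train), and length() of a data.frame returns its number of columns (1 here), so a one-column data.frame of labels fails that check no matter how many rows it has.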
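A short illustration of why a one-column data.frame trips that length check (the data frame and its column name y are made up for the example):

```r
# A one-column data frame of labels, shaped like MLTrainY in the question
labels_df <- data.frame(y = factor(c("neg", "pos", "neg", "pos")))

length(labels_df)       # 1: length() of a data frame counts its columns
nrow(labels_df)         # 4: the number of rows
length(labels_df$y)     # 4: the extracted column is a factor, one entry per row
length(labels_df[[1]])  # 4: the same column, extracted by position
```

Either labels_df$y or labels_df[[1]] is the kind of object knn() expects for cl; in the question, MLTrainY[[1]] would extract the training labels the same way.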

Try the following:

NN <- knn(train = TrainXNormDF, 
      test = ValidXNormDF,
      cl = MLTrainY$`MLTrain[, 9]`,
      k = 3)

Answer:

As @rw2 said, the problem is the length of cl. I think you meant to use MLTrainY, not MLValidY. Even with the right single-column data frame you can still run into shape problems, because knn() expects cl to be a vector or factor of labels rather than a data frame. To avoid that, you can go back to the original data and take the diabetes column for the training rows directly:

NN <- knn(train = TrainXNormDF, 
          test = ValidXNormDF,
          cl = MLdata2[MLIdx == 1,]$diabetes, # shape no longer an issue
          k = 3)
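The question also asks for the validation accuracy. Once cl holds the training labels, accuracy is simply the share of predictions that match the true labels. The sketch below again uses iris as a stand-in for the Pima data, and uses base R's scale() as a substitute for caret's preProcess() (centring and scaling the held-out rows with the training means and standard deviations); the split and all object names are illustrative assumptions.

```r
library(class)  # provides knn()

set.seed(1)
idx <- sample(seq_len(nrow(iris)), 100)  # hypothetical training rows

# Standardise the training predictors, then apply the *training*
# centre and spread to the held-out rows
train_x <- scale(iris[idx, 1:4])
test_x  <- scale(iris[-idx, 1:4],
                 center = attr(train_x, "scaled:center"),
                 scale  = attr(train_x, "scaled:scale"))

train_y <- iris$Species[idx]   # labels for the training rows (used as cl)
test_y  <- iris$Species[-idx]  # true labels for the held-out rows

pred <- knn(train = train_x, test = test_x, cl = train_y, k = 3)

# Confusion matrix and accuracy (fraction of correct predictions);
# compare as character to avoid any factor-level mismatch
conf_mat <- table(predicted = pred, actual = test_y)
accuracy <- mean(as.character(pred) == as.character(test_y))
print(conf_mat)
print(accuracy)
```

With the question's objects, the analogous calls would be along the lines of table(NN, MLValid$diabetes) and mean(NN == MLValid$diabetes).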