I have three variables named "age", "lr_scale" and "euRefVoteAfter". The first two are numerical and the third is binary. I want to use the first two as features to classify the third, but I ran into a problem after scaling these two variables. The error message says: Error in seq.default(min(training[, 2]) - 1, max(training[, 2]) + 1, by = 0.01) : 'from' must be a finite number
Maybe I should add some argument when doing the scaling so that the values stay finite?
If someone could help me figure this out, I'd really appreciate it!
library(caTools)
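# Keep only the three columns used below (assign the result, otherwise bes is unchanged)
bes <- dplyr::select(bes, "age", "lr_scale", "euRefVoteAfter")
split <- sample.split(bes$euRefVoteAfter, SplitRatio = 0.75)
# Use == so that subset() actually filters rows by the split indicator
training <- subset(bes, split == TRUE)
testing <- subset(bes, split == FALSE)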
training[-3] <- scale(training[-3])
testing[-3] <- scale(testing[-3])
library(class)
X1 <- seq(min(training[,1]) - 1, max(training[,1]) + 1, by = 0.01)
X2 <- seq(min(training[,2]) - 1, max(training[,2]) + 1, by = 0.01)
Answer:
Some elements in columns 1 or 2 of 'training' may be missing (NA). If they are not removed, min() and max() return NA, and seq() then fails because its 'from' or 'to' argument is not a finite number. Passing na.rm = TRUE to min() and max() drops those NA elements when the minimum and maximum are computed:
X1 <- seq(min(training[,1], na.rm = TRUE) - 1, max(training[,1], na.rm = TRUE) + 1, by = 0.01)
X2 <- seq(min(training[,2], na.rm = TRUE) - 1, max(training[,2], na.rm = TRUE) + 1, by = 0.01)
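Before patching the seq() calls, it may help to confirm that missing values really are the cause. The following is only a sketch: the small data frame is a hypothetical stand-in for 'training' with invented values, and the names na_counts and complete_rows are introduced here for illustration.

```r
# Hypothetical stand-in for the asker's data; the values are invented.
training <- data.frame(
  age            = c(25, 40, NA, 61),
  lr_scale       = c(3, NA, 7, 5),
  euRefVoteAfter = c(0, 1, 1, 0)
)

# Number of missing values in each column
na_counts <- colSums(is.na(training))
print(na_counts)

# Logical vector marking rows where both features are present
complete_rows <- complete.cases(training[, c("age", "lr_scale")])
print(sum(complete_rows))
```

If every count is zero on the real data, the non-finite value presumably comes from somewhere else and na.rm = TRUE would not be the fix.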
As a reproducible example of how NA triggers the error:
> start <- c(1, 3, NA)
> end <- c(5, NA, 7)
> seq(min(start), max(end))
Error in seq.default(min(start), max(end)) :
'from' must be a finite number
> min(start)
[1] NA
> max(start)
[1] NA
> min(start, na.rm = TRUE)
[1] 1
> max(end, na.rm = TRUE)
[1] 7
If we add na.rm = TRUE (by default it is FALSE):
> seq(min(start, na.rm = TRUE), max(end, na.rm = TRUE))
[1] 1 2 3 4 5 6 7
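On the question of whether the scaling step needs an extra argument: scale() ignores NA when it computes the column means and standard deviations, but it leaves the NA entries in its result, so the missing values survive scaling. One possible alternative to na.rm = TRUE, sketched below with invented values and hypothetical object names, is to drop incomplete rows before scaling. Whether discarding rows is acceptable depends on the data.

```r
# Invented values standing in for the two feature columns
x <- data.frame(a = c(1, 2, NA, 3), b = c(10, 20, 30, NA))

# scale() keeps the NA entries in its output
scaled <- scale(x)
print(anyNA(scaled))

# Alternative: drop rows with any NA first, then scale
x_complete <- na.omit(x)
scaled_complete <- scale(x_complete)

# The grid can now be built without na.rm = TRUE
grid <- seq(min(scaled_complete[, 1]) - 1, max(scaled_complete[, 1]) + 1, by = 0.01)
print(range(grid))
```

Dropping rows keeps the features and the class labels aligned, which matters if the grid is later used together with a classifier trained on the same rows.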