Some issues when I try to scale the variables

Time:10-24

I have three variables named "age", "lr_scale" and "euRefVoteAfter". The first two are numerical variables and the third is a binary variable. I want to use the first two variables as features to classify the third one, but I ran into a problem when trying to scale these two variables. The error message says: Error in seq.default(min(training[, 2]) - 1, max(training[, 2]) + 1, by = 0.01) : 'from' must be a finite number. Maybe I should add some argument when doing the scaling so that the variables stay finite? If someone can help me figure this out, I'd really appreciate it!

library(caTools)
bes <- dplyr::select(bes,"age","lr_scale","euRefVoteAfter")
split <- sample.split(bes$euRefVoteAfter,SplitRatio = 0.75)
training <- subset(bes, split == TRUE)
testing <- subset(bes, split == FALSE)
training[-3] <- scale(training[-3])
testing[-3] <- scale(testing[-3])
library(class)
X1 <- seq(min(training[,1])-1,max(training[,1])+1,by=0.01)
X2 <- seq(min(training[,2])-1,max(training[,2])+1,by=0.01)

CodePudding user response:

Some elements in columns 1 or 2 of 'training' may be missing (NA). If they are not removed, min or max returns NA, and seq fails because its 'from' or 'to' argument is not a finite number. Passing na.rm = TRUE to min and max drops the NA elements before the minimum or maximum is calculated:

X1 <- seq(min(training[,1], na.rm = TRUE)-1,max(training[,1], na.rm = TRUE)+1,by=0.01)
X2 <- seq(min(training[,2], na.rm = TRUE)-1,max(training[,2], na.rm = TRUE)+1,by=0.01)
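
To confirm that missing values are the cause, it may help to count the NA entries per column and to check what scale does with them. The sketch below uses a small made-up data frame standing in for 'training' (the values are hypothetical; only the column names come from the question). It relies on the documented behaviour that scale omits NA when computing the mean and standard deviation but leaves the NA entries in place.

```r
# Hypothetical toy data standing in for the question's `training` data frame
training <- data.frame(
  age = c(25, 40, NA, 61),
  lr_scale = c(3, NA, 7, 5),
  euRefVoteAfter = c(0, 1, 1, 0)
)

# Count the missing values in each column
na_counts <- colSums(is.na(training))
print(na_counts)

# scale() skips NA when computing the centre and spread,
# but the NA entries themselves are still present afterwards
scaled <- scale(training[-3])
print(anyNA(scaled))

# So min() on a scaled column is NA unless na.rm = TRUE is supplied
print(min(scaled[, 1]))
print(min(scaled[, 1], na.rm = TRUE))
```

If any count is greater than zero, scaling alone will not make min or max finite, which is why the na.rm argument is needed in the seq calls.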

As a reproducible example:

> start <- c(1, 3, NA)
> end <- c(5, NA, 7)
> seq(min(start), max(end))
Error in seq.default(min(start), max(end)) : 
  'from' must be a finite number
> min(start)
[1] NA
> max(start)
[1] NA
> min(start, na.rm = TRUE)
[1] 1
> max(end, na.rm = TRUE)
[1] 7

If we add na.rm = TRUE (it is FALSE by default), the NA elements are ignored and the sequence is generated:

> seq(min(start, na.rm = TRUE), max(end, na.rm = TRUE))
[1] 1 2 3 4 5 6 7
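
An alternative to passing na.rm = TRUE everywhere is to drop the rows with missing feature values before scaling. This is only a sketch on hypothetical toy data (the values are invented; the column names follow the question), and whether discarding incomplete rows is acceptable depends on the data.

```r
# Hypothetical toy data; column names follow the question
training <- data.frame(
  age = c(25, 40, NA, 61),
  lr_scale = c(3, NA, 7, 5),
  euRefVoteAfter = c(0, 1, 1, 0)
)

# Keep only the rows where both feature columns are observed
training_complete <- training[complete.cases(training[, 1:2]), ]

# Scale the two feature columns, leaving the binary outcome untouched
training_complete[-3] <- scale(training_complete[-3])

# min() and max() are now finite, so seq() works without na.rm
X1 <- seq(min(training_complete[, 1]) - 1, max(training_complete[, 1]) + 1, by = 0.01)
X2 <- seq(min(training_complete[, 2]) - 1, max(training_complete[, 2]) + 1, by = 0.01)
```

With the incomplete rows removed, every later step (scaling, building the grid, fitting the classifier) sees the same set of observations, which may be easier to reason about than handling NA at each call.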