Can someone explain to me why the value of split is FALSE in the test set?
library(caTools)  # provides sample.split
split = sample.split(dataset$Salary, SplitRatio = 2/3)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
CodePudding user response:
I assume you got this code from some caTools documentation? I recommend running the first line of code on its own and looking at the result; it should start to make sense.
Basically, what caTools::sample.split does is create a random logical vector with one element per element of the vector you pass in (so one per row of the dataset), holding TRUEs and FALSEs in roughly the given ratio. Let's take the iris dataset as an example (it has 150 rows):
split = sample.split(iris$Sepal.Length, SplitRatio = 2/3)
The result will be a 150-element logical vector with roughly 2/3 TRUE and 1/3 FALSE.
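To see what such a vector looks like without installing anything, here is a minimal base-R sketch that builds a comparable TRUE/FALSE vector by hand. It is only an approximation for illustration: it draws a plain random sample and does not reproduce how caTools takes the values of the input vector into account. The seed and the name n are my own choices.

```r
set.seed(42)                 # arbitrary seed, only so the run is repeatable
n <- nrow(iris)              # 150 rows

# Start with all FALSE, then flip a random 2/3 of the positions to TRUE
split <- rep(FALSE, n)
split[sample(n, size = round(n * 2/3))] <- TRUE

head(split)                  # a mix of TRUE and FALSE, one per row
table(split)                 # 100 TRUE, 50 FALSE
```

Each position in split corresponds to the row with the same index in iris, which is what makes it usable as a row filter.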
Next, you use the subset function to extract all the rows i from iris where split[i] == TRUE to create the training set, and all the rows i from iris where split[i] == FALSE to create the test set.
That is why you use split == TRUE for the training set and split == FALSE for the test set: the FALSE entries simply mark the rows that were not picked for training.
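As a quick check that the two subsets really are complementary, the sketch below repeats the split on iris and confirms that every row lands in exactly one of the two sets. The logical vector is again built with base R as a stand-in for sample.split (an assumption for illustration, not what caTools does internally); the subset calls are the same as in the question.

```r
set.seed(42)                                   # arbitrary seed for repeatability
n <- nrow(iris)

# Stand-in for sample.split: TRUE for a random 2/3 of the row indices
split <- seq_len(n) %in% sample(n, size = round(n * 2/3))

training_set <- subset(iris, split == TRUE)    # rows flagged TRUE
test_set     <- subset(iris, split == FALSE)   # all remaining rows

nrow(training_set)                             # 100
nrow(test_set)                                 # 50

# No row appears in both sets, and together they cover the whole dataset
length(intersect(rownames(training_set), rownames(test_set)))   # 0
nrow(training_set) + nrow(test_set) == nrow(iris)               # TRUE
```

Since split is already logical, subset(iris, split) and subset(iris, !split) would select the same rows; the == TRUE and == FALSE forms just make the intent explicit.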