I've been going crazy looking for this all over the place. Most of the splits I've found are "if you want to split the X variable in the dataset". No, I need to split the entire dataset, all of it, into training and test sets, 50-50. Please help, I'm new to this and stumbling my way through it.
Let's say that the dataset is named DATASET. What do I do?
CodePudding user response:
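Here is another way, using data.table (call setDT(DATASET) first if DATASET is a plain data frame):
# Shuffle the row numbers 1..N and flag the half with the smallest shuffled values as training rows
DATASET[, train := sample(.N) <= .N / 2]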
CodePudding user response:
data.table method
library(data.table)
setDT(DATASET)
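# Assign each row to group 0 or 1 at random (group sizes are only approximately equal)
DATASET[, test := sample(0:1, nrow(DATASET), replace = TRUE, prob = c(0.5, 0.5))]
# Split on the column by name; split(DATASET, 'test') would return one group holding every row
DATASET_1 <- split(DATASET, by = "test")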
Base R method
# Assign each row to group 0 or 1 at random (group sizes are only approximately equal)
DATASET$test <- sample(0:1, nrow(DATASET), replace = TRUE, prob = c(0.5, 0.5))
DATASET_1 <- split(DATASET, DATASET$test)
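Drawing 0/1 labels with replacement assigns each row independently, so the two pieces usually come out close to, but not exactly, equal in size. If you need an exact 50-50 split, one option is to sample half of the row numbers without replacement and use them as an index. A minimal base R sketch, assuming DATASET is an ordinary data frame (the built-in iris data stands in for it here, and train_idx, train and test are just names picked for illustration):

```r
# Stand-in for your data; replace this line with your own DATASET
DATASET <- iris

set.seed(42)  # optional, only makes the split reproducible

n <- nrow(DATASET)
# Pick floor(n / 2) distinct row numbers at random for the training set
train_idx <- sample(n, size = floor(n / 2))

train <- DATASET[train_idx, , drop = FALSE]   # the sampled half
test  <- DATASET[-train_idx, , drop = FALSE]  # every row that was not sampled
```

Every column is kept in both pieces, and no row ends up in both. With an odd number of rows, test gets the extra one.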