How do I split the full dataset to training and test datasets by 50-50

Time:02-25

I've been going crazy looking for this all over the place. Most of the splits I've found cover the case "if you want to split the X variable in the dataset". No, I need to split the entire dataset, all of it, into training and test sets, 50-50. Please help, I'm new to this and stumbling my way through it.

Let's say that the dataset is named DATASET. What do I do?

CodePudding user response:

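Here is another way (DATASET must be a data.table, e.g. after setDT(DATASET)):

# TRUE marks a random half of the rows (train), FALSE the rest (test)
DATASET[, train := sample(.N) <= .N / 2]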

CodePudding user response:

data.table method

library(data.table)

setDT(DATASET)

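# Each row gets 0 or 1 independently, so the split is about 50-50, not exact
DATASET[, test := sample(0:1, nrow(DATASET), replace = TRUE, prob = c(0.5, 0.5))]

# split() on a data.table takes the grouping column name through `by`
DATASET_1 <- split(DATASET, by = "test")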

Base R method

# Independent 0/1 draw per row: roughly, not exactly, 50-50
DATASET$test <- sample(0:1, nrow(DATASET), replace = TRUE, prob = c(0.5, 0.5))

DATASET_1 <- split(DATASET, DATASET$test)
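
If the split has to be exactly 50-50 (a random 0/1 flag per row only gets close on average), one option is to sample half of the row indices without replacement. A minimal base R sketch, assuming DATASET is a data frame; the example data and the names train_idx, test_idx, train_set and test_set are made up for illustration:

```r
# Hypothetical example data; replace with your own DATASET.
DATASET <- data.frame(id = 1:10, x = rnorm(10))

set.seed(42)  # optional: makes the random split reproducible

n <- nrow(DATASET)

# Pick floor(n / 2) distinct row numbers at random for the training set.
train_idx <- sample(n, size = floor(n / 2))

# Every remaining row goes to the test set.
test_idx <- setdiff(seq_len(n), train_idx)

train_set <- DATASET[train_idx, , drop = FALSE]
test_set  <- DATASET[test_idx, , drop = FALSE]
```

With an odd number of rows the test set ends up one row larger than the training set. All columns are kept in both halves, so this splits the whole dataset rather than a single variable.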