I have a data frame df that I would like to separate into a training set and a test set. Instead of getting only a single training and test set, I would like to get a distribution of them (n = 100).
I tried to do this with lapply, but every element in the resulting lists ends up being exactly the same. How do I randomize the values in the two lists (i.e., train.data and test.data)?
The expected output would be a list for both train.data and test.data, each containing 100 elements with different subsets of df in both of them.
library(lubridate)
library(tidyverse)
library(caret)

date <- rep_len(seq(dmy("01-01-2013"), dmy("31-12-2013"), by = "days"), 300)
ID <- rep(c("A","B","C"), 50)
class <- rep(c("N","M"), 50)
df <- data.frame(value = runif(length(date), min = 0.5, max = 25),
                 ID,
                 class)

training.samples <- df$class %>%
  createDataPartition(p = 0.6, list = FALSE)

n <- 100
train.data <- lapply(1:n, function(x){
  df[training.samples, ]
})
test.data <- lapply(1:n, function(x){
  df[-training.samples, ]
})
CodePudding user response:
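In your code the partition is drawn once, outside lapply, so every list element reuses the same row indices. Move the createDataPartition call into a function and run it with replicate, so that each of the n iterations draws a fresh partition: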
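# Draw one stratified 60/40 split of dat on the column named colnm
f1 <- function(dat, colnm) {
  s1 <- createDataPartition(dat[[colnm]], p = 0.6,
                            list = FALSE)
  return(list(train.data = dat[s1, ], test.data = dat[-s1, ]))
}

n <- 100
# simplify = FALSE keeps the result as a plain list of n splits
out <- replicate(n, f1(df, "class"), simplify = FALSE)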
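Note that out is a list of n elements, each holding one train.data and one test.data, while the question asks for two separate lists. Below is a minimal sketch of how the two could be pulled apart. The data frame is a simplified stand-in for the one in the question, and set.seed(42) is only there to make the sketch reproducible; both are assumptions. The last part shows a possible alternative that relies on the times argument of createDataPartition instead of replicate.

```r
library(caret)

set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible

# Simplified stand-in for the question's df (300 rows, same columns)
df <- data.frame(value = runif(300, min = 0.5, max = 25),
                 ID    = rep(c("A", "B", "C"), 100),
                 class = rep(c("N", "M"), 150))

# Same helper as above: one stratified 60/40 split per call
f1 <- function(dat, colnm) {
  s1 <- createDataPartition(dat[[colnm]], p = 0.6, list = FALSE)
  list(train.data = dat[s1, ], test.data = dat[-s1, ])
}

n <- 100
out <- replicate(n, f1(df, "class"), simplify = FALSE)

# Split the list of pairs into the two lists the question asks for
train.data <- lapply(out, `[[`, "train.data")
test.data  <- lapply(out, `[[`, "test.data")

# Sanity check: compare the rows selected by the first two splits
same_rows <- identical(rownames(train.data[[1]]), rownames(train.data[[2]]))

# Alternative: ask createDataPartition for all n partitions in one call.
# With list = TRUE it returns a list of n integer index vectors.
idx <- createDataPartition(df$class, p = 0.6, times = n, list = TRUE)
train.data2 <- lapply(idx, function(i) df[i, ])
test.data2  <- lapply(idx, function(i) df[-i, ])
```

Either way, train.data[[k]] and test.data[[k]] come from the same partition, so the k-th pair always covers every row of df exactly once.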