Train Split data

Time:04-09

I have this data frame and I want to divide it into training and testing sets by year: 2013 to 2018 should be in the training set, and 2019 to 2022 in the testing set. I have tried it, but it keeps selecting dates at random from the data. Can anyone please help?

Here is my code.

library(caTools)  # provides sample.split()

split <- sample.split(cement_index$CCYY, SplitRatio = 0.7)
train = subset(cement_index, split == TRUE)
test = subset(cement_index, split == FALSE)

CodePudding user response:

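You need to create the two groups you mention. Note that sample.split() still samples at random within each group, so this keeps the proportion of the two periods the same in both sets rather than putting a whole period into one set: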

groups <- ifelse(cement_index$CCYY < 2019, "2013-2018", "2019-2022")
split <- sample.split(groups, SplitRatio = 0.7)
train = subset(cement_index, split == TRUE)
test = subset(cement_index, split == FALSE)
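
If the goal is to keep whole years together, a plain logical subset on CCYY avoids sampling altogether. This is only a minimal sketch: the small cement_index data frame and its value column below are made-up stand-ins, and only the CCYY column name comes from the question.

```r
# Hypothetical stand-in for cement_index; only the CCYY column is from the question.
cement_index <- data.frame(
  CCYY  = rep(2013:2022, each = 2),
  value = seq_len(20)  # placeholder values, not real data
)

# Deterministic split by year: no random sampling involved.
train <- cement_index[cement_index$CCYY <= 2018, ]
test  <- cement_index[cement_index$CCYY >= 2019, ]

# Check which years ended up in each set.
range(train$CCYY)
range(test$CCYY)
```

With the same number of rows per year this gives roughly a 60/40 split (six years against four), not the 70/30 requested through SplitRatio, because the cut point is fixed by the years rather than by a ratio.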

CodePudding user response:

A different approach is to split the data into two subsets by year range with cut() and split(). You can use the following code (I generated random numbers for your index column, so your values will differ):

df <- data.frame(CCYY = c(2013:2022),
                 index = sample(1:10, 10))

split <- split(df, cut(df$CCYY, c(2012, 2018, 2022), include.lowest=F))

train = split$`(2012,2018]`
test = split$`(2018,2022]`

Output train:

  CCYY index
1 2013     8
2 2014     1
3 2015     3
4 2016     7
5 2017     5
6 2018     9

Output test:

   CCYY index
7  2019     6
8  2020    10
9  2021     2
10 2022     4
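
The names `(2012,2018]` and `(2018,2022]` are the default interval labels produced by cut(). As a possible variation, passing labels to cut() gives the pieces plain names, so no backticks are needed. This is a sketch with placeholder index values; the names period and parts are introduced here for illustration only.

```r
df <- data.frame(CCYY  = 2013:2022,
                 index = 1:10)  # placeholder values

# labels = ... replaces the default "(2012,2018]" style interval names.
# Intervals are right-closed by default: (2012,2018] and (2018,2022].
period <- cut(df$CCYY, breaks = c(2012, 2018, 2022),
              labels = c("train", "test"))
parts <- split(df, period)

train <- parts$train
test  <- parts$test
```

Storing the result in parts rather than split also avoids reusing the name of the base split() function.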