How to split training dataset and testing data set

Time:11-30

I need to split a dataset into training and testing sets without using sklearn.model_selection.train_test_split.

I want the approach to be as follows:

  1. Read the dataset from an Excel file with 100 rows (DONE):

    data = pd.read_excel('file.xlsx')
    
  2. From the 100 rows, select 75 random rows (75%) as training data (DONE):

    random_training = data.sample(75)
    
  3. Use a for loop to find the indexes that exist in data but not in random_training, and put those rows into a random_testing list. This is the step I am finding hard to implement. Any ideas?

CodePudding user response:

If your data is wrapped in a PyTorch Dataset, you can use DataLoader with SubsetRandomSampler and random.sample:

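from torch.utils.data import DataLoader, SubsetRandomSampler
import random

batch_size = 16   # placeholder value, adjust to your setup
num_workers = 0   # placeholder value, adjust to your setup

# Sample 75% of the positions without replacement; range starts at 0 so row 0 can be picked
indices = random.sample(range(len(dataset)), int(len(dataset) * 0.75))
# Every position that was not sampled goes to the validation set
sampled = set(indices)
missing_indices = [index
                    for index in range(len(dataset))
                    if index not in sampled]
# Both index collections are plain lists of ints, so no astype call is needed
dl_train = DataLoader(dataset, batch_size, sampler=SubsetRandomSampler(indices), num_workers=num_workers)
dl_valid = DataLoader(dataset, batch_size, sampler=SubsetRandomSampler(missing_indices), num_workers=num_workers)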
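If the data is a plain pandas DataFrame rather than a PyTorch Dataset, the same index idea should work without torch. Below is a minimal sketch using only the standard library; `split_indices` is a hypothetical helper name made up for this example, and the `seed` parameter is an assumption added so the split can be repeated.

```python
import random


def split_indices(n_rows, train_fraction=0.75, seed=None):
    """Return (train_idx, test_idx): disjoint row positions covering range(n_rows)."""
    rng = random.Random(seed)  # local generator, leaves the global random state alone
    n_train = int(n_rows * train_fraction)
    train_idx = rng.sample(range(n_rows), n_train)  # sampled without replacement
    chosen = set(train_idx)  # a set gives fast membership checks
    # The loop described in the question: keep every position that was not sampled
    test_idx = [i for i in range(n_rows) if i not in chosen]
    return train_idx, test_idx


train_idx, test_idx = split_indices(100, train_fraction=0.75, seed=0)
print(len(train_idx), len(test_idx))  # 75 25
```

With a DataFrame, `data.iloc[train_idx]` and `data.iloc[test_idx]` would then select the two sets of rows by position.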

CodePudding user response:

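# Index labels of the rows that were sampled for training
tr = list(random_training.index)
# Keep only the rows whose labels are not in the training set
testing = data.loc[data.index.drop(tr)]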
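For reference, here is an end-to-end sketch of the pandas-only route, under a few assumptions: the DataFrame is built in memory as a stand-in for `pd.read_excel('file.xlsx')`, the column names `x` and `y` are made up, and `random_state=0` is added only so the draw is repeatable. It uses `DataFrame.drop` on the sampled index labels, which should give the same rows as the `loc` version above, and compares the result against the explicit loop from the question.

```python
import pandas as pd

# Stand-in for pd.read_excel('file.xlsx'): 100 rows of made-up values
data = pd.DataFrame({"x": range(100), "y": range(100, 200)})

# 75 random rows for training; random_state makes the draw repeatable
random_training = data.sample(n=75, random_state=0)

# Rows whose index labels were not sampled form the testing set
random_testing = data.drop(random_training.index)

# The explicit loop from the question yields the same labels
training_labels = set(random_training.index)
loop_labels = [label for label in data.index if label not in training_labels]

print(len(random_training), len(random_testing))  # 75 25
print(loop_labels == list(random_testing.index))  # True
```

This relies on the index labels being unique, which is the case for the default index that `read_excel` creates.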