How do you separate test data from training data?

Hi guys, I'm currently learning deep learning and machine learning.

I read some explanations on GitHub while studying the code too,

but there is no explanation of how to separate the test data in this code (see the bottom part, where there is the comment "# declare data for training and validation, if you want, you can separate testset from this"):

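import os
import cv2
import numpy as np

# 1. Creating Datasets
# n_labels, timesteps, img_col and img_row are assumed to be defined earlier
# define temporary empty lists for loading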
data = []
label = []
Totalnb = 0

# Load Dataset
for i in range(n_labels):
    nb = 0
    # Counting datasets in each labels
    for root, dirs, files in os.walk('Progress/DataLatihBaru/' + str(i+1)): # set directory
        for name in dirs:
            nb = nb + 1
    print(i,"Label number of Dataset is:",nb)
    Totalnb = Totalnb + nb
    # using that count, go through each subfolder, read and resize the images, and append them to the lists
    for j in range(nb):
        temp = []
        for k in range(timesteps):
            # name = 'NormalizedCascaded/' + str(i+1) + '/' + str(j+1) + '/' + str(k+1) + '.jpg'
            name = 'Progress/DataLatihBaru/' + str(i+1) + '/' + str(j+1) + '/' + 'a (' + str(k+1) + ')' + '.jpg'
            img = cv2.imread(name)
            res = cv2.resize(img, dsize=(img_col, img_row), interpolation=cv2.INTER_CUBIC)
            temp.append(res)
        label.append(i)        
        data.append(temp)
print("Total Number of Data is",Totalnb)

# Convert List to numpy array, for Keras use
Train_label = np.eye(n_labels)[label] # One-hot encoding by np array function
Train_data = np.array(data)
print("Dataset shape is",Train_data.shape, "(size, timestep, row, column, channel)")
print("Label shape is",Train_label.shape,"(size, label onehot vector)")
# shuffle the dataset before passing it to the fit function
# if you don't, the model can't train properly (the samples are loaded in label order)
x = np.arange(Train_label.shape[0])
np.random.shuffle(x)
# same order shuffle is needed
Train_label = Train_label[x]
Train_data = Train_data[x]

# declare data for training and validation, if you want, you can separate testset from this
X_train=Train_data[0:Totalnb,:]
Y_train=Train_label[0:Totalnb]

Can anyone help me understand how I should separate out the test data, with a little bit of explanation?

Thank you so much!

CodePudding user response：

Since Train_label and Train_data are already shuffled, you can simply change the slice ranges to get a train/test split.

TRAIN_RATIO = 0.8
split_idx = int(Totalnb * TRAIN_RATIO)  # index where the test part begins
X_train = Train_data[:split_idx]
Y_train = Train_label[:split_idx]
X_test = Train_data[split_idx:]
Y_test = Train_label[split_idx:]
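
If you also want a separate validation set, the same slicing idea extends to three parts. Here is a minimal sketch on dummy arrays. The helper name split_three_way, the 70/15/15 ratios and the dummy shapes are my own choices for illustration and are not from your code. It assumes the data and labels were already shuffled with the same index order, as in your script.

```python
import numpy as np

def split_three_way(data, labels, train_ratio=0.7, val_ratio=0.15):
    """Slice already-shuffled arrays into train/validation/test parts.

    Hypothetical helper: whatever is left after the train and
    validation slices becomes the test set.
    """
    n = data.shape[0]
    train_end = int(n * train_ratio)
    val_end = train_end + int(n * val_ratio)
    train = (data[:train_end], labels[:train_end])
    val = (data[train_end:val_end], labels[train_end:val_end])
    test = (data[val_end:], labels[val_end:])
    return train, val, test

# Dummy stand-ins for Train_data / Train_label: 100 samples, 3 classes
rng = np.random.default_rng(0)
dummy_data = rng.random((100, 2, 4, 4, 3))          # (size, timestep, row, column, channel)
dummy_labels = np.eye(3)[rng.integers(0, 3, 100)]   # one-hot labels

(X_train, Y_train), (X_val, Y_val), (X_test, Y_test) = split_three_way(
    dummy_data, dummy_labels
)
print(X_train.shape[0], X_val.shape[0], X_test.shape[0])
```

The point of slicing by index is that every sample ends up in exactly one part, so the test set is never seen during training.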

Or, use train_test_split from sklearn:

from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(Train_data, Train_label, test_size=0.2)
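
Note that train_test_split shuffles by default, so the manual shuffle is not required for it. Two optional arguments are worth knowing: random_state makes the split repeatable, and stratify keeps the class proportions the same in both parts. A small sketch on dummy arrays follows; the shapes, the seed 42 and the variable names are arbitrary assumptions. Because the labels are one-hot, I pass the class index (argmax) to stratify.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy stand-ins for Train_data / Train_label: 50 samples, 5 classes, 10 each
class_ids = np.repeat(np.arange(5), 10)
dummy_data = np.random.default_rng(0).random((50, 2, 4, 4, 3))
dummy_labels = np.eye(5)[class_ids]  # one-hot, like Train_label

X_train, X_test, Y_train, Y_test = train_test_split(
    dummy_data,
    dummy_labels,
    test_size=0.2,
    random_state=42,                       # fixed seed so the split is repeatable
    stratify=dummy_labels.argmax(axis=1),  # keep class proportions in both parts
)

print(X_train.shape, X_test.shape)
print(Y_test.sum(axis=0))  # number of test samples per class
```

Stratifying matters most when some classes have only a few samples, since a plain random split could leave a class out of the test set entirely.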