Where to use validation set in model training


I have split my data into three sets (train, validation, and test) as shown below:


from sklearn.model_selection import train_test_split

# First hold out 10% of the data as the test set,
# then take 20% of the remainder as the validation set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=1)

X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=1)
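
With those fractions the split works out to roughly 72% train, 18% validation, and 10% test of the original data, which a quick sanity check confirms:

# Sanity check of the split proportions (expect roughly 72% / 18% / 10%)
for name, part in [("train", X_train), ("val", X_val), ("test", X_test)]:
    print(name, len(part), f"({len(part) / len(X):.0%})")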

My question is: where do I use the validation set in the code below?


from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Defining the model
model = LogisticRegression()
model.fit(X_train, y_train)

# Evaluate on the test set
y_pred = model.predict(X_test)
print("Accuracy on test is:", accuracy_score(y_test, y_pred))

# Measure macro-averaged precision, recall, and F1
print("Precision Score: ", precision_score(y_test, y_pred, average='macro'))
print("Recall Score:    ", recall_score(y_test, y_pred, average='macro'))
print("F1-Score:        ", f1_score(y_test, y_pred, average='macro'))


CodePudding user response:

You can report the accuracy and scores on the validation data in the same way:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Defining the model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predictions on both held-out sets
y_pred_val = model.predict(X_val)
y_pred = model.predict(X_test)

print("Accuracy on val is: ", accuracy_score(y_val, y_pred_val))
print("Accuracy on test is:", accuracy_score(y_test, y_pred))

# Macro-averaged precision, recall, and F1 on the validation set
print("Precision Score for Val:", precision_score(y_val, y_pred_val, average='macro'))
print("Recall Score for Val:   ", recall_score(y_val, y_pred_val, average='macro'))
print("F1-Score for Val:       ", f1_score(y_val, y_pred_val, average='macro'))

# ... and on the test set
print("Precision Score:", precision_score(y_test, y_pred, average='macro'))
print("Recall Score:   ", recall_score(y_test, y_pred, average='macro'))
print("F1-Score:       ", f1_score(y_test, y_pred, average='macro'))

CodePudding user response:

I suggest you read up a bit more on why you would split your data into train, test, and validation sets. In the code you show, you can use the validation data in the same way you use your test data, but that doesn't fully make sense. There is a lot to it; I think this can get you started: Link

In short and very simplified: the general idea is that you use results on the validation data to make adjustments to your model (tuning hyperparameters and so on) in order to improve its performance. The test data you use only at the very end, for your final model evaluation, to make sure the model actually performs well on unseen data. (Worst case, if you only use two sets, you might keep adjusting parameters until they work well for those two datasets but still not for any others.) A rough sketch of that workflow is shown below.
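
For example (a minimal sketch; the candidate C values are just illustrative, and any selection criterion could replace plain accuracy):

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Try a few hyperparameter settings, keeping the one that scores
# best on the validation set.
best_model, best_acc = None, 0.0
for C in [0.01, 0.1, 1.0, 10.0]:
    candidate = LogisticRegression(C=C, max_iter=1000)
    candidate.fit(X_train, y_train)
    acc = accuracy_score(y_val, candidate.predict(X_val))
    if acc > best_acc:
        best_model, best_acc = candidate, acc

# Only now touch the test set, once, for the final report.
print("Chosen C:", best_model.C)
print("Test accuracy:", accuracy_score(y_test, best_model.predict(X_test)))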
