I'm using a grid search to tune the hyperparameters of my DNN, which is two layers deep. I'm currently scoring each model by its average loss on the test set, but I'm not sure this is the best approach. Would it be better to use accuracy, or both loss and accuracy, as the scoring metric? How do other people typically score their models during hyperparameter tuning? Any advice or insights would be greatly appreciated.
CodePudding user response:
The first issue in your experimental setup is that you are using the test set during hyperparameter tuning. You should train your model on the training set and tune the hyperparameters against a separate validation set. Only after tuning is finished should you evaluate the final model on the test set; that is the correct way to split and use the dataset.
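A minimal sketch of that split, using a synthetic dataset and scikit-learn's `MLPClassifier` purely as stand-ins for your own data and DNN:

```python
# Tune on a validation set; touch the test set only once at the very end.
from itertools import product

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# 60% train / 20% validation / 20% test
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)

grid = {"hidden_layer_sizes": [(32, 32), (64, 64)], "alpha": [1e-4, 1e-3]}

best_acc, best_params = -1.0, None
for sizes, alpha in product(grid["hidden_layer_sizes"], grid["alpha"]):
    model = MLPClassifier(hidden_layer_sizes=sizes, alpha=alpha,
                          max_iter=500, random_state=0)
    model.fit(X_train, y_train)
    acc = model.score(X_val, y_val)  # score ONLY on the validation set here
    if acc > best_acc:
        best_acc = acc
        best_params = {"hidden_layer_sizes": sizes, "alpha": alpha}

# The test set is used exactly once, with the chosen hyperparameters.
final_model = MLPClassifier(max_iter=500, random_state=0, **best_params)
final_model.fit(X_trainval, y_trainval)  # optionally retrain on train + val
print("Best params:", best_params)
print("Held-out test accuracy:", final_model.score(X_test, y_test))
```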
The second part of your question is very open-ended, but you may benefit from the following tips:
- Different metrics suit different tasks, so it is important to choose the right one. For instance, in some classification tasks you want to track `accuracy`, while in others `recall` or `precision` matters more. (You can also track several metrics at once to understand your model's behavior more deeply.)
- Recent work in this area generally goes under the name AutoML, and there are many libraries and methodologies for hyperparameter tuning, so you may want to look beyond plain grid search. If you do stick with grid search, you can switch to `GridSearchCV`, which evaluates each hyperparameter combination on several different folds of the dataset via cross-validation, making your tuning more robust (see the sketch after this list).
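A hedged sketch of what that switch could look like, again with a toy dataset and `MLPClassifier` standing in for your own data and DNN; the parameter grid and metric names are illustrative:

```python
# GridSearchCV: each hyperparameter combination is scored with k-fold
# cross-validation, and multiple metrics can be tracked at the same time.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

param_grid = {
    "hidden_layer_sizes": [(32, 32), (64, 64)],
    "alpha": [1e-4, 1e-3],
}

search = GridSearchCV(
    estimator=MLPClassifier(max_iter=500, random_state=0),
    param_grid=param_grid,
    scoring={"accuracy": "accuracy", "neg_log_loss": "neg_log_loss"},
    refit="accuracy",  # the best model is chosen and refit by accuracy
    cv=5,              # 5-fold cross-validation for more robust scores
    n_jobs=-1,
)
search.fit(X, y)

print("Best params:", search.best_params_)
print("Mean CV accuracy of best model:", search.best_score_)
# Per-metric results (accuracy and log loss) live in search.cv_results_.
```

Even with cross-validation, you would still hold out a separate test set before calling `fit` and use it only for the final score, as described above.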