How to do hyperparameter optimization on large data?


After spending many days I have almost finished my time-series model and collected enough data, and now I am stuck at hyperparameter optimization. I currently have an AMD Vega 11 mobile GPU, I can't afford to buy a new GPU right now, and training on the CPU would probably take more than a month. After a lot of googling I found a new and good-looking library called UltraOpt, but the problem is: how large a fragment of my total data (~150 GB) should I use for hyperparameter tuning? I also want to try lots of algorithms and combinations. Is there any faster and easier way?

or

is there any math involved, something like:

my_data = 100% of the dataset size

do hyperparameter optimization with 5% of my_data

use the optimized hyperparameters on the remaining 95% of the data  # something like this

so as to get a result similar to using the full data for optimization at once. Is there any shortcut for this? Thanks in advance (I am grateful for every suggestion).

I am using Python 3.7; CPU: AMD Ryzen 5 3400G, GPU: AMD Vega 11, RAM: 16 GB.

CodePudding user response:

Hyperparameter tuning is typically done on the validation set of a train-val-test split, where the splits contain roughly 70%, 10%, and 20% of the entire dataset respectively. As a baseline, random search can be used, while Bayesian optimization with Gaussian processes has been shown to be more compute-efficient. scikit-optimize is a good package for this.
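
Here is a minimal sketch of that workflow with scikit-optimize's gp_minimize. The synthetic data, the gradient-boosting model, and the search space are placeholders, not your actual time-series setup; swap in your own model, real splits, and hyperparameter ranges.

```python
import numpy as np
from skopt import gp_minimize
from skopt.space import Integer, Real
from skopt.utils import use_named_args
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Stand-in data so the sketch runs; replace with your real train/val split
# (roughly 70% train, 10% validation, 20% held-out test).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=2000)
X_train, y_train = X[:1400], y[:1400]
X_val, y_val = X[1400:1600], y[1400:1600]

# Search space for a placeholder gradient-boosting model.
space = [
    Integer(50, 500, name="n_estimators"),
    Real(1e-3, 0.3, prior="log-uniform", name="learning_rate"),
    Integer(2, 8, name="max_depth"),
]

@use_named_args(space)
def objective(**params):
    model = GradientBoostingRegressor(random_state=0, **params)
    model.fit(X_train, y_train)
    # gp_minimize always minimizes, so return the validation error directly.
    return mean_squared_error(y_val, model.predict(X_val))

result = gp_minimize(objective, space, n_calls=30, random_state=0)
print("best validation MSE:", result.fun)
print("best hyperparameters:", result.x)
```

This also fits the 5%/95% idea from the question: run the optimization on a subsample that fits your hardware, then retrain once on the full training data with the best hyperparameters found.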

CodePudding user response:

A good Python library for hyperparameter tuning is Keras Tuner. You can use several different tuners from this library, but for large data, as you've mentioned, Hyperband optimization is a state-of-the-art and appropriate choice, since it gives more epochs to promising trials and stops poor ones early.
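
A minimal sketch with Keras Tuner's Hyperband tuner follows. The small LSTM, the random data, the window/feature sizes, and the hyperparameter ranges are all placeholder assumptions; adapt them to your own time-series model.

```python
import numpy as np
import keras_tuner as kt
from tensorflow import keras

# Placeholder shapes and random data so the sketch runs end to end;
# replace with your own windowed time-series arrays.
window_size, n_features = 24, 8
X_train = np.random.rand(500, window_size, n_features).astype("float32")
y_train = np.random.rand(500, 1).astype("float32")
X_val = np.random.rand(100, window_size, n_features).astype("float32")
y_val = np.random.rand(100, 1).astype("float32")

def build_model(hp):
    model = keras.Sequential([
        keras.layers.Input(shape=(window_size, n_features)),
        keras.layers.LSTM(hp.Int("units", min_value=32, max_value=256, step=32)),
        keras.layers.Dense(1),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(
            hp.Choice("learning_rate", values=[1e-2, 1e-3, 1e-4])
        ),
        loss="mse",
    )
    return model

tuner = kt.Hyperband(
    build_model,
    objective="val_loss",
    max_epochs=30,        # Hyperband stops unpromising trials early
    factor=3,
    directory="tuning",
    project_name="timeseries",
)

tuner.search(X_train, y_train, validation_data=(X_val, y_val))
best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
print(best_hp.values)
```

As in the other answer, you can pass a subsample of your 150 GB to tuner.search and then retrain the best configuration on the full dataset afterwards.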
