How can I get Python to skip certain chunks of code while running a .py file?

I am doing hyperparameter tuning, feature selection, and training and fitting a few models to evaluate which is the best one to use, so there are maybe a few hundred lines of code. However, as you may know, the hyperparameter tuning part usually takes hours to produce the best parameters for the model. There are also other parts of my code that take 1 or 2 minutes to print their output.

If I were to pass this .py file to someone, how can I get Python to skip certain chunks of code (the ones that take a long time to run) when that person clicks "run"? In short, the person should not need to run the tuning part, which takes hours, or the other parts that take more than 3 minutes to generate output, yet should still be able to make predictions with my model and get the scores using the best hyperparameters I tuned earlier.

I would appreciate it if anyone could provide examples of how to code this or suggest a better way, as I am still relatively new to coding.

Thank you in advance!

CodePudding user response：

Simple answer

The easy way is to use a flag:

#!/usr/bin/python

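RUN_TUNING = False  # set to True to re-run the slow tuning step

if RUN_TUNING:
    # slow tuning code here
    pass  # placeholder so the otherwise empty block is valid Python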

# rest of the code here

Then, depending on how you run your script, you can hardcode the flag, or bind it to user input:

RUN_TUNING = input('Run tuning? (y/N): ').lower() in ['y', 'yes']

or maybe to a script parameter (e.g., using argparse or one of the many available alternatives)
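As a rough sketch of the script-parameter option: the `--run-tuning` switch and the step names below are hypothetical placeholders, not something from the original script, but the argparse calls themselves are standard.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; argv=None means 'use sys.argv'."""
    parser = argparse.ArgumentParser(description="Train and evaluate a model.")
    # Hypothetical flag name: tuning is skipped unless this switch is given.
    parser.add_argument(
        "--run-tuning",
        action="store_true",
        help="re-run the slow hyperparameter tuning step",
    )
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    steps = []
    if args.run_tuning:
        steps.append("tuning")  # the slow tuning code would go here
    steps.append("evaluation")  # the rest of the code always runs
    return steps


# Passing explicit argument lists here just to show both paths.
print(main([]))                # ['evaluation']
print(main(["--run-tuning"]))  # ['tuning', 'evaluation']
```

With this layout, calling `main()` with no arguments reads the real command line, so running the file normally skips tuning and adding `--run-tuning` turns it back on.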

Alternative for more complex code

If your code is more complex, you can organize it into functions so that the separate parts can be run independently. Here is a slightly more elaborate option that uses a decorator to switch functions on and off easily:

class SupervisedRun:
    """Decorator that calls the wrapped function only when `run` is True."""
    run = False  # class-level switch shared by every decorated function

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        # When the switch is off, the call is skipped and returns None.
        if self.run:
            return self.func(*args, **kwargs)


@SupervisedRun
def tuning_function1():
    print('I am only running when run=True')


def non_supervised_function1():
    print('I am always running')


@SupervisedRun
def tuning_function2():
    print('I am only running when run=True')


def non_supervised_function2():
    print('I am always running')


if __name__ == '__main__':
    SupervisedRun.run = input('Run tuning? (y/N): ').lower() in ['y', 'yes']

    tuning_function1()
    non_supervised_function1()
    tuning_function2()
    non_supervised_function2()

CodePudding user response：

Don't overthink it; just use a simple if.

testing = True  # set to False to run the slow code

if not testing:
    # Run your code.
    pass

CodePudding user response：

If I understand your question correctly, you have trained your model on a dataset, and now you want to avoid retraining it by loading the trained model before using it on new data. You can do that using pickle.

For example, suppose you have trained a Decision Tree Classifier like this:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
X, y = load_iris(return_X_y=True)
clf = model.fit(X, y)

In this case, your trained model is stored in the Python object clf. To save the clf object to be reused later, you can use the built-in pickle library.

import pickle
with open('clf.pickle', 'wb') as f:
    pickle.dump(clf, f)

You can then pass this pickle file along with your Python script, and the recipient can reload the model like this:

import pickle
with open('clf.pickle', 'rb') as f:
    clf = pickle.load(f)

If you want to read more on this, you can refer to this article.
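The same save-and-reload idea can also be applied to the tuned hyperparameters themselves, so the slow search only runs when no saved result exists. This is a minimal sketch under stated assumptions: `slow_tuning`, `get_best_params`, the parameter values, and the `best_params.json` file name are all made up for illustration, and JSON is used here instead of pickle because the parameters are plain numbers.

```python
import json
import os
import tempfile


def slow_tuning():
    """Hypothetical stand-in for the hours-long hyperparameter search."""
    return {"max_depth": 3, "min_samples_leaf": 5}  # made-up values


def get_best_params(path, force_tuning=False):
    """Load cached parameters if the file exists; otherwise tune and save."""
    if os.path.exists(path) and not force_tuning:
        with open(path) as f:
            return json.load(f), "loaded"
    params = slow_tuning()
    with open(path, "w") as f:
        json.dump(params, f)
    return params, "tuned"


# A temporary directory is used so the sketch does not leave files behind.
cache_path = os.path.join(tempfile.mkdtemp(), "best_params.json")

first = get_best_params(cache_path)   # no file yet, so tuning runs
second = get_best_params(cache_path)  # file exists, so tuning is skipped
print(first[1], second[1])
```

If you ship the saved parameter file together with the script, the other person gets the "loaded" path straight away, and you can still pass `force_tuning=True` yourself when you want to redo the search.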