Why do I get an IndexError after moving my functions into a script?

These two functions run fine in JupyterLab and produce the graphs I want:

import matplotlib.pyplot as plt

# Return the name of the global variable bound to this dataset
def get_df_name(df):
    name = [x for x in globals() if globals()[x] is df][0]
    return name

# Plot log(price) histograms for the train and test sets side by side
def hisplots_traintest(train, test, figsize):
    fig, axs = plt.subplots(1, 2, figsize=figsize)
    axs[0].hist(train['log(price($/lb))'], bins=30)
    axs[0].title.set_text('log(price) histogram for ' + get_df_name(train))
    axs[1].hist(test['log(price($/lb))'], bins=30)
    axs[1].title.set_text('log(price) histogram for ' + get_df_name(test))
    fig.show()

However, when I move these two functions into a script and load that script, they no longer work: I get a "list index out of range" error. Could someone tell me why and how to fix it? Thank you so much!!

CodePudding user response：

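It looks like you're searching for your dataset in the global scope of the file in which these functions are defined. Inside a function, globals() returns the namespace of the module that defines the function, not the namespace of whoever calls it. In the notebook the functions and the dataset share one namespace, so the lookup succeeds. Once the functions live in a separate script, the dataset is defined in a different file, the list comprehension comes back empty, and [0] raises the IndexError.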
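To make that concrete, here is a minimal sketch that reproduces the error in a single file. The module name helpers is made up, and the module is built in memory with types.ModuleType purely so the example is self-contained; a function imported from a real script file sees its own module's globals in the same way.

```python
import types

# Source of a stand-in "script" that contains the lookup function.
helper_source = '''
def get_df_name(df):
    return [x for x in globals() if globals()[x] is df][0]
'''

# Build a separate module object and define the function inside it.
helpers = types.ModuleType("helpers")  # hypothetical module name
exec(helper_source, helpers.__dict__)

# This variable lives in the calling namespace, not in helpers.
predict_train1 = [0.1, 0.4, 0.8]

error_message = None
try:
    helpers.get_df_name(predict_train1)
except IndexError as exc:
    # globals() inside get_df_name is helpers.__dict__, which has no
    # entry bound to predict_train1, so the list is empty and [0] fails.
    error_message = str(exc)

print(error_message)
```

The function only ever searches the namespace it was defined in, so the caller's variables are invisible to it.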

You can add an argument to your function that takes the name of the dataset, and use that name in the plot title.
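A rough sketch of that suggestion, assuming the parameter names train_name and test_name (they are not in the original code) and using plain dicts of lists as stand-ins for the DataFrames. The Agg backend is selected only so the example runs without a display, and the function returns the figure instead of calling fig.show():

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, chosen only for this sketch
import matplotlib.pyplot as plt

PRICE_COLUMN = "log(price($/lb))"

def hisplots_traintest(train, test, figsize, train_name="train", test_name="test"):
    """Plot side-by-side histograms; the caller supplies the dataset names."""
    fig, axs = plt.subplots(1, 2, figsize=figsize)
    axs[0].hist(train[PRICE_COLUMN], bins=30)
    axs[0].set_title("log(price) histogram for " + train_name)
    axs[1].hist(test[PRICE_COLUMN], bins=30)
    axs[1].set_title("log(price) histogram for " + test_name)
    return fig

# Stand-in data: a dict of lists is indexed by column name like a DataFrame.
predict_train1 = {PRICE_COLUMN: [0.1, 0.4, 0.35, 0.8, 1.2]}
predict_test1 = {PRICE_COLUMN: [0.2, 0.5, 0.9]}

fig = hisplots_traintest(
    predict_train1, predict_test1, (10, 4),
    train_name="predict_train1", test_name="predict_test1",
)
titles = [ax.get_title() for ax in fig.axes]
```

Because the names arrive as ordinary arguments, the function behaves the same in a notebook, a script, or a test.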

CodePudding user response：

Querying globals() has a bit of a code smell to it, and I suspect Jupyter notebooks are doing some massaging of variables to track state.

In any case, once your code is moderately complex, the "name" of a variable depends on its scope, whether you're querying it from inside a stand-alone script, in an interactive shell, inside a test function, or as part of a larger subroutine.

You're probably better off passing a dictionary into hisplots_traintest() like this:

display_params = {
    'test_df_name': 'predict_test1',
    'test_df': predict_test1,
    'train_df_name': 'predict_train1',
    'train_df': predict_train1
}

That way, you have nice human-readable names for your plots.

You can also generalize this by passing around {'name': 'name of df', 'df': df} dicts, if you really need something close to the generality of your original get_df_name().
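One way the generalized version might look, as a sketch only: the function name hisplots_named is hypothetical, the datasets are plain dicts of lists standing in for DataFrames, and the Agg backend is used just so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, chosen only for this sketch
import matplotlib.pyplot as plt

PRICE_COLUMN = "log(price($/lb))"

def hisplots_named(datasets, figsize):
    """Draw one histogram panel per {'name': ..., 'df': ...} entry."""
    # squeeze=False keeps axs two-dimensional even for a single panel.
    fig, axs = plt.subplots(1, len(datasets), figsize=figsize, squeeze=False)
    for ax, entry in zip(axs[0], datasets):
        ax.hist(entry["df"][PRICE_COLUMN], bins=30)
        ax.set_title("log(price) histogram for " + entry["name"])
    return fig

# Stand-in data in place of the real DataFrames.
datasets = [
    {"name": "predict_train1", "df": {PRICE_COLUMN: [0.1, 0.4, 0.8, 1.2]}},
    {"name": "predict_test1", "df": {PRICE_COLUMN: [0.2, 0.5, 0.9]}},
]

fig = hisplots_named(datasets, (10, 4))
panel_titles = [ax.get_title() for ax in fig.axes]
```

Each dataset carries its own label, so nothing depends on what the variable happens to be called in the calling code.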