Accessing a function variable without calling the function-CodePudding

Background -
I am a beginner, who is writing their first serious program; an API call is made, which returns a response that I save as a .json file, and dataframe (df) before doing some math and transformation and saving as a new dataframe called exceptions_df. Finally, I am looking to write both df and exceptions_df to a .xlsx file (def xlsx_writer()).

Problem -
def xlsx_writer(): writes the exception_df to a .xlsx file no problem. However, I would also like it to write df to the same file.

Whilst I can access the df variable by using df = def ownership_qc(), that function also calls other functions (df = unpack_response()), which itself, calls another function named api_response = response_writer()). Can you see my problem? My code would unnecessarily be repeating itself (i.e., calling related functions again) by using this method. My current code does this and 'works', however it seems very silly to be repeating code; and it's ultra slow.

Relevant functions -
def xlsx_writer - this is the function I would like to utilize df in. It currently does that through df = ownership_qc(), however it's slow, based on the above description:

def xlsx_writer():
    exceptions_df = ownership_exceptions()
#   This is where I call df and cause the a chain reaction of ownership_qc() calling function it requires and so forth.
    df = ownership_qc()
    
    timestr = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M")
    filename = 'ownership_exceptions_' timestr
    
    writer = pd.ExcelWriter(filename '.xlsx', engine='xlsxwriter')
    exceptions_df.to_excel(writer, sheet_name='Ownership Exceptions Report', startrow=1, header=False, index=False)
    df.to_excel(writer, sheet_name='Raw Extract', startrow=1, header=False, index=False)
    workbook = writer.book
    worksheet = writer.sheets['Ownership Exceptions Report']
    (max_row, max_col) = exceptions_df.shape
    column_settings = [{'header': column} for column in exceptions_df.columns]
    worksheet.add_table(0, 0, max_row, max_col - 1, {'columns': column_settings})
    worksheet.set_column(0, max_col - 1, 12)
xlsx_writer()

def ownership_exceptions(): -
This is the function I call in def xlsx_writer() to access df. The problem is that this function calls another function, and starts the chain reaction:

# This is the function I am calling to acces `df`, however this function calls another function.
def ownership_exceptions():
    df = ownership_qc()
    df = df[(df['Entity ID %'] != 1.000000) & (df['Account # %'] != 1.000000)]
    df.drop(df.index[df['Entity ID %'] == '1'], inplace=True)
    exceptions_df = df.drop(['Model Type', 'Valuation (USD)', 'Top Level Legal Entity', 'Financial Service', 'Account Close Date'], axis=1)
    return df

My thoughts

Is there a way to access df without having it's function (def ownership_exceptions) called and thus all other related functions called?
Should I simply create a new function that loads the API response (.json) back into a dataframe and access that way?

CodePudding user response：

I would probably make ownership_exceptions return (a tuple of) two values: exception_df and df. That way, you don't need to call ownership_qc twice.

Update your ownership_exceptions a bit, to process exceptions_df instead of df:

def ownership_exceptions():
    df = ownership_qc()
    exceptions_df = df[(df['Entity ID %'] != 1.000000) & (df['Account # %'] != 1.000000)]
    exceptions_df = df[df['Entity ID %'] != '1']
    exceptions_df = df.drop(['Model Type', 'Valuation (USD)', 'Top Level Legal Entity', 'Financial Service', 'Account Close Date'], axis=1)
    return exceptions_df, df  # <---- Notice two values are being returned here

And change your xlsx_writer function to only call ownership_exceptions:

def xlsx_writer():
    exceptions_df, df = ownership_exceptions()
    
    # now use exceptions_df and df...