Background -
I am a beginner, who is writing their first serious program; an API call is made, which returns a response that I save as a .json
file, and dataframe (df
) before doing some math and transformation and saving as a new dataframe called exceptions_df
. Finally, I am looking to write both df
and exceptions_df
to a .xlsx
file (def xlsx_writer()
).
Problem -def xlsx_writer():
writes the exception_df
to a .xlsx
file no problem. However, I would also like it to write df
to the same file.
Whilst I can access the df
variable by using df = def ownership_qc()
, that function also calls other functions (df = unpack_response()
), which itself, calls another function named api_response = response_writer())
. Can you see my problem? My code would unnecessarily be repeating itself (i.e., calling related functions again) by using this method. My current code does this and 'works', however it seems very silly to be repeating code; and it's ultra slow.
Relevant functions -
def xlsx_writer
- this is the function I would like to utilize df
in. It currently does that through df = ownership_qc()
, however it's slow, based on the above description:
def xlsx_writer():
exceptions_df = ownership_exceptions()
# This is where I call df and cause the a chain reaction of ownership_qc() calling function it requires and so forth.
df = ownership_qc()
timestr = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M")
filename = 'ownership_exceptions_' timestr
writer = pd.ExcelWriter(filename '.xlsx', engine='xlsxwriter')
exceptions_df.to_excel(writer, sheet_name='Ownership Exceptions Report', startrow=1, header=False, index=False)
df.to_excel(writer, sheet_name='Raw Extract', startrow=1, header=False, index=False)
workbook = writer.book
worksheet = writer.sheets['Ownership Exceptions Report']
(max_row, max_col) = exceptions_df.shape
column_settings = [{'header': column} for column in exceptions_df.columns]
worksheet.add_table(0, 0, max_row, max_col - 1, {'columns': column_settings})
worksheet.set_column(0, max_col - 1, 12)
xlsx_writer()
def ownership_exceptions():
-
This is the function I call in def xlsx_writer()
to access df
. The problem is that this function calls another function, and starts the chain reaction:
# This is the function I am calling to acces `df`, however this function calls another function.
def ownership_exceptions():
df = ownership_qc()
df = df[(df['Entity ID %'] != 1.000000) & (df['Account # %'] != 1.000000)]
df.drop(df.index[df['Entity ID %'] == '1'], inplace=True)
exceptions_df = df.drop(['Model Type', 'Valuation (USD)', 'Top Level Legal Entity', 'Financial Service', 'Account Close Date'], axis=1)
return df
My thoughts
- Is there a way to access
df
without having it's function (def ownership_exceptions
) called and thus all other related functions called? - Should I simply create a new function that loads the API response (.json) back into a dataframe and access that way?
CodePudding user response:
I would probably make ownership_exceptions
return (a tuple of) two values: exception_df
and df
. That way, you don't need to call ownership_qc
twice.
Update your ownership_exceptions
a bit, to process exceptions_df
instead of df
:
def ownership_exceptions():
df = ownership_qc()
exceptions_df = df[(df['Entity ID %'] != 1.000000) & (df['Account # %'] != 1.000000)]
exceptions_df = df[df['Entity ID %'] != '1']
exceptions_df = df.drop(['Model Type', 'Valuation (USD)', 'Top Level Legal Entity', 'Financial Service', 'Account Close Date'], axis=1)
return exceptions_df, df # <---- Notice two values are being returned here
And change your xlsx_writer
function to only call ownership_exceptions
:
def xlsx_writer():
exceptions_df, df = ownership_exceptions()
# now use exceptions_df and df...