How do I send my output xls files to a specific path in python?-CodePudding

How do i make my df.to_excel function write to an output path? After my script runs, I do not see the files in the output_path directory i have defined.

import pandas as pd
from openpyxl import load_workbook
import os
import datetime

output_path = 'C:/Users/g/Desktop/autotranscribe/python/Processed'

path = 'C:/Users/g/Desktop/autotranscribe/python/Matching'

cols_to_drop =  ['PSI ID','PSIvet Region','PSIvet region num','Fax','County']               
 
column_name_update_map = {'Account name': 'Company Name','Billing address':'Address','Billing city':'City','Billing State':'State'} 

for file in os.listdir("C:/Users/g/Desktop/autotranscribe/python/Matching"):
    if file.startswith("PSI") and "(updated headers)" not in file:
        dfs = pd.read_excel(file, sheet_name=None,skiprows=5)
        output = dict()
        for ws, df in dfs.items():
            if ws.startswith("Cancelled Members"): df = df.drop('Active date', axis=1)
            if any(ws.startswith(x) for x in ["New Members","PVCC"]):
                continue
            #if ws in ["New Members 03.22","PVCC"]:   #sheetstoavoid
            temp = df
            dt = pd.to_datetime(os.path.getctime(os.path.join(path,file)),unit="s").replace(nanosecond=0)
            output[ws] = temp
        writer = pd.ExcelWriter(f'{file.replace(".xlsx","")} (updated headers).xlsx')
        for ws, df in output.items():
            df.to_excel(writer, index=None, sheet_name=ws)
        writer.save()
        writer.close()

I tried df.to_excel(writer,output_path, index=None, sheet_name=ws)

But i get an error File "", line 36, in df.to_excel(writer,output_path, index=None, sheet_name=ws)

TypeError: to_excel() got multiple values for argument 'sheet_name'.

CodePudding user response：

A few comments:

The function os.listdir() only returns "unqualified" file names, so before using file, we need to prepend path using something like input_file_name = f'{path}/{file}'.
Similarly, pd.ExcelWriter() will need a qualified file name (that is, including the path as well as the "unqualified" file name), which we can get by doing this: output_file_name = f'{output_path}/{file.replace(".xlsx","")} (updated headers).xlsx'.
There are some elements of the code in your question that may not be getting used, but rather than comment on or change those, I provide a working version with minimal changes below.

I created directories named Matching and Processed. I placed a file named PSI 123.xlsx in Matching with a tab named Cancelled Members containing the following:

will skip           
will skip           
will skip           
will skip           
will skip           
Col1        Col2    Col3    Active date
xx          NY      110     789

I then ran the following modification to your code (note the changes to output_path and path for testing purposes in my environment):

import pandas as pd
from openpyxl import load_workbook
import os
import datetime

#output_path = 'C:/Users/g/Desktop/autotranscribe/python/Processed'

#path = 'C:/Users/g/Desktop/autotranscribe/python/Matching'

output_path = './Processed'
path = './Matching'

cols_to_drop =  ['PSI ID','PSIvet Region','PSIvet region num','Fax','County']               
 
column_name_update_map = {'Account name': 'Company Name','Billing address':'Address','Billing city':'City','Billing State':'State'} 

for file in os.listdir(path):
    if file.startswith("PSI") and "(updated headers)" not in file:
        input_file_name = f'{path}/{file}'
        dfs = pd.read_excel(input_file_name, sheet_name=None,skiprows=5)
        output = dict()
        for ws, df in dfs.items():
            if ws.startswith("Cancelled Members") and 'Active date' in df.columns: df = df.drop('Active date', axis=1)
            if any(ws.startswith(x) for x in ["New Members","PVCC"]):
                continue
            #if ws in ["New Members 03.22","PVCC"]:   #sheetstoavoid
            temp = df
            dt = pd.to_datetime(os.path.getctime(os.path.join(path,file)),unit="s").replace(nanosecond=0)
            output[ws] = temp
        output_file_name = f'{output_path}/{file.replace(".xlsx","")} (updated headers).xlsx'
        writer = pd.ExcelWriter(output_file_name)
        for ws, df in output.items():
            df.to_excel(writer, index=None, sheet_name=ws)
        writer.save()
        writer.close()

After running, the code had created a new file in Processed named PSI 123 (updated headers).xlsx with sheets named as in the input. The sheet Cancelled Members contained the following:

Address State   Zip Status  Status.1    Date    Partner
Col1    Col2    Col3
xx      NY      110