I have a dataframe like as below
import numpy as np
import pandas as pd
from numpy.random import default_rng
rng = default_rng(100)
cdf = pd.DataFrame({'Id':[1,2,3,4,5],
'customer': rng.choice(list('ACD'),size=(5)),
'region': rng.choice(list('PQRS'),size=(5)),
'dumeel': rng.choice(list('QWER'),size=(5)),
'dumma': rng.choice((1234),size=(5)),
'target': rng.choice([0,1],size=(5))
})
I am trying to split the dataframe based on Customer
and store it in a folder. Not necessary to understand the full code. The issue is in the last line.
i = 0
for k, v in df.groupby(['Customer']):
print(k.split('@')[0])
LM = k.split('@')[0]
i = i 1
unique_cust_names = '_'.join(v['Customer'].astype(str).unique())
unique_ids = '_'.join(v['Id'].astype(str).unique())
unique_location = '_'.join(v['dumeel'].astype(str).unique())
filename = '_'.join([unique_ids, unique_cust_names, unique_location, LM])
print(filename)
with pd.ExcelWriter(f"{filename}.xlsx", engine='xlsxwriter') as writer:
v.to_excel(writer,columns=col_list,index=False)
wb = load_workbook(filename = 'format_sheet.xlsx')
sheet_from =wb.worksheets[0]
wb1 = load_workbook(filename = f"{filename}.xlsx")
sheet_to = wb1.worksheets[0]
copy_styles(sheet_from, sheet_to)
#wb.close()
tab = Table(displayName = "Table1", ref = "A1:" get_column_letter(wb1.worksheets[0].max_column) str(wb1.worksheets[0].max_row) )
style = TableStyleInfo(name="TableStyleMedium2", showFirstColumn=False, showLastColumn=False, showRowStripes=True, showColumnStripes=False)
tab.tableStyleInfo = style
wb1.worksheets[0].add_table(tab)
#wb1.worksheets[0].parent.save(f"{filename}.xlsx")
wb1.save("test_files/" f"{filename}.xlsx") # issue is here
wb1.close()
print("Total number of customers to be emailed is ", i)
Though the code works fine, the issue is in the below line I guess
wb1.save("test_files/" f"{filename}.xlsx") # issue is here
This creates two copies of files.. One in the current folder as jupyter notebook file and other one inside the test_files
folder.
For ex: I see two files called test1.xlsx
one in the current folder and one inside the test_files
folder (path is test_files/test1.xlsx)
How can I avoid this?
I expect my output to generate/save only 1 file for each customer inside the test_files
folder?
CodePudding user response:
The issue is happening because you are referencing 2 different file names one with the prefix "test_files/"
and once without it. Best way to handle it will be to define file name as follows
dir_filename = "test_files/" f"{filename}.xlsx"
and then reference it in the following places
with pd.ExcelWriter(f"{filename}.xlsx", engine='xlsxwriter') as writer:
v.to_excel(writer,columns=col_list,index=False)
##
wb1 = load_workbook(filename = f"{filename}.xlsx")
##
wb1.save("test_files/" f"{filename}.xlsx")
Hope it helps