Good afternoon,
I'm working on a Python program that will take 3 separate dataframes and add them into an existing Excel file, overwriting the cell ranges in question but leaving the rest of the rows and columns unaltered.
Below is an example of the Excel file structure:
Keywords | Match type | col1a | col1b | col1c | col2a | col2b | col2c | col3a | col3b | col3c | counter |
---|---|---|---|---|---|---|---|---|---|---|---|
not to be removed | not to be removed | replaced data | replaced data | replaced data | replaced data | replaced data | replaced data | replaced data | replaced data | replaced data | not to be removed |
not to be removed | not to be removed | replaced data | replaced data | replaced data | replaced data | replaced data | replaced data | replaced data | replaced data | replaced data | not to be removed |
In this file I need the first df to start in row 2, column 3, the second df in column 6 and the third df in column 9.
Currently, with the code below, I can get the data into the correct position, but all the other data gets lost in the process. I think it may be possible to merge the Excel file, opened as a dataframe, with the newer dataframes, but no such luck so far.
My code is below. I am still fiddling with this, and at the time of writing the old data has been opened but no action has been taken with it.
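import pandas as pd
# LastMonthDL, Last3MonthsDL and LifeTimeDL are assumed to be paths to the CSV files, defined elsewhere
DF_LastMonthDL = pd.read_csv(LastMonthDL)
DF_Last3MonthsDL = pd.read_csv(Last3MonthsDL)
DF_LifeTimeDL = pd.read_csv(LifeTimeDL)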
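########################################################## Manipulating the dataframes
# Sorting the dataframes by index to keep ordering consistent
# (sort_index returns a new dataframe, so the result has to be assigned back)
DF_LifeTimeDL = DF_LifeTimeDL.sort_index(axis=0)
DF_LastMonthDL = DF_LastMonthDL.sort_index(axis=0)
DF_Last3MonthsDL = DF_Last3MonthsDL.sort_index(axis=0)
# Removing the first columns as unneeded: Keywords, Match type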
DF_LifeTimeShrt = DF_LifeTimeDL[["Impressions", "Clicks", "CTR", "Spend(GBP)", "CPC(GBP)", "Orders", "Sales(GBP)","ACOS","ROAS"]]
DF_Last3MonthsShrt = DF_Last3MonthsDL[["Impressions", "Clicks", "CTR", "Spend(GBP)", "CPC(GBP)", "Orders", "Sales(GBP)","ACOS","ROAS"]]
DF_LastMonthShrt = DF_LastMonthDL[["Impressions", "Clicks", "CTR", "Spend(GBP)", "CPC(GBP)", "Orders", "Sales(GBP)","ACOS","ROAS"]]
oldData = pd.read_excel(r"oldData.xlsx")
########################################################## Exporting into excel in set positions
# Create a Pandas Excel writer using openpyxl as the engine.
# Note: in the default write mode this creates a new Temp.xlsx rather than editing an existing one.
writer = pd.ExcelWriter('Temp.xlsx', engine='openpyxl')
# Position the dataframes in the worksheet
DF_LifeTimeShrt.to_excel(writer, sheet_name='LifeTime', startrow=2, startcol=2, header=True, index=False)
DF_Last3MonthsShrt.to_excel(writer, sheet_name='Sheet1', startrow=2, startcol=11, header=False, index=False)
DF_LastMonthShrt.to_excel(writer, sheet_name='Sheet1', startrow=2, startcol=20, header=False, index=False)
# Close the Pandas Excel writer and output the Excel file.
# (writer.save() was removed in pandas 2.0; close() also writes the file.)
writer.close()
Any guidance on this would be greatly appreciated.
CodePudding user response:
You can do this using openpyxl.load_workbook() and updating the cells, similar to what you are doing above. Assuming the initial part is all working correctly, you just need to change the last part as below:
import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows
# Open the existing workbook so that untouched cells are preserved
writer = openpyxl.load_workbook('Temp.xlsx')
ws = writer['LifeTime']
rows = dataframe_to_rows(DF_LifeTimeShrt, index=False, header=True)
for r_idx, row in enumerate(rows, 1):
    for c_idx, value in enumerate(row, 1):
        # The + 2 offsets make the block start at row 3, column 3 (1-based)
        ws.cell(row=r_idx + 2, column=c_idx + 2, value=value)
ws = writer['Sheet1']
rows = dataframe_to_rows(DF_Last3MonthsShrt, index=False, header=True)
for r_idx, row in enumerate(rows, 1):
    for c_idx, value in enumerate(row, 1):
        ws.cell(row=r_idx + 2, column=c_idx + 2, value=value)
ws = writer['Sheet2']
rows = dataframe_to_rows(DF_LastMonthShrt, index=False, header=True)
for r_idx, row in enumerate(rows, 1):
    for c_idx, value in enumerate(row, 1):
        ws.cell(row=r_idx + 2, column=c_idx + 2, value=value)
# Save the workbook; the name of the file to write to must be provided.
writer.save('Temp.xlsx')
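The three loops above repeat the same pattern, so they could be folded into one small helper. The following is only a sketch: the name write_df_at, its parameters and the demo data are made up for illustration and are not part of openpyxl. It runs against an in-memory workbook rather than Temp.xlsx.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows


def write_df_at(ws, df, first_row, first_col, header=False):
    """Write df into ws with its top-left value at (first_row, first_col), both 1-based.

    Hypothetical helper: only the cells covered by the dataframe are touched.
    """
    rows = dataframe_to_rows(df, index=False, header=header)
    for r_offset, row in enumerate(rows):
        for c_offset, value in enumerate(row):
            ws.cell(row=first_row + r_offset,
                    column=first_col + c_offset,
                    value=value)


# Small in-memory demonstration with made-up data.
wb = Workbook()
ws = wb.active
ws.append(["Keywords", "Match type", "col1a", "col1b", "counter"])
ws.append(["kw1", "exact", "old", "old", 7])

demo_df = pd.DataFrame({"col1a": [1, 2], "col1b": [3, 4]})
write_df_at(ws, demo_df, first_row=2, first_col=3)
```

With a workbook loaded through load_workbook, the same call could be repeated once per dataframe with a different first_col, followed by a single save.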
EDIT - The advantage of load_workbook is that it updates only the particular cells you write to, without changing any other cells or overwriting colors, etc. that may be present. dataframe_to_rows gives you a way to get a whole DF row into an openpyxl-readable form. From there, I am basically reading each row and column (a cell) and updating the value (ws.cell(row, col).value) with the value from the df. The disadvantage is that you need to go through the for loops (unlike, say, df.to_excel), but the advantage is that you can update a single cell value without disturbing anything else. Hope this explanation helps.
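If avoiding the for loops matters, one possible alternative is to stay with df.to_excel and open the writer in append mode with if_sheet_exists="overlay". This assumes pandas 1.4 or later with the openpyxl engine. The sketch below uses a hypothetical helper name (overlay_frames), a throwaway file and made-up data, so treat it as an illustration rather than a drop-in replacement.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook


def overlay_frames(path, sheet_name, frames_with_cols, startrow=1):
    """Write each (df, startcol) pair into an existing sheet of an existing file.

    startrow and startcol are 0-based, as in DataFrame.to_excel.
    Hypothetical helper; assumes pandas >= 1.4 for if_sheet_exists="overlay".
    """
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        for df, startcol in frames_with_cols:
            df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                        startcol=startcol, header=False, index=False)


# Build a throwaway workbook that stands in for the existing Excel file.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws.append(["Keywords", "Match type", "col1a", "col2a", "counter"])
ws.append(["kw1", "exact", "old", "old", 7])
wb.save(demo_path)

# Overwrite columns C and D of row 2, leaving the other cells as they were.
overlay_frames(demo_path, "Sheet1",
               [(pd.DataFrame({"a": [10]}), 2),
                (pd.DataFrame({"b": [20]}), 3)])

result = load_workbook(demo_path)["Sheet1"]
```

The context manager saves the file on exit, so no explicit save or close call is needed here.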