Format and manipulate data across multiple Excel sheets in Python using openpyxl before converting t-CodePudding

I need some help with editing the sheets within my Excel workbook in python, before I stack the data using pd.concat(). Each sheet (~100) within my Excel workbook is structured identically, with the unique identifier for each sheet being a 6-digit code that is found in line 1 of the worksheet.

I've already done the following steps to import the file, unmerge rows 1-4, and insert a new column 'C':

import openpyxl
import pandas as pd

wb = openpyxl.load_workbook('data_sheets.xlsx')

for sheet in wb.worksheets:
    sheet.merged_cells
    for merge in list(sheet.merged_cells):
        sheet.unmerge_cells(range_string=str(merge))
    sheet.insert_cols(3, 1)
    print(sheet)

wb.save('workbook_test.xlsx')

#concat once worksheets have been edited
df= pd.concat(pd.read_excel('workbook_test.xlsx, sheet_name= None), ignore_index= True)

Before stacking the data however, I would like to make the following additonal (sequential) changes to every sheet:

Extract from row 1 the right 8 characters (in excel the equivalent of this would be =RIGHT(A1, 8) - this is to pull the unique code off of each sheet, which will look like '(000000)'.
Populate column C from rows 6-282 with the unique code.
Delete rows 1-5

The end result would make each sheet within the workbook look like this:

Is this possible to do with openpyxl, and if so, how? Any direction or assistance with this would be much appreciated - thank you!

CodePudding user response：

Here is a 100% openpyxl approach to achieve what you're looking for :

from openpyxl import load_workbook

wb = load_workbook("workbook_test.xlsx")

for ws in wb:
    ws.unmerge_cells("A1:O1") #unmerge first row till O
    ws_uid = ws.cell(row=1, column=1).value[-8:] #get the sheet's UID
    for num_row in range(6, 282):
        ws.cell(row=num_row, column=3).value = '="{}"'.format(ws_uid) #write UID in Column C
    ws.delete_rows(1, 5) #delete first 5 rows
    
wb.save("workbook_test.xlsx")

NB : This assume there is already an empty column (C).