Home > Back-end >  How do I update column names for all sheets within a file in Python?
How do I update column names for all sheets within a file in Python?

Time:05-04

I am trying to understand how i can add to my current script where im able to make changes at sheet level. I want to update names of columns and map them to new names. For example, if a column is called 'locationname' i want to update the column name for all sheets within the file as 'company name'

My code so far where I have managed to delete a specific sheet from a file-

import pandas as pd
from openpyxl import load_workbook

for file in os.listdir("C:/Users/yam/Desktop/hhh/python/sterls"):
    if file.startswith("TVC"):
        wb = load_workbook(file)
if 'Opt-Ins' in wb.sheetnames:
    wb.remove(wb['Opt-Ins'])
wb.save(file)

I want to apply the below to all sheets within the file for my script above-

 column_name_update_map = {'Company Name': 'LocationName', 'Address': 'StreetAddress','City':'City','State':'State','Zip':'Zip','Phone Number':'GeneralPhone','Email':'GeneralEmail'}
 df = df.rename(columns=column_name_update_map)

How do I implement this within my loop and save to file?

CodePudding user response:

Loop through each header cell, and attempt to assign a new value to each cell, pulling the value from column_name_update_map. If the current header name (header.value) is not in column_name_update_map, a KeyError will be raised that you can safely ignore. In that case, the value is not updated.

for file in os.listdir("C:/Users/yam/Desktop/hhh/python/sterls"):
    if file.startswith("TVC"):
        wb = load_workbook(file)
        if 'Opt-Ins' in wb.sheetnames:
            wb.remove(wb['Opt-Ins'])
        
        for ws in wb:
            for header in next(ws.rows):
                try:
                    header.value = column_name_update_map[header.value]
                except KeyError:
                    pass
        
        wb.save(file   " (updated headers).xlsx")

CodePudding user response:

you can use something like below once workbook is loaded to get sheets from the workbook and update it accordingly in loop

from openpyxl.utils.dataframe import dataframe_to_rows

sheet_names =  wb.get_sheet_names()
for item in sheet_names:
    current_sheet = wb.get_sheet_by_name(item)
    data = current_sheet.values
    columns = next(data)[0:] // gives you a first row as header
    df = pd.DataFrame(data, columns=columns)
    df.rename(columns=column_name_update_map)
    for r in dataframe_to_rows(df, index=True, header=True):
       ws.append(r)

and then save it. Hope it will help

Alternatively you can just get the first row if sheet and update it using this line columns = next(data)[0:]

  • Related