Home > Net >  Pandas, how to combine multiple sheets that have same names but with either lower or upper case?
Pandas, how to combine multiple sheets that have same names but with either lower or upper case?

Time:11-19

Hello guys I created the following code to get some information out of multiple excel files, I have one for each month of each year (2015 to date) and it seems that somewhere in those time lines they decided to change from upper to lower case on the sheet names, is there a way to call the sheets names with an upper of lower function to normalize their names? Below is what I created so far:

## 2020
import glob as glob 
import pandas as pd
import datetime
pd.set_option('display.max_rows', None)

data2020 = glob.glob('*20*')
cols = ["VendorName", "VendorNo", "InvoiceNumber", "IvaPagadoOriginal","PeriodoEdo"]
li3 = []
for filename in data2020:
    df = pd.read_excel(filename, sheet_name = 'virtual', index_col=None, usecols= cols)
    li3.append(df)
frame3 = pd.concat(li3, axis=0, ignore_index=True)
frame3 = frame3.iloc[:-1 , :]
frame3 = frame3[(frame3.IvaPagadoOriginal != 0)]
frame3 = frame3.dropna(subset=['InvoiceNumber'])                
frame3['Vat Returns Adjustment Category'] = 'Total foreign suppliers' 
frame3['Date'] = pd.to_datetime(frame3['PeriodoEdo'])
frame3['Period Name'] = frame3['Date'].dt.strftime('%b-%y')

As you can see the sheet is called "virtual" but it can be "Virtual" or in some cases "virtuaL"

Any assistance will be much appreciate.

CodePudding user response:

Create a pandas.ExcelFile object which contains a list of the sheet names as an attribute. Then iterate over all the sheet names of each file and select only the one named 'virtual' or 'Virtual' (i.e. if sheet_name.lower() == 'virtual'). Finally, pass only that one to pandas.read_excel.

Replace the for loop with

for filename in data2020:
    excel_file = pd.ExcelFile(filename)
    sheet_name = next(sheet for sheet in excel_file.sheet_names 
                      if sheet.lower() == 'virtual')
    df = pd.read_excel(filename, sheet_name=sheet_name, index_col=None, usecols= cols)
    li3.append(df)

Or use the pandas.ExcelFile.parse method instead of pandas.read_excel

for filename in data2020:
    excel_file = pd.ExcelFile(filename)
    sheet_name = next(sheet for sheet in excel_file.sheet_names 
                      if sheet.lower() == 'virtual')
    df = excel_file.parse(sheet_name=sheet_name, index_col=None, usecols= cols)
    li3.append(df)
  • Related