Hello guys I created the following code to get some information out of multiple excel files, I have one for each month of each year (2015 to date) and it seems that somewhere in those time lines they decided to change from upper to lower case on the sheet names, is there a way to call the sheets names with an upper of lower function to normalize their names? Below is what I created so far:
## 2020
import glob as glob
import pandas as pd
import datetime
pd.set_option('display.max_rows', None)
data2020 = glob.glob('*20*')
cols = ["VendorName", "VendorNo", "InvoiceNumber", "IvaPagadoOriginal","PeriodoEdo"]
li3 = []
for filename in data2020:
df = pd.read_excel(filename, sheet_name = 'virtual', index_col=None, usecols= cols)
li3.append(df)
frame3 = pd.concat(li3, axis=0, ignore_index=True)
frame3 = frame3.iloc[:-1 , :]
frame3 = frame3[(frame3.IvaPagadoOriginal != 0)]
frame3 = frame3.dropna(subset=['InvoiceNumber'])
frame3['Vat Returns Adjustment Category'] = 'Total foreign suppliers'
frame3['Date'] = pd.to_datetime(frame3['PeriodoEdo'])
frame3['Period Name'] = frame3['Date'].dt.strftime('%b-%y')
As you can see the sheet is called "virtual" but it can be "Virtual" or in some cases "virtuaL"
Any assistance will be much appreciate.
CodePudding user response:
Create a pandas.ExcelFile
object which contains a list of the sheet names as an attribute. Then iterate over all the sheet names of each file and select only the one named 'virtual' or 'Virtual' (i.e. if sheet_name.lower() == 'virtual'
). Finally, pass only that one to pandas.read_excel
.
Replace the for loop with
for filename in data2020:
excel_file = pd.ExcelFile(filename)
sheet_name = next(sheet for sheet in excel_file.sheet_names
if sheet.lower() == 'virtual')
df = pd.read_excel(filename, sheet_name=sheet_name, index_col=None, usecols= cols)
li3.append(df)
Or use the pandas.ExcelFile.parse
method instead of pandas.read_excel
for filename in data2020:
excel_file = pd.ExcelFile(filename)
sheet_name = next(sheet for sheet in excel_file.sheet_names
if sheet.lower() == 'virtual')
df = excel_file.parse(sheet_name=sheet_name, index_col=None, usecols= cols)
li3.append(df)