I would like to specify the columns read when reading multiple sheets of a .xlsx.
I continue to get Usecols do not match columns, columns expected but not found:
I have tried something similar to the following format along with what I included far below:
usecols=['A,D:G', 'B,F,I', 'G:J, K']
The ideal outcome would be reading 3 sheets in, and only reading the columns of each sheet's corresponding list.
import pandas as pd
sheet1_cols = ['orders', 'sales', 'calls']
sheet2_cols = ['clicks', 'impressions', 'convos']
sheet3_cols = ['turnover', 'retention', 'returns']
bulk = pd.read_excel(
'file123.xlsx',
engine='openpyxl',
sheet_name=[1,2,3],
usecols= [sheet1_cols, sheet2_cols, sheet3_cols]
)
CodePudding user response:
You can use a loop as suggested by @ScottBoston.
What I don't understand is how to do this without calling 'pd.read_excel' multiple times.
In this version, I use pd.ExcelFile
to avoid to reopen 3 times the file.
sheets = {
'1': {'usecols': ['orders', 'sales', 'calls']},
'2': {'usecols': ['clicks', 'impressions', 'convos']},
'3': {'usecols': ['turnover', 'retention', 'returns']},
}
def read_excel_sheets(filename, sheets):
dfs = {}
with pd.ExcelFile(filename) as xlsx:
for sheet_name, params in sheets.items():
dfs[sheet_name] = xlsx.parse(sheet_name=sheet_name, **params)
return dfs
# The output is the same as pd.read_excel(filename, sheets=None)
dfs = read_excel_sheets('data.xlsx', sheets)
dfs
is a dict where the key is the sheet name. You can pass any valid parameters of read_excel
to sheets
dictionary.