Home > Blockchain >  Read Specific Columns From Each Sheet of .xlsx File
Read Specific Columns From Each Sheet of .xlsx File

Time:03-26

I would like to specify the columns read when reading multiple sheets of a .xlsx.

I continue to get Usecols do not match columns, columns expected but not found:

I have tried something similar to the following format along with what I included far below:

usecols=['A,D:G', 'B,F,I', 'G:J, K']

The ideal outcome would be reading 3 sheets in, and only reading the columns of each sheet's corresponding list.

import pandas as pd

sheet1_cols = ['orders', 'sales', 'calls']
sheet2_cols = ['clicks', 'impressions', 'convos']
sheet3_cols = ['turnover', 'retention', 'returns']

bulk = pd.read_excel(
    'file123.xlsx',
    engine='openpyxl',
    sheet_name=[1,2,3],
    usecols= [sheet1_cols, sheet2_cols, sheet3_cols]
)

CodePudding user response:

You can use a loop as suggested by @ScottBoston.

What I don't understand is how to do this without calling 'pd.read_excel' multiple times.

In this version, I use pd.ExcelFile to avoid to reopen 3 times the file.

sheets = {
    '1': {'usecols': ['orders', 'sales', 'calls']},
    '2': {'usecols': ['clicks', 'impressions', 'convos']},
    '3': {'usecols': ['turnover', 'retention', 'returns']},
}

def read_excel_sheets(filename, sheets):
    dfs = {}
    with pd.ExcelFile(filename) as xlsx:
        for sheet_name, params in sheets.items():
            dfs[sheet_name] = xlsx.parse(sheet_name=sheet_name, **params)
    return dfs

# The output is the same as pd.read_excel(filename, sheets=None)
dfs = read_excel_sheets('data.xlsx', sheets)

dfs is a dict where the key is the sheet name. You can pass any valid parameters of read_excel to sheets dictionary.

  • Related